The second issue of "The InterBase and Firebird Developer Magazine", it was published in 2005. This issue is devoted to internals of Firebird locking (article by Ann W Harrison), "Testing NO SAVEPOINT feature in InterBase" by Vlad Khorsun, "Object Oriented Programming with Firebird" by Vladimir Kotlyarevsky. Dmitry Kuzmenko, CEO of IBSurgeon, wrote an article "Using IBAnalyst". There is also a nice article regarding replication basics and specific implementation with CopyCat replication software.
Contents
The InterBase and Firebird Developer Magazine, 2005, Issue 2
www.ibdeveloper.com © 2005 www.ibdeveloper.com. All rights reserved.
Editor's note
Rock around the blog by Alexey Kovyazin 3
Firebird conference by Helen E. M. Borrie 4

Oldest Active
On Silly Questions and Idiotic Outcomes by Helen E. M. Borrie 5

Server internals (Cover story)
Locking, Firebird, and the Lock Table by Ann W. Harrison 6
Inside BLOBs by Dmitri Kouzmenko and Alexey Kovyazin 11

TestBed
Testing NO SAVEPOINT in InterBase 7.5.1 by Vlad Khorsun, Alexey Kovyazin 13

Development area
Object-Oriented Development in RDBMS, Part 1 by Vladimir Kotlyarevsky 22
Subscribe now!
To receive future issue notifications, send email to subscribe@ibdeveloper.com
Replicating and synchronizing InterBase/Firebird databases using CopyCat by Jonathan Neve 27
Using IBAnalyst by Dmitri Kouzmenko 31

Readers feedback
Comments to "Temporary tables" article by Volker Rehn 35
Miscellaneous 36
Best viewed with Acrobat Reader 7. Download now.
Credits
Alexey Kovyazin, Chief Editor
Helen Borrie, Editor
Dmitri Kouzmenko, Editor
Noel Cosgrave, Sub-editor
Lev Tashchilin, Designer
Natalya Polyanskaya, Blog editor
column "Oldest Active" by Helen Borrie. No need to say more about the author of "The Firebird Book" – just read it. In this issue Helen looks at the phenomenon of support lists and their value to the Firebird community. She takes a swing at some of the unproductive things that list posters do, in this topic titled "On Silly Questions and Idiotic Outcomes".
Let's blog again
Now is a good time to say a few words about our new web presentation. We've started a blog-style interface for our magazine – take a look at www.ibdeveloper.com if you haven't already discovered it. Now you can make comments on any article or message. In future we'll publish all the materials related to the PDF issues, along with special bonus articles and materials. You can look out for previews of articles, drafts and behind-the-scenes community discussions. Please feel welcome to blog with us!
Donations
The magazine is free for readers
Editor's note
Rock around the blog
By Alexey Kovyazin

Dear readers,
I am glad to introduce to you the second issue of "The InterBase and Firebird Developer Magazine". We received a lot of emails with kind words and congratulations, which have helped us in creating this issue. Thank you very much!
In this issue
The cover story is the newest episode from Ann W. Harrison, "Locking, Firebird, and the Lock Table". The article covers the locking issue from general principles to specific advice, so everyone will find a bit that's interesting or informative.
We continue the topic of savepoint internals with the article "Testing NO SAVEPOINT in InterBase 7.5.1". Along with interesting practical test results, you will find a description of how the UNDO log works in InterBase 7.5.1.
We publish the small chapter "Inside BLOBs" from the forthcoming book "1000 InterBase & Firebird Tips & Tricks" by Dmitri Kouzmenko and Alexey Kovyazin.
Object-oriented development has been a high-interest topic for many years, and it is still a hot topic. The article "Object-Oriented Development in RDBMS" explores the practical use of OOD principles in the design of InterBase and Firebird databases.
Replication is a problem which faces every database developer sooner or later. The article "Replicating and synchronizing InterBase/Firebird databases using CopyCat" introduces the approach used in the new CopyCat component set and the CopyTiger tool.
The last article, by Dmitri Kouzmenko, "Understanding Your Database with IBAnalyst", provides a guide for better understanding of how databases work and how tuning their parameters can optimize performance and avoid bottlenecks.
"Oldest Active" column
I am very glad to introduce the new
but it is a considerable expense for its publisher. We must pay for articles, editing and proofreading, design, hosting, etc. In future we'll try to recover costs solely from advertising, but for these initial issues we need your support.
See details here
http://ibdeveloper.com/donations
On the Radar
I'd like to introduce several projects which will be launched in the near future. We really need your feedback, so please do not hesitate to place your comments in the blog or send email to us (readers@ibdeveloper.com).
Magazine CD
We will issue a CD with all 3 issues of "The InterBase and Firebird Developer Magazine" in December 2005 (yes, a Christmas CD!).
Along with all the issues in PDF and searchable HTML format, you will find exclusive articles and bonus materials. The CD will also include free versions of popular software related to InterBase and Firebird, and offer very attractive discounts for dozens of the most popular products.
The price for the CD is USD $9.99 plus shipping.
For details see www.ibdeveloper.com/magazine-cd
Paper version of the magazine
In 2006 we intend to issue the first paper version of "The InterBase and Firebird Developer Magazine". The paper version cannot be free because of high production costs.
We intend to publish 4 issues per year, with a subscription price of around USD $49. The issue volume will be approximately 64 pages.
In the paper version we plan to include the best articles from the online issues and, of course, exclusive articles and materials.
To be or not to be – that is the question that only you can answer. If the idea of subscribing to the paper version appeals to you, please place a comment in our blog (http://ibdeveloper.com/paper-version) or send email to readers@ibdeveloper.com with your pros and cons.
This issue of our magazine almost coincides with the third annual Firebird World Conference, starting November 13 in Prague, Czech Republic. This year's conference spans three nights, with a tight programme of back-to-back and parallel presentations over two days. With speakers in attendance from all across Europe and the Americas, presenting topics that range from specialised application development techniques to first-hand accounts of Firebird internals from the core developers, this one promises to be the best ever.
Registration has been well under way for some weeks now, although the time is now past for early bird discounts on registration fees. At the time of publishing, bookings were still being taken for accommodation in the conference venue itself, Hotel Olsanka in downtown Prague. The hotel offers a variety of room configurations, including bed and breakfast if wanted, at modest tariffs. Registration includes
Firebird Conference
lunches on both conference days, as well as mid-morning and mid-afternoon refreshments.
Presentations include one on Firebird's future development from the Firebird Project Coordinator, Dmitry Yemanov, and another from Alex Peshkov, the architect of the security changes coming in Firebird 2. Jim Starkey will be there, talking about Firebird Vulcan, and members of the SAS Institute team will talk about aspects of SAS's taking on Firebird Vulcan as the back-end to its illustrious statistical software. Most of the interface development tools are on the menu, including Oracle-mode Firebird, PHP, Delphi, Python and Java (Jaybird).
It is a don't-miss for any Firebird developer who can make it to Prague. Link to details and programme either at http://firebird.sourceforge.net/index.php?op=konferenz or at the IBPhoenix website, http://www.ibphoenix.com
We look forward to your feedback.
Invitation
As a footnote, I'd like to invite all people who are in touch with Firebird and InterBase to participate in the life of the community. The importance of a large, active community to any public project – and a DBMS with hundreds of thousands of users is certainly public – cannot be emphasised enough. It is the key to survival for such projects. Read our magazine, ask your questions on forums, and leave comments in the blog. Just rock and roll with us!
Sincerely
Alexey Kovyazin
Chief Editor
editor@ibdeveloper.com
Oldest Active
On Silly Questions and Idiotic Outcomes
Firebird owes its existence to the community's support lists. Though it's all ancient history now, if it hadn't been for the esprit de corps among the regulars of the old InterBase support list hosted by mers.com in the nineties, Borland's shock-horror decision to kill off InterBase development at the end of 1999 would have been the end of it, for the English-speaking users at least.
Firebird's support lists on Yahoo and Sourceforge were born out of the same cataclysm of fatal fire and regeneration that caused the phoenix, drawn from the fire against extreme adversity, to take flight as Firebird in July 2000. Existing and new users, both of Firebird and of the rich selection of drivers and tools that spun off from and for it, depend heavily on the lists to get up to speed with the evolution of all of these software products.
Newbies often observe that there is just nothing like our lists for any other software, be it open source or not. One of the things that I think makes the Firebird lists stand out from the crowd is that, once gripped by the power of the software, our users never leave the lists. Instead, they stick around, learn hard, and stay ready and willing to impart to others what they have learnt. After nearly six years of this, across a dozen lists, our concentrated mix of expertise and fidelity is hard to beat.
Now, doesn't all this make us all feel warm and fuzzy? For those of us in the hot core of this voluntary support infrastructure, the answer has to be, unfortunately, "Not all the time". There are some bugbears that can make the most patient person see red. I'm using this column to draw attention to some of the worst.
First and worst is silly questions. You have all seen them. Perhaps you even posted them. Subject: "Help!" Body: "Every time I try to connect to Firebird I get the message 'Connection refused'. What am I doing wrong?" What follows from that is a frustrating and tedious game for responders. What version of Firebird are you using? Which server model? Which platform? Which tool? What connection path? (By "connection path" we mean…)
Everyone's time is valuable. Nobody has the luxury of being able to sit around all day playing this game. If we weren't willing to help, we wouldn't be there. But you make it hard, or even
impossible, when you post problems with no descriptions. You waste our time. You waste your time. We get frustrated because you don't provide the facts; you get frustrated because we can't read your mind. And the list gets filled with noise.
If there's something I've learnt in more than 12 years of on-line software support, it is that a problem well described is a problem solved. If a person applies time and thought to presenting a good description of the problem, chances are the solution will hop up and punch him on the nose. Even if it doesn't, good descriptions produce the fastest right answers from list contributors. Furthermore, those good descriptions and their solutions form a powerful resource in the archives for others coming along behind.
Off-topic questions are another source of noise and irritation in the lists. At the Firebird website we have bent over backwards to make it very clear which lists are appropriate for which areas of interest. An off-topic posting is easily forgiven if the poster politely complies with a request from a list regular to move the question to x list. When these events instead become flame threads, or when the same people persistently reoffend, they are an extreme burden on everyone.
In the highly tedious category we have the kind of question that goes like this. Subject: "Bug in Firebird SQL". Body: "This statement works in MSSQL/Access/MySQL/insert any non-standard DBMS name here. It doesn't work in Firebird. Where should I report this bug?" To be fair, George Bernard Shaw wasn't totally right when he said "Ignorance is a sin." However,
you do at least owe it to yourself to know what you are talking about, and not to present your problem as an attack on our software. Whether intended or not, it comes over as a troll and you come across as a fool. There's a strong chance that your attitude will discourage anyone from bothering with you at all. It's no-win, sure, but it's also a reflection of human nature. You reap what you sow.
In closing this little tirade, I just want to say that I hope I've struck a chord with those of our readership who struggle to get satisfaction from their list postings. It will be a rare question indeed that has no answer. Those rarities, well presented, become excellent bug reports. If you're not getting an answer, there's a high chance that you asked a silly question.
Helen E. M. Borrie
helebor@tpg.com.au
Cover story
Locking, Firebird, and the Lock Table
News & Events

Firebird Worldwide Conference 2005
The Firebird conference will take place at the Hotel Olsanka in Prague, Czech Republic, from the evening of Sunday the 13th of November (opening session) until the evening of Tuesday the 15th of November (closing session).
One of the best-known facts about Firebird is that it uses multi-version concurrency control instead of record locking. For databases that manage concurrency through record locks, understanding the mechanisms of record locking is critical to designing a high-performance, high-concurrency application. In general, record locking is irrelevant to Firebird. However, Firebird uses locks to maintain internal consistency and for interprocess coordination. Understanding how Firebird does (and does not) use locks helps with system tuning and with anticipating how Firebird will behave under load.
I'm going to start by describing locking abstractly, then Firebird locking more specifically, then the controls you have over the Firebird lock table and how they can affect your database performance. So if you just want to learn about tuning, skip to the end of the article.
Concurrency control
Concurrency control in database systems has three basic functions: preventing concurrent transactions from overwriting each other's changes, preventing readers from seeing uncommitted changes, and giving a consistent view of the database to running transactions. Those three functions can be implemented in various ways. The simplest is to serialize transactions – allowing each transaction exclusive access to the database until it finishes. That solution is neither interesting nor desirable.

A common solution is to allow each transaction to lock the data it uses, keeping other transactions from reading data it changes and from changing data it reads. Modern databases generally lock records. Firebird provides concurrency control without record locks by keeping multiple versions of records, each marked with the identifier of the transaction that created it. Concurrent transactions don't overwrite each other's changes, because the system will not allow a transaction to change a record if the most recent version was created by a concurrent transaction. Readers never see uncommitted data, because the system will not return record versions created by concurrent transactions. Readers see a consistent version of the database, because the system returns only the data committed when they start – allowing other transactions to create newer versions. Readers don't block writers.
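The version-visibility rules just described can be sketched in a few lines of Python. This is a simplified model, not Firebird's implementation; names like `RecordVersion` and the `committed_at_start` snapshot set are invented for illustration.

```python
# Simplified model of multi-version visibility (invented names;
# real record versions live on data pages inside the engine).

class RecordVersion:
    def __init__(self, tx_id, value, prior=None):
        self.tx_id = tx_id    # transaction that created this version
        self.value = value
        self.prior = prior    # pointer to the older version, if any

def visible_version(record, committed_at_start):
    """Return the newest version whose creator was committed when the
    reading transaction started; uncommitted or concurrent work is skipped."""
    v = record
    while v is not None:
        if v.tx_id in committed_at_start:
            return v.value
        v = v.prior
    return None  # no version was committed at snapshot time

def can_update(record, committed_at_start):
    """A writer may change a record only if the most recent version
    was not created by a concurrent transaction."""
    return record.tx_id in committed_at_start
```

Here `committed_at_start` stands in for the snapshot a transaction takes when it begins; a writer that fails `can_update` would receive an update-conflict error rather than overwrite concurrent work.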
A lock sounds like a very solid object, but in database systems a lock is anything but solid. "Locking" is a shorthand description of a protocol for reserving resources. Databases use locks to maintain consistency while allowing concurrent, independent transactions to update distinct parts of the database. Each transaction reserves the resources – tables, records, data pages – that it needs to do its work. Typically the reservation is made in memory to control a resource on disk, so the cost of reserving the resource is not significant compared with the cost of reading or changing the resource.
Locking example
"Resource" is a very abstract term. Let's start by talking about locking tables. Firebird does lock tables, but normally it locks them only to prevent catastrophes, like having one transaction drop a table that another transaction is using.
Before a Firebird transaction can access a table, it must get a lock on the table. The lock prevents other transactions from dropping the table while it is in use. When a transaction gets a lock on a table, Firebird makes an entry in its table of locks, indicating the identity of
the transaction that has the lock, the identity of the table being locked, and a function to call if another transaction wants an incompatible lock on the table. The normal table lock is shared – other transactions can lock the same table in the same way.
Before a transaction can delete a table, it must get an exclusive lock on the table. An exclusive lock is incompatible with other locks. If any transaction has a lock on the table, the request for an exclusive lock is denied and the drop table statement fails. Returning an immediate error is one way of dealing with conflicting lock requests. The other is to put the conflicting request on a list of unfulfilled requests and let it wait for the resource to become available.
In its simplest form, that is how locking works. All transactions follow the formal protocol of requesting a lock on a resource before using it. Firebird maintains a list of locked resources, a list of requests for locks on resources – satisfied or waiting – and a list of the owners of lock requests. When a transaction requests a lock that is incompatible with existing locks on a resource,
Author: Ann W. Harrison, aharrison@ibphoenix.com
Request for Sponsors
How to Register
Conference Timetable
Conference Papers and Speakers
Firebird either denies the new request or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for locks that it acquires on tables, etc.
Lock modes
For concurrency and read-committed transactions, Firebird locks tables for shared read or shared write. Either mode says "I'm using this table, but you are free to use it too". Consistency-mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say "I'm using the table and no one else is allowed to change it until I'm done". Protected read is compatible with shared read and other protected read transactions. Protected write is only compatible with shared read.
The important concept about lock modes is that locks are more subtle than mutexes – locks allow resource sharing, as well as protecting resources from incompatible use.
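The mode compatibility rules stated above can be summarized as a small table. This is a sketch with assumed mode names; the engine's internal constants differ.

```python
# Table-lock compatibility as described in the article: shared modes
# permit concurrent use, protected modes exclude writers, exclusive
# excludes everything. Mode names are assumptions for illustration.

SR, SW, PR, PW, EX = (
    "shared_read", "shared_write", "prot_read", "prot_write", "exclusive",
)

COMPATIBLE = {
    SR: {SR, SW, PR, PW},   # shared read coexists with everything but exclusive
    SW: {SR, SW},           # shared write coexists with shared modes only
    PR: {SR, PR},           # protected read: no writers of any kind
    PW: {SR},               # protected write: only shared readers
    EX: set(),              # exclusive (e.g. drop table) conflicts with all
}

def compatible(held, requested):
    """True if a lock in mode `requested` can be granted while a lock
    in mode `held` exists on the same table. The relation is symmetric."""
    return requested in COMPATIBLE[held]
```

For example, a `drop table` needs `EX`, which is denied (or queued) whenever any other mode is held.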
Two-phase locking vs. transient locking
The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release. They cannot overlap.
When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency-mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.
Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.
Firebird page locks
One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.
In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.
In Firebird, internally, each time a transaction changes a page it changes that page – and the physical structure of the database as a whole – from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page-level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.
The Firebird lock table
When all access to a database is done in a single process – as is the case with most database systems – locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.
The Firebird lock manager
We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and to one of the interfaces to the Distributed Lock Manager from IBM.
The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.
Conflicting lock requests
When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens. Either the requesting transaction gets an immediate error, or the request is
Registering for the Conference
Call for papers
Sponsoring the Firebird Conference
http://firebird-conference.com
put on a list of waiting requests, and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.
Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.
Locks as interprocess communication
Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that's normally considered database locking.
For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important since, as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.
When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.
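The notification dance described above can be modeled as a toy lock object. The names are invented; in the real engine the "callback" is the routine address registered with each lock request, and a blocked exclusive request would be queued rather than simply refused.

```python
# Toy model of lock-based change notification: holders of a shared
# "index existence" lock register a callback that fires when a
# conflicting (exclusive) request arrives.

class NotifyingLock:
    def __init__(self):
        self.holders = {}               # owner name -> conflict callback

    def acquire_shared(self, owner, on_conflict):
        self.holders[owner] = on_conflict

    def release(self, owner):
        self.holders.pop(owner, None)

    def acquire_exclusive(self, owner):
        # Notify every current holder of the conflicting request;
        # each finishes its work and releases when it can.
        for callback in list(self.holders.values()):
            callback()
        if self.holders:                # someone could not release yet
            return False                # the real engine queues and waits
        self.holders[owner] = None
        return True
```

In the index scenario, each reader's callback releases its lock and remembers to re-acquire a shared lock and re-read the index definitions once the index creator commits.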
Firebird locking summary
Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.
Lock table specifics
The Firebird lock table is an in-memory data area that consists of four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose the lock table includes a hash table for access to lock blocks, based on the name of the resource being locked.
A quick refresher on hashing
A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function, called the hash function, into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.
In the Firebird lock table, each element of the hash table array contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.
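A minimal model of this hash-block layout, with separate collision and duplicate chains, might look like the following. Class and field names are assumptions for illustration, not Firebird's internal structures.

```python
# Sketch of a lock hash table: a fixed-width array where each slot heads
# a chain of collision blocks, and each block heads a chain of duplicates
# (entries with exactly the same resource name).

class HashBlock:
    def __init__(self, name, lock_block):
        self.name = name
        self.lock_block = lock_block
        self.collision = None   # next block whose name hashed to this slot
        self.duplicate = None   # next block with exactly the same name

class LockHashTable:
    def __init__(self, slots=101):
        self.slots = [None] * slots

    def _slot(self, name):
        # Stands in for Firebird's hash function. Note the slot count is
        # baked into the function, which is why the width cannot be
        # resized without invalidating every existing entry.
        return hash(name) % len(self.slots)

    def insert(self, name, lock_block):
        block = HashBlock(name, lock_block)
        cur = self.slots[self._slot(name)]
        while cur is not None:
            if cur.name == name:                 # same name: duplicate chain
                block.duplicate = cur.duplicate
                cur.duplicate = block
                return
            cur = cur.collision
        i = self._slot(name)                     # new head of collision chain
        block.collision = self.slots[i]
        self.slots[i] = block

    def find(self, name):
        cur = self.slots[self._slot(name)]
        while cur is not None and cur.name != name:
            cur = cur.collision                  # walk the collision chain
        return cur.lock_block if cur else None
```

Each collision adds one more pointer to follow in `find`, which is exactly why chain length drives lookup cost in the tuning discussion below.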
A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and another name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
Adjusting the lock table to improve performance
The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.
The tool for checking the lock table is fb_lock_print, a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:
LOCK_HEADER BLOCK
    Version: 114, Active owner: 0, Length: 262144, Used: 85740
    Semmask: 0x0, Flags: 0x0001
    Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
    Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
    Acquires: 21048, Acquire blocks: 0, Spin count: 0
    Mutex wait: 10.3%
    Hash slots: 101, Hash lengths (min/avg/max): 3/15/30
    ...
The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.
If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.
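A small script can apply these thresholds to fb_lock_print output automatically. The field layout is assumed from the sample shown earlier; the thresholds (non-zero mutex wait, min 5 / avg 10 / max 30 chain lengths) are the article's.

```python
import re

# Scan fb_lock_print header output for signs of an overloaded
# lock-table hash. Field layout assumed; not an official tool.

def check_lock_stats(text):
    wait = float(re.search(r"Mutex wait:?\s*([\d.]+)", text).group(1))
    mn, avg, mx = (int(g) for g in re.search(
        r"Hash lengths \(min/avg/max\):?\s*(\d+)\s*[/,]\s*(\d+)\s*[/,]\s*(\d+)",
        text).groups())
    problems = []
    if wait > 0:
        problems.append("processes are waiting on the lock-table mutex")
    if mn > 5 or avg > 10 or mx > 30:
        problems.append("hash chains too long: raise LockHashSlots")
    return problems

sample = "Mutex wait: 10.3%\nHash slots: 101, Hash lengths (min/avg/max): 3/15/30"
```

Running `check_lock_stats(sample)` on the article's example flags both problems, matching the diagnosis above.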
Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048, for example:

LockHashSlots = 499
The change will not take effect until all connections to all databases on the server machine shut down.
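Since the advice is "a prime number less than 2048", a few lines of script can pick a valid value for you. This sketch is our illustration, not part of the original article; the function names are ours:

```python
def is_prime(n: int) -> bool:
    """Trial division; plenty fast for values below 2048."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

def pick_hash_slots(target: int, limit: int = 2048) -> int:
    """Largest prime <= target (and below limit), suitable for LockHashSlots."""
    n = min(target, limit - 1)
    while not is_prime(n):
        n -= 1
    return n

print(pick_hash_slots(500))   # prints 499, the value used in the example above
```

Running it with a target of 500 yields 499, which is why that value appears in the sample configuration line.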
If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table; Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading #:

#LockMemSize = 262144

and increase the value, to this:

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.
News & Events

PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP. One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in.

PHPServer shares all the qualities of Firebird: it is a capable, compact, easy to install and easy to manage solution. PHPServer is a free download.

Read more at: www.fyracle.org/phpserver.html
Server internals
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
© 2005 www.ibdeveloper.com. All rights reserved.
Inside BLOBs

This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

Authors: Dmitri Kouzmenko, kdv@ib-aid.com
         Alexey Kovyazin, ak@ib-aid.com

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types – for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and, at the same time, has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.

Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB-record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus, a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.
BLOB page

The BLOB page consists of the following parts.

The special header contains the following information:

• The number of the first blob page in this blob. It is used to check that pages belong to one blob.
• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.
• The length of data on a page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.
Maximum BLOB size

As the internal structure for storing BLOB data can have only three levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.
The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
TestBed
Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Alexey Kovyazin, ak@ib-aid.com
         Vlad Horsun, hvlad@users.sourceforge.net

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:
CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:
SELECT * FROM INSERTRECS(1000);
The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.
Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is straightforward and well described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure and then copy all of them to TEST2DROP three or four times.
After that, perform a backup of this database and you will be in the same position as if you had downloaded the "ready for use" backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, 512 MB RAM and an 80GB Samsung HDD.
Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts

The first script tests a regular one-pass update without the NO SAVEPOINT option. For convenience, the important commands are clarified with comments:

connect "C:\TEST\testsp1.ib" USER 'SYSDBA' PASSWORD 'masterkey'; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:
connect "C:\testsp2.ib" USER 'SYSDBA' PASSWORD 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "E:\testsp3.ib" USER 'SYSDBA' PASSWORD 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com
News & Events

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics, using metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test

The easiest way to perform the test is to use isql's INPUT command. Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;
Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3
News & Events

Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports.

All variants of FastReport® 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0..360 degrees; a memo object supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.

Trial / Buy Now
News & Events

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table. Full details can be found here:
http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here:
http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled
CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger see: www.microtec.fr/copycat/ct

Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs
News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com
For convenience, the results are tabulated above (Table 1).

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT

In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time – 47 seconds – compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth of the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage, we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
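To make the delta idea concrete, here is a toy model – ours, not from the article, and real InterBase deltas are binary structures, not dictionaries – of rebuilding an older record version from the newest version plus a chain of deltas:

```python
def make_delta(old: dict, new: dict) -> dict:
    """Store only the fields whose values differ, keyed by field name."""
    return {k: old[k] for k in old if old[k] != new.get(k)}

def rebuild(version: dict, deltas) -> dict:
    """Walk a chain of deltas (ordered newest-first) back from the
    current version to reconstruct an older version."""
    rec = dict(version)
    for d in deltas:
        rec.update(d)
    return rec

v1 = {"ID": 1, "NAME": "Ann", "QRT": 1.0}
v2 = {"ID": 2, "NAME": "Ann", "QRT": 2.0}    # after an UPDATE
delta = make_delta(v1, v2)                    # {"ID": 1, "QRT": 1.0} – small
assert rebuild(v2, [delta]) == v1             # old version fully recovered
```

The point of the model is the size asymmetry: the delta holds only the two changed fields, which is why writing deltas is cheap in both disk space and memory.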
The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE – only the one record version exists. The undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
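That first-UPDATE bookkeeping can be sketched as follows. This is our toy model, not engine code: the real structure is a bitmap of record numbers, which a Python set approximates here, and the class and method names are ours:

```python
class Savepoint:
    """Toy undo log for the first UPDATE under a savepoint: only the
    affected record numbers are remembered, because the backversions
    themselves already live on disk as deltas."""
    def __init__(self):
        self.changed = set()            # record numbers touched under this savepoint

    def note_update(self, rec_no: int):
        self.changed.add(rec_no)        # re-adding a number costs nothing

    def records_to_restore(self):
        """On rollback, the engine walks these numbers and restores each
        record from its on-disk backversion."""
        return sorted(self.changed)

sp = Savepoint()
for rec_no in (10, 11, 10, 42):
    sp.note_update(rec_no)
print(sp.records_to_restore())          # prints [10, 11, 42]
```

This is why the first pass is cheap: the only extra memory is the bitmap of record numbers, regardless of record size.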
Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the undo log

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the undo log

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.
Figure 6: The third UPDATE overwrites the UPDATE1 version in the undo log with the UPDATE2 version, and its own version is written to disk as the latest one

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the undo log of that savepoint

(The original design of InterBase implemented the second UPDATE in another way, but some time after InterBase 6 Borland changed the original behavior to what we see now. That, however, is a theme for another article.)

Why is the delta of memory usage zero? The reason is that, beyond the second UPDATE, no record version is created. From here on, the update just replaces the record data on disk with the newest version and shifts the superseded version into the undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know!

The preceding figure (Figure 6) helps to illustrate the situation in the undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. That is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience and promise to add the necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described here is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And, last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development – if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?
What we need is simple: to develop a set of standard methods that will help us tie the OO layer of business logic and relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relational processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique and they must be easily identifiable. That is why all the objects stored in the database should have unique ID keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, and so on. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex). Nobody can change their date of birth, at least not their de facto date of birth, but birth dates are not unique anyway.
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
What is more, nobody requires run-time pointers to contain any additional information about an object beyond a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
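In InterBase/Firebird such a function is naturally implemented with a generator; a minimal sketch (the generator name is illustrative, not part of the article's schema):

```sql
/* one global generator provides OIDs unique across the whole database */
create generator g_oid;

/* fetch a new OID on the client, or call gen_id() the same way
   inside a before-insert trigger to assign it automatically */
select gen_id(g_oid, 1) from rdb$database;
```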
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to the detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, or a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object: the physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus, each object has to have the mandatory attributes ("OID" and "ClassId") and the desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, well-known entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:
create domain TOID as integer not null;

create table OBJECTS (
    OID           TOID primary key,
    ClassId       TOID,
    Name          varchar(32),
    Description   varchar(128),
    Deleted       smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date   timestamp default CURRENT_TIMESTAMP,
    Owner         TOID
)

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of this known OID.) Then the list of all persistent object types known to the system is retrieved by the query: select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method that works for all objects.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types (simple lookup dictionaries, for example) which do not have any attributes other than those already in OBJECTS. Usually that is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? Likely no less than half. Thus you may consider that you have already implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
A simple, plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId
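If you want such a dictionary to look like an ordinary table to existing tools and queries, the query can be wrapped in a view; the view name and the ClassId value below are hypothetical, not from the article:

```sql
/* a simple lookup dictionary exposed as a regular relation */
create view CURRENCIES_DICT (OID, Code, Name) as
select OID, Name, Description
from OBJECTS
where ClassId = 1234;  /* the OID of the "currency" type object */
```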
if you know the ClassId for this dictionary. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
    OID       TOID primary key,
    customer  TOID,
    sum_total NUMERIC(15,2)
)
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = id
As you can see, everything is simple and familiar. You could also create a view "orders_view" and make everything look as it always did.
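The article does not show that view; a possible sketch, using only the tables defined above, might be:

```sql
create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from OBJECTS o join Orders ord on o.OID = ord.OID;
```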
If an order has a "lines" section, and a real-world order should definitely have one, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
    id        integer not null primary key,
    object_id TOID,  /* reference to the order object, 1:M relation */
    item_id   TOID,  /* reference to the ordered item */
    amount    numeric(15,4),
    cost      numeric(15,4),
    sum       numeric(15,4) computed by (amount*cost)
)
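Reading an order back together with its lines is then plain relational work; for example (the :order_oid parameter is illustrative):

```sql
select l.id, obj.Name as item_name, l.amount, l.cost, l.sum
from order_lines l
join OBJECTS obj on obj.OID = l.item_id
where l.object_id = :order_oid
order by l.id;
```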
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementing the object-relational mapping for this method is less than simple, and there are some difficulties in organizing type inheritance. This method, including ways of implementing type inheritance in a database, is described in detail in [1] and [3].
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
    OID          TOID,  /* link to the master-object of this attribute */
    attribute_id integer not null,
    value        varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
)
connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
    OID            TOID,  /* link to a description-object of the type */
    attribute_id   integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
)

which describes the type metadata: the attribute set (names and types) for each object type.
All attributes for a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = oid and a.OID = o.OID and
      o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
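Storing a new object under this scheme takes one row in OBJECTS plus one row per attribute. A hypothetical sketch for an "Order" object (the OID, ClassId and attribute_id values are hard-coded purely for illustration):

```sql
/* the object itself: number and comment live in OBJECTS */
insert into OBJECTS (OID, ClassId, Name, Description)
values (1001, 10, '2005-042', 'urgent order');

/* its remaining attributes, one record each */
insert into attributes (OID, attribute_id, value)
values (1001, 1, '507');     /* customer (OID of a Partners object) */
insert into attributes (OID, attribute_id, value)
values (1001, 2, '149.90');  /* sum_total */
```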
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease of implementing object inheritance; and a very simple database structure, a significant benefit since with this approach the number of tables does not increase no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
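The storage structure for Method 3 is indeed trivial; a sketch (table and field names are illustrative):

```sql
/* Method 3: the whole object is serialized into a single BLOB */
create table blob_objects (
    OID  TOID primary key,  /* still linked 1:1 to OBJECTS */
    data blob sub_type 0    /* dfm, XML, or any custom format */
);
```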
It was not difficult to come to the following conclusions (well, I know, everyone already knows them). All three methods described are undoubtedly useful and have a right to live; moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping on certain object attributes, or is likely to acquire it in the future, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice due to its speed and ease of implementation.
To be continued…
Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.
Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
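CopyCat generates these structures automatically; a hand-rolled log table of this kind might look roughly like the following (all names are illustrative, not CopyCat's actual metadata):

```sql
create generator g_rpl_log;

/* one row per change and per sub-node that still has to receive it */
create table rpl$log (
    id         integer not null primary key,
    table_name varchar(32),
    pk_value   varchar(64),   /* primary key of the changed record */
    login      varchar(32)    /* sub-node this line is destined for */
);
```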
Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication

One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged for the other sub-nodes, but not for the originating node itself.
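In trigger terms, this scheme is a loop over the node's sub-node list that skips the connected user, so a change never echoes back to its originator. A rough sketch for a hypothetical CUSTOMERS table, assuming a log table, sub-node table and generator as described in the article (all names are illustrative):

```sql
create trigger customers_rpl for customers
after insert or update or delete
as
declare variable subnode varchar(32);
begin
  /* log one line per sub-node, except the node that made this change */
  for select user_name from rpl$users
      where user_name <> current_user
      into :subnode
  do
    insert into rpl$log (id, table_name, pk_value, login)
    values (gen_id(g_rpl_log, 1), 'CUSTOMERS',
            cast(coalesce(new.id, old.id) as varchar(64)), :subnode);
end
```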
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply puts in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible for two nodes (having the same parent) to simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered: instead of having one record holding the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than requiring the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication", that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
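With stored procedure replication, the STOCK example reduces to replicating calls to a delta-applying procedure rather than row values. A hypothetical sketch (the qty and product_id columns are assumptions, not from the article):

```sql
create procedure add_stock (product_id integer, delta numeric(15,4))
as
begin
  /* each node records only its change; replicated calls merge naturally:
     A's +1 and B's +2 both arrive, yielding 45 + 1 + 2 = 48 on every node */
  update STOCK set qty = qty + :delta
  where product_id = :product_id;
end
```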
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components
2. As a standalone Win32 replication tool

We will now present each of these products.
1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple and intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:
http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community obtains a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replication between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Development area

The InterBase and Firebird Developer Magazine, 2005, Issue 2

www.ibdeveloper.com © 2005 www.ibdeveloper.com. All rights reserved.

Using IBAnalyst

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you will know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available while browsing the statistics; it also includes hint comments and recommendation reports.

Footnotes:

1. InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.

2. The Oldest transaction is the same "oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".

3. Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

Figure 1: Summary of database statistics

The summary shown in Figure 1 provides general information about
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
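To give a feel for what any gstat-based analyzer has to start from, here is a minimal sketch that extracts the header counters from gstat-style output. This is hypothetical code, not IBAnalyst's; the exact labels and spacing vary between InterBase and Firebird versions, so the sample text is an assumption.

```python
import re

# A fragment of typical `gstat -h` output (hypothetical sample; the
# label set and spacing vary between server versions).
SAMPLE_HEADER = """
    Oldest transaction      1201
    Oldest active           1210
    Oldest snapshot         1205
    Next transaction        56000
"""

def parse_gstat_header(text):
    """Collect 'Label   number' pairs from gstat-style header text."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z ]+?)\s+(\d+)\s*$", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters

counters = parse_gstat_header(SAMPLE_HEADER)
print(counters["Oldest active"], counters["Next transaction"])  # 1210 56000
```

Once the counters are in a dictionary like this, all of the checks discussed below reduce to simple arithmetic over them.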
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and the oldest snapshot transaction at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
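The sweep-gap arithmetic described above can be sketched as follows. This is a simplification under the assumption stated in the article (the engine checks the gap against the interval); the engine's actual trigger logic is evaluated as transactions start.

```python
def sweep_gap(oldest_interesting, oldest_snapshot):
    # The "sweep gap": distance between the oldest interesting
    # transaction and the oldest snapshot transaction.
    return oldest_snapshot - oldest_interesting

def auto_sweep_due(oldest_interesting, oldest_snapshot, sweep_interval):
    # A sweep interval of 0 disables automatic sweeping entirely.
    if sweep_interval == 0:
        return False
    return sweep_gap(oldest_interesting, oldest_snapshot) > sweep_interval

print(auto_sweep_due(1000, 22001, 20000))  # True  (gap 21001 > 20000)
print(auto_sweep_due(1000, 22001, 0))      # False (auto sweep disabled)
```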
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database up to the point when the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
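The last two bullets, taken together, can be sketched like this. The 30% threshold is the warning heuristic as described above; exactly how IBAnalyst compares the numbers internally is an assumption.

```python
def transactions_per_day(next_transaction, days_since_creation):
    # Meaningful only when transaction numbering started near zero,
    # i.e. for production databases or ones restored from backup.
    return next_transaction / days_since_creation

def oldest_active_looks_stuck(oldest_active, next_transaction, per_day):
    # Warning heuristic: the oldest active transaction lags "next"
    # by more than 30% of one day's transaction volume.
    return (next_transaction - oldest_active) > 0.30 * per_day

per_day = transactions_per_day(56000, 28)
print(per_day)                                            # 2000.0
print(oldest_active_looks_stuck(54000, 56000, per_day))   # True  (lag 2000)
print(oldest_active_looks_stuck(55500, 56000, per_day))   # False (lag 500)
```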
As you have already learned, if there are any warnings, they are shown as colored lines with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during certain work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• just restored your database;

• performed a backup (gbak -b db.gdb) without the -g switch;

• recently performed a manual sweep (gfix -sweep).

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications generate a lighter database load than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
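For illustration, the Uniques, TotalDup and MaxDup figures discussed above can be recomputed from raw key values like this. This is a toy sketch, not IBAnalyst's implementation (which works from index statistics, not the keys themselves).

```python
from collections import Counter

def index_duplicate_stats(keys):
    # Uniques: distinct key values; TotalDup: keys minus uniques;
    # MaxDup: longest duplicate chain, i.e. occurrences of the most
    # frequent value minus one.
    counts = Counter(keys)
    if not counts:
        return 0, 0, 0
    uniques = len(counts)
    total_dup = len(keys) - uniques
    max_dup = max(counts.values()) - 1
    return uniques, total_dup, max_dup

# Four distinct values spread over 12 keys, like a foreign key
# pointing into a tiny lookup table:
keys = ["A"] * 7 + ["B"] * 3 + ["C", "D"]
print(index_duplicate_stats(keys))  # (4, 8, 6)
```

Scaled up, the same shape gives the figures in the article: 34999 keys over 4 unique values yields TotalDup 34995, and a most frequent value occurring 25057 times yields MaxDup 25056.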
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big – 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. A large number of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work, or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions." Here is the list of tables with a version/record ratio greater than 3:

Table         Records   Versions   RecVers size
CLIENTS_PR       3388      10944             92
DICT_PRICE         30       1992             45
DOCS                9       2225             64
N_PART          13835      72594             83
REGISTR_NC        241       4085             56
SKL_NC           1640       7736            170
STAT_QUICK      17649      85062            110
UO_LOCK           283       8490            144
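Filtering a table list by version/record ratio, as the report above does, is simple arithmetic; here is a small sketch using figures from the table above (hypothetical code; "WELL_BEHAVED" is an invented healthy table for contrast).

```python
def versioned_tables(stats, threshold=3.0):
    # stats maps table name -> (records, versions); returns tables whose
    # versions/records ratio exceeds the threshold, worst first.
    flagged = [(name, versions / records)
               for name, (records, versions) in stats.items()
               if records and versions / records > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

stats = {
    "CLIENTS_PR": (3388, 10944),   # figures from the table above
    "DICT_PRICE": (30, 1992),
    "UO_LOCK": (283, 8490),
    "WELL_BEHAVED": (5000, 400),   # hypothetical healthy table
}
for name, ratio in versioned_tables(stats):
    print(f"{name}: {ratio:.1f}")
```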
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance, and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to "Temporary tables"
We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its author I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another "myth box" for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
Miscellaneous
Invitations, Subscriptions, Donations, Sponsorships
This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you with all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now!

To receive future issue notifications, send email to subscribe@ibdeveloper.com
Editor's note

Rock around the blog

By Alexey Kovyazin

Dear readers,

I am glad to introduce to you the second issue of "The InterBase and Firebird Developer Magazine". We received a lot of emails with kind words and congratulations, which have helped us in creating this issue. Thank you very much!

In this issue

The cover story is the newest episode from Ann W. Harrison, "Locking Firebird and the Lock Table". The article covers the locking issue from general principles to specific advice, so everyone will find a bit that's interesting or informative.

We continue the topic of savepoint internals with the article "Testing NO SAVEPOINT in InterBase 7.5.1". Along with interesting practical test results, you will find a description of how the UNDO log works in InterBase 7.5.1.

We publish the small chapter "Inside BLOBs" from the forthcoming book "1000 InterBase & Firebird Tips & Tricks" by Dmitri Kouzmenko and Alexey Kovyazin.

Object-oriented development has been a high-interest topic for many years, and it is still a hot topic. The article "Object-Oriented Development in RDBMS" explores the practical use of OOD principles in the design of InterBase or Firebird databases.

Replication is a problem which faces every database developer sooner or later. The article "Replicating and synchronizing InterBase/Firebird databases using CopyCat" introduces the approach used in the new CopyCat component set and the CopyTiger tool.

The last article, by Dmitri Kouzmenko, "Understanding Your Database with IBAnalyst", provides a guide for better understanding of how databases work and how tuning their parameters can optimize performance and avoid bottlenecks.

"Oldest Active" column

I am very glad to introduce the new column "Oldest Active" by Helen Borrie. No need to say more about the author of "The Firebird Book" – just read it! In this issue Helen looks at the phenomenon of support lists and their value to the Firebird community. She takes a swing at some of the unproductive things that list posters do, in this topic titled "On Silly Questions and Idiotic Outcomes".

Let's blog again

Now is a good time to say a few words about our new web presentation. We've started a blog-style interface for our magazine – take a look at www.ibdeveloper.com if you haven't already discovered it. Now you can make comments on any article or message. In future we'll publish all the materials related to the PDF issues, along with special bonus articles and materials. You can look out for previews of articles, drafts and behind-the-scenes community discussions. Please feel welcome to blog with us!

Donations

The magazine is free for readers, but it is a considerable expense for its publisher. We must pay for article editing and proofreading, design, hosting, etc. In future we'll try to recover costs solely from advertising, but for these initial issues we need your support.
See details here: http://ibdeveloper.com/donations
On the Radar
I'd like to introduce several projects which will be launched in the near future. We really need your feedback, so please do not hesitate to place your comments in the blog or send email to us (readers@ibdeveloper.com).
Magazine CD
We will issue a CD with all 3 issues of "The InterBase and Firebird Developer Magazine" in December 2005 (yes, a Christmas CD!).
Along with all the issues in PDF and searchable HTML format, you will find exclusive articles and bonus materials. The CD will also include free versions of popular software related to InterBase and Firebird, and offer very attractive discounts for dozens of the most popular products.

The price for the CD is USD $9.99 plus shipping.

For details see www.ibdeveloper.com/magazine-cd

Paper version of the magazine

In 2006 we intend to issue the first paper version of "The InterBase and Firebird Developer Magazine". The paper version cannot be free because of high production costs.

We intend to publish 4 issues per year, with a subscription price of around USD $49. The issue volume will be approximately 64 pages.

In the paper version we plan to include the best articles from the online issues and, of course, exclusive articles and materials.

To be or not to be – this is the question that only you can answer. If the idea of subscribing to the paper version appeals to you, please place a comment in our blog (http://ibdeveloper.com/paper-version) or send email to readers@ibdeveloper.com with your pros and cons. We look forward to your feedback.

Firebird Conference

This issue of our magazine almost coincides with the third annual Firebird World Conference, starting November 13 in Prague, Czech Republic. This year's conference spans three nights, with a tight programme of back-to-back and parallel presentations over two days. With speakers in attendance from all across Europe and the Americas, presenting topics that range from specialised application development techniques to first-hand accounts of Firebird internals from the core developers, this one promises to be the best ever.

Registration has been well under way for some weeks now, although the time is now past for early-bird discounts on registration fees. At the time of publishing, bookings were still being taken for accommodation in the conference venue itself, Hotel Olsanka in downtown Prague. The hotel offers a variety of room configurations, including bed and breakfast if wanted, at modest tariffs. Registration includes lunches on both conference days, as well as mid-morning and mid-afternoon refreshments.

Presentations include one on Firebird's future development from the Firebird Project Coordinator, Dmitry Yemanov, and another from Alex Peshkov, the architect of the security changes coming in Firebird 2. Jim Starkey will be there, talking about Firebird Vulcan, and members of the SAS Institute team will talk about aspects of SAS's taking on Firebird Vulcan as the back-end to its illustrious statistical software. Most of the interface development tools are on the menu, including Oracle-mode Firebird, PHP, Delphi, Python and Java (Jaybird).

It is a don't-miss for any Firebird developer who can make it to Prague. Link to details and programme either at http://firebird.sourceforge.net/index.php?op=konferenz or at the IBPhoenix website, http://www.ibphoenix.com
Invitation
As a footnote, I'd like to invite all people who are in touch with Firebird and InterBase to participate in the life of the community. The importance of a large, active community to any public project – and a DBMS with hundreds of thousands of users is certainly public – cannot be emphasised enough. It is the key to survival for such projects. Read our magazine, ask your questions on the forums, and leave comments in the blog – just rock and roll with us!
Sincerely,

Alexey Kovyazin
Chief Editor
editor@ibdeveloper.com
Oldest Active
On Silly Questions and Idiotic Outcomes
Firebird owes its existence to the community's support lists. Though it's all ancient history now, if it hadn't been for the esprit de corps among the regulars of the old InterBase support list hosted by mers.com in the nineties, Borland's shock-horror decision to kill off InterBase development at the end of 1999 would have been the end of it, for the English-speaking users at least.

Firebird's support lists on Yahoo and Sourceforge were born out of the same cataclysm of fatal fire and regeneration that caused the phoenix, drawn from the fire against extreme adversity, to take flight as Firebird in July 2000. Existing and new users, both of Firebird and of the rich selection of drivers and tools that spun off from and for it, depend heavily on the lists to get up to speed with the evolution of all of these software products.

Newbies often observe that there is just nothing like our lists for any other software, be it open source or not. One of the things that I think makes the Firebird lists stand out from the crowd is that, once gripped by the power of the software, our users never leave the lists. Instead, they stick around, learn hard, and stay ready and willing to impart to others what they have learnt. After nearly six years of this, across a dozen lists, our concentrated mix of expertise and fidelity is hard to beat.

Now, doesn't all this make us all feel warm and fuzzy? For those of us in the hot core of this voluntary support infrastructure, the answer has to be, unfortunately, "Not all the time". There are some bugbears that can make the most patient person see red. I'm using this column to draw attention to some of the worst.
First and worst is silly questions. You have all seen them. Perhaps you even posted them. Subject: "Help!" Body: "Every time I try to connect to Firebird I get the message 'Connection refused'. What am I doing wrong?" What follows from that is a frustrating and tedious game for responders. What version of Firebird are you using? Which server model? Which platform? Which tool? What connection path? (By "connection path" we mean...)

Everyone's time is valuable. Nobody has the luxury of being able to sit around all day playing this game. If we weren't willing to help, we wouldn't be there. But you make it hard, or even impossible, when you post problems with no descriptions. You waste our time. You waste your time. We get frustrated because you don't provide the facts; you get frustrated because we can't read your mind. And the list gets filled with noise.
If there's something I've learnt in more than 12 years of on-line software support, it is that a problem well described is a problem solved. If a person applies time and thought to presenting a good description of the problem, chances are the solution will hop up and punch him on the nose. Even if it doesn't, good descriptions produce the fastest right answers from list contributors. Furthermore, those good descriptions and their solutions form a powerful resource in the archives for others coming along behind.

Off-topic questions are another source of noise and irritation in the lists. At the Firebird website we have bent over backwards to make it very clear which lists are appropriate for which areas of interest. An off-topic posting is easily forgiven if the poster politely complies with a request from a list regular to move the question to "x" list. When these events instead become flame threads, or when the same people persistently reoffend, they are an extreme burden on everyone.

In the "highly tedious" category we have the kind of question that goes like this. Subject: "Bug in Firebird SQL". Body: "This statement works in MSSQL/Access/MySQL/insert any non-standard DBMS name here. It doesn't work in Firebird. Where should I report this bug?" To be fair, George Bernard Shaw wasn't totally right when he said "Ignorance is a sin." However, you do at least owe it to yourself to know what you are talking about, and not to present your problem as an attack on our software. Whether intended or not, it comes over as a troll and you come across as a fool. There's a strong chance that your attitude will discourage anyone from bothering with you at all. It's no-win, sure, but it's also a reflection of human nature. You reap what you sow.

In closing this little tirade, I just want to say that I hope I've struck a chord with those of our readership who struggle to get satisfaction from their list postings. It will be a rare question indeed that has no answer. Those rarities, well presented, become excellent bug reports. If you're not getting an answer, there's a high chance that you asked a silly question.
Helen E. M. Borrie
helebor@tpg.com.au
Cover story
Locking Firebird and the Lock Table
News & Events

Firebird Worldwide Conference 2005

The Firebird conference will take place at the Hotel Olsanka in Prague, Czech Republic, from the evening of Sunday the 13th of November (opening session) until the evening of Tuesday the 15th of November (closing session).
One of the best-known facts about Firebird is that it uses multi-version concurrency control instead of record locking. For databases that manage concurrency through record locks, understanding the mechanisms of record locking is critical to designing a high performance, high concurrency application. In general, record locking is irrelevant to Firebird. However, Firebird uses locks to maintain internal consistency and for interprocess coordination. Understanding how Firebird does (and does not) use locks helps with system tuning and anticipating how Firebird will behave under load.
I'm going to start by describing locking abstractly, then Firebird locking more specifically, then the controls you have over the Firebird lock table and how they can affect your database performance. So if you just want to learn about tuning, skip to the end of the article.
Concurrency control
Concurrency control in database systems has three basic functions: preventing concurrent transactions from overwriting each other's changes, preventing readers from seeing uncommitted changes, and giving a consistent view of the database to running transactions. Those three functions can be implemented in various ways. The simplest is to serialize transactions – allowing each transaction exclusive access to the database until it finishes. That solution is neither interesting nor desirable.
A common solution is to allow each transaction to lock the data it uses, keeping other transactions from reading data it changes and from changing data it reads. Modern databases generally lock records. Firebird provides concurrency control without record locks by keeping multiple versions of records, each marked with the identifier of the transaction that created it. Concurrent transactions don't overwrite each other's changes, because the system will not allow a transaction to change a record if the most recent version was created by a concurrent transaction. Readers never see uncommitted data, because the system will not return record versions created by concurrent transactions. Readers see a consistent version of the database, because the system returns only the data committed when they start – allowing other transactions to create newer versions. Readers don't block writers.
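The visibility rules just described can be sketched in a few lines of Python. This is an illustration of the general multi-version idea only, not Firebird engine code; the names (`RecordVersion`, `visible_version`, `can_update`) are invented for the example.

```python
# Illustrative sketch of multi-version visibility rules (not Firebird source).
class RecordVersion:
    def __init__(self, tx_id, value):
        self.tx_id = tx_id          # transaction that created this version
        self.value = value

def visible_version(versions, reader_tx, committed_before_reader):
    """Return the newest version created by a transaction that had
    committed when the reader started; readers never see uncommitted data."""
    for v in reversed(versions):                # newest version first
        if v.tx_id == reader_tx or v.tx_id in committed_before_reader:
            return v
    return None

def can_update(versions, writer_tx, committed_before_writer):
    """A transaction may not overwrite a version made by a concurrent
    transaction: the newest version must be its own or already committed."""
    newest = versions[-1]
    return newest.tx_id == writer_tx or newest.tx_id in committed_before_writer

# Two transactions: tx 10 committed before tx 30 started; tx 20 is concurrent.
history = [RecordVersion(10, "old"), RecordVersion(20, "new-uncommitted")]
assert visible_version(history, reader_tx=30, committed_before_reader={10}).value == "old"
assert not can_update(history, writer_tx=30, committed_before_writer={10})
```

Note how the same two checks deliver both guarantees: readers skip versions from concurrent transactions, and writers refuse to stack a change on top of one.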
A lock sounds like a very solid object, but in database systems a lock is anything but solid. "Locking" is a shorthand description of a protocol for reserving resources. Databases use locks to maintain consistency while allowing concurrent, independent transactions to update distinct parts of the database. Each transaction reserves the resources – tables, records, data pages – that it needs to do its work. Typically the reservation is made in memory to control a resource on disk, so the cost of reserving the resource is not significant compared with the cost of reading or changing the resource.
Locking example
"Resource" is a very abstract term. Let's start by talking about locking tables. Firebird does lock tables, but normally it locks them only to prevent catastrophes, like having one transaction drop a table that another transaction is using.
Before a Firebird transaction can access a table, it must get a lock on the table. The lock prevents other transactions from dropping the table while it is in use. When a transaction gets a lock on a table, Firebird makes an entry in its table of locks, indicating the identity of the transaction that has the lock, the identity of the table being locked, and a function to call if another transaction wants an incompatible lock on the table. The normal table lock is shared – other transactions can lock the same table in the same way.
Before a transaction can delete a table, it must get an exclusive lock on the table. An exclusive lock is incompatible with other locks. If any transaction has a lock on the table, the request for an exclusive lock is denied; the drop table statement fails. Returning an immediate error is one way of dealing with conflicting lock requests. The other is to put the conflicting request on a list of unfulfilled requests and let it wait for the resource to become available.
In its simplest form, that is how locking works. All transactions follow the formal protocol of requesting a lock on a resource before using it. Firebird maintains a list of locked resources, a list of requests for locks on resources – satisfied or waiting – and a list of the owners of lock requests. When a transaction requests a lock that is incompatible with existing locks on a resource,
Author: Ann W. Harrison, aharrison@ibphoenix.com
Request for Sponsors
How to Register
Conference Timetable
Conference Papers and Speakers
Firebird either denies the new request or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for locks that it acquires on tables, etc.
Lock modes
For concurrency and read-committed transactions, Firebird locks tables for shared read or shared write. Either mode says "I'm using this table, but you are free to use it too." Consistency-mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say "I'm using the table and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and other protected read transactions. Protected write is only compatible with shared read.
The important concept about lock modes is that locks are more subtle than mutexes – locks allow resource sharing, as well as protecting resources from incompatible use.
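The compatibility rules above can be condensed into a small matrix. The sketch below is our own summary of the four table-lock modes (assuming the matrix is symmetric, as the text implies); the names are not Firebird identifiers.

```python
# Compatibility of the four table-lock modes described in the text.
SR, SW, PR, PW = "shared_read", "shared_write", "protected_read", "protected_write"

COMPATIBLE = {
    SR: {SR, SW, PR, PW},   # shared read coexists with everything
    SW: {SR, SW},           # shared write coexists only with shared locks
    PR: {SR, PR},           # protected read allows other readers
    PW: {SR},               # protected write allows only shared read
}

def compatible(held, requested):
    return requested in COMPATIBLE[held]

assert compatible(SW, SR)       # concurrency transactions share tables freely
assert not compatible(SW, PW)   # a consistency writer waits for shared writers
assert compatible(PW, SR)       # "protected write is only compatible with shared read"
assert not compatible(PW, PW)   # protected write excludes other writers
```

A mutex would be a single-row version of this table – held or not held; the extra modes are what make lock-based sharing possible.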
Two-phase locking vs. transient locking
The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release. They cannot overlap.
When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency-mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.
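The two-phase discipline – an acquisition phase, then a release phase, never overlapping – can be expressed as a simple guard. A minimal sketch, with invented names:

```python
# Sketch of the two-phase locking discipline: once a transaction has
# released any lock, it may not acquire another.
class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()
        self.releasing = False      # flips to True at the first release

    def acquire(self, resource):
        if self.releasing:
            raise RuntimeError("two-phase rule: no acquisition after a release")
        self.locks.add(resource)

    def release(self, resource):
        self.releasing = True
        self.locks.discard(resource)

t = TwoPhaseTransaction()
t.acquire("TABLE_A")
t.acquire("TABLE_B")
t.release("TABLE_A")        # release phase begins here
try:
    t.acquire("TABLE_C")    # violates the protocol
    violated = False
except RuntimeError:
    violated = True
assert violated
```

In practice Firebird simply holds table locks until commit or rollback, which satisfies the rule trivially; the guard above shows why releasing early would forfeit the protocol's guarantees.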
Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.
Firebird page locks
One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.
In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.
In Firebird, internally, each time a transaction changes a page, it changes that page – and the physical structure of the database as a whole – from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page-level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.
The Firebird lock table
When all access to a database is done in a single process – as is the case with most database systems – locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.
The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.
The Firebird lock manager
We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser, and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and one of the interfaces to the Distributed Lock Manager from IBM.
The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.
Conflicting lock requests
When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens. Either the requesting transaction gets an immediate error, or the request is
Registering for the Conference
Call for papers
Sponsoring the Firebird Conference
http://firebird-conference.com
put on a list of waiting requests, and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.

Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.
Locks as interprocess communication
Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that's normally considered database locking.
For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important since, as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.
When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.
Firebird locking summary
Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.
Lock table specifics
The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.
To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose, the lock table includes a hash table for access to lock blocks, based on the name of the resource being locked.
A quick refresher on hashing
A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function, called the hash function, into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.
A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and another name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
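The lookup path just described – hash the name, index into the array, walk the collision chain comparing names – can be sketched as follows. The field names and the stand-in hash function are ours for illustration, not Firebird's internal layout:

```python
# Sketch of a lock-table hash lookup: an array of buckets, each holding
# a chain of hash blocks linked by collision pointers.
class HashBlock:
    def __init__(self, name, lock_block):
        self.name = name
        self.lock_block = lock_block
        self.collision = None       # next block whose name hashed the same

class LockHashTable:
    def __init__(self, slots=101):  # works best when the slot count is prime
        self.slots = [None] * slots

    def _hash(self, name):          # stand-in hash function, not Firebird's
        return sum(name.encode()) % len(self.slots)

    def insert(self, name, lock_block):
        i = self._hash(name)
        block = HashBlock(name, lock_block)
        block.collision, self.slots[i] = self.slots[i], block

    def find(self, name):
        block = self.slots[self._hash(name)]
        while block is not None:    # every collision costs one more compare
            if block.name == name:
                return block.lock_block
            block = block.collision
        return None

table = LockHashTable()
table.insert("RELATION_203", "lock-block-1")
assert table.find("RELATION_203") == "lock-block-1"
assert table.find("RELATION_999") is None
```

Note that `len(self.slots)` is baked into `_hash`: resizing the array changes every hash value, which is exactly why LockHashSlots cannot be adjusted on a live lock table.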
Adjusting the lock table to improve performance
The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.
The tool for checking the lock table is fb_lock_print, which is a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:
LOCK_HEADER BLOCK
    Version: 114, Active owner: 0, Length: 262144, Used: 85740
    Semmask: 0x0, Flags: 0x0001
    Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
    Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
    Acquires: 21048, Acquire blocks: 0, Spin count: 0
    Mutex wait: 10.3%
    Hash slots: 101, Hash lengths (min/avg/max): 3/15/30
…The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:
    Mutex wait: 10.3%
    Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.
If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.
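Picking a suitable value by hand is easy, but the "prime, less than 2048" advice can also be automated. The helper below is our own sketch, not part of any Firebird tooling:

```python
# Pick a prime number of hash slots, per the advice above.
def is_prime(n):
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

def pick_hash_slots(target, ceiling=2048):
    """Return the first prime >= target, staying below the recommended ceiling."""
    n = max(2, target)
    while n < ceiling and not is_prime(n):
        n += 1
    return n if n < ceiling and is_prime(n) else None

assert pick_hash_slots(100) == 101      # the default is already prime
assert pick_hash_slots(490) == 491
assert is_prime(499)                    # the value used in the example below
```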
Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499
The change will not take effect until all connections to all databases on the server machine shut down.
If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:
    Version: 114, Active owner: 0, Length: 262144, Used: 85740
tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading # and increase the value:
#LockMemSize = 262144

to this:

LockMemSize = 1048576
The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.
News & Events

PHP Server
One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP.
One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in.
PHPServer shares all the qualities of Firebird: it is a capable, compact, easy-to-install and easy-to-manage solution.
PHPServer is a free download. Read more at: www.fyracle.org/phpserver.html
Server internals
Inside BLOBs
How the server works with BLOBs
The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types – for example, pictures, audio files, video fragments, etc.
From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.
Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and, at the same time, has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.
This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.
Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.
Author: Dmitri Kouzmenko, kdv@ib-aid.com
Author: Alexey Kovyazin, ak@ib-aid.com
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus, a two-level structure of BLOB-field data is used.
If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.
The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.
BLOB Page
The BLOB page consists of the following parts.

The special header contains the following information:

• The number of the first blob page in this blob. It is used to check that pages belong to one blob.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on the page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.
Maximum BLOB size
As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.
Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.
The segment size mystery
Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.
In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.
Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
TestBed
Testing the NO SAVEPOINT
In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.
Testing NO SAVEPOINT
In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.
Database for testing
The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators.
Testing the NO SAVEPOINT feature in InterBase 7.5.1
Authors: Alexey Kovyazin, ak@ib-aid.com, and Vlad Horsun, hvlad@users.sourceforge.net

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
    ID NUMERIC(18,2),
    NAME VARCHAR(120),
    DESCRIPTION VARCHAR(250),
    CNT INTEGER,
    QRT DOUBLE PRECISION,
    TS_CHANGE TIMESTAMP,
    TS_CREATE TIMESTAMP,
    NOTES BLOB
)

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);
The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.
Setting up the test environment
All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database, and an SQL script.
Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.
You can download a backup of the test database ready to use from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 MB) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 KB).
If you download the database backup, the test tables are already populated with records, and you can proceed straight to the section "Preparing to test" below.
If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.
After that, perform a backup of this database, and you will be in the same position as if you had downloaded the "ready for use" backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer: a 2 GHz Pentium 4, with 512 MB RAM and an 80 GB Samsung HDD.
Preparing to test
A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts
The first script tests a regular one-pass update without the NO SAVEPOINT option. For convenience, the important commands are clarified with comments.
connect 'C:\TEST\testsp1.ib' USER 'SYSDBA' Password 'masterkey'; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enables writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:
connect 'C:\TEST\testsp2.ib' USER 'SYSDBA' Password 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.
To test sequential UPDATEs, we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:
connect 'E:\TEST\testsp3.ib' USER 'SYSDBA' Password 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Fyracle 0.8.9
Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.
Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.
Read more at: www.janus-software.com
News & Events

IBAnalyst 1.9
IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics, using metadata information (in this case a connection to the database is required).
IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance, and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.
www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test
The easiest way to perform the test is to use isql's INPUT command.
Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;
Test results

The single bulk UPDATE
First, let's perform the test where the single-pass bulk UPDATE is performed.
This is an excerpt from the one-pass script with default transaction settings
SQLgt update TEST set ID = ID+1 QRT = QRT+1 NAME=NAME||1 ts_change = CURRENT_TIMESTAMPCurrent memory = 10509940Delta memory = 214016Max memory = 10509940Elapsed time= 463 secBuffers = 2048Reads = 2111Writes 375Fetches = 974980
SQLgt commitCurrent memory = 10340752Delta memory = -169188Max memory = 10509940Elapsed time= 003 secBuffers = 2048Reads = 1Writes 942Fetches = 3
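These statistics excerpts all follow a regular "key = value" layout, so they are easy to tabulate mechanically. The following sketch is not part of the original test suite; it is an illustrative helper that assumes the exact layout shown above (including the `Writes 375` line that lacks an `=` sign):

```python
import re

def parse_isql_stats(block: str) -> dict:
    """Collect an isql SET STATS block into a dict of numeric metrics."""
    stats = {}
    for line in block.splitlines():
        # Lines look like 'Reads = 2111', 'Writes 375', 'Elapsed time= 4.63 sec'
        m = re.match(r'\s*([A-Za-z ]+?)\s*=?\s*(-?[\d.]+)', line)
        if m:
            key = m.group(1).strip().lower().replace(' ', '_')
            val = m.group(2)
            stats[key] = float(val) if '.' in val else int(val)
    return stats

sample = """Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time= 4.63 sec
Buffers = 2048
Reads = 2111
Writes 375
Fetches = 974980"""

print(parse_isql_stats(sample)['reads'])   # 2111
```

Feeding each excerpt through such a parser is one way to produce the comparison table shown later.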
Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain:

Visual report designer with rulers, guides and zooming; wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for most popular DB-engines; full WYSIWYG; text rotation 0..360 degrees; memo object supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.
TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net

SPGen, a Stored Procedure Generator for Firebird and InterBase, creates a standard set of stored procs for any table. Full details can be found here:

http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet.

Full details can be found here:

http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled:
CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger see www.microtec.fr/copycat/ct

Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time= 4.72 sec
Buffers = 2048
Reads = 2102
Writes 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time= 0.15 sec
Buffers = 2048
Reads = 1
Writes 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs
SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment; registered custom components based on the Borland Package Library mechanism; and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com
For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time, 47 seconds, compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth of the writes parameter.
Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage, we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The
old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is written as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
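At the API level, the TPB is just a version byte followed by one-byte option codes. The sketch below only assembles such a buffer for illustration; the common option codes are the conventional ibase.h values, but the numeric value used here for isc_tpb_no_savepoint is a placeholder assumption - take the real constant from the InterBase 7.5.1 ibase.h before using it:

```python
# Illustrative sketch of building a TPB that disables the undo log.
# The common codes below follow conventional ibase.h values;
# ISC_TPB_NO_SAVEPOINT is a PLACEHOLDER, not a verified constant.
isc_tpb_version3     = 3
isc_tpb_write        = 9
isc_tpb_concurrency  = 2
isc_tpb_wait         = 6
isc_tpb_no_savepoint = 21   # placeholder: verify against InterBase 7.5.1 ibase.h

def build_tpb(*options: int) -> bytes:
    """A TPB is a version byte followed by one-byte option codes."""
    return bytes([isc_tpb_version3, *options])

tpb = build_tpb(isc_tpb_write, isc_tpb_concurrency, isc_tpb_wait,
                isc_tpb_no_savepoint)
print(tpb.hex())   # 0309020615
```

This buffer (and its length) would then be passed to isc_start_transaction(), as described in the release notes.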
Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation. Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE - only one record version exists, and the Undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction: the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
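As a toy illustration of that paragraph (a conceptual model, not engine code), the savepoint's undo log for a first UPDATE can be pictured as a set of affected record numbers, with the delta backversions living on a simulated disk:

```python
# Toy model of the first-UPDATE case (not real engine code): the undo log
# is just a set of affected record numbers, and rollback restores each
# record from its on-disk backversion.
disk = {1: "v1-rec1", 2: "v1-rec2"}   # latest record versions on disk
deltas = {}                            # record number -> backversion
undo_bitmap = set()                    # the savepoint's undo log

def update(rec_no: int, new_value: str) -> None:
    deltas[rec_no] = disk[rec_no]      # a real engine stores only the diff
    disk[rec_no] = new_value
    undo_bitmap.add(rec_no)            # remember that this record changed

def rollback() -> None:
    for rec_no in sorted(undo_bitmap):     # walk the recorded numbers...
        disk[rec_no] = deltas.pop(rec_no)  # ...and restore backversions
    undo_bitmap.clear()

update(1, "v2-rec1")
update(2, "v2-rec2")
rollback()
print(disk[1])   # v1-rec1
```

The point of the model is that the undo log itself holds almost nothing: only record numbers, which is why the first pass is cheap.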
Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the Undo log.

This approach is very fast and economical on memory usage. The engine does not waste many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN...END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first one.
The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.
Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

The original design of InterBase implemented the second UPDATE another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But that theme is for another article. :)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces the record data on disk with the newest version and shifts the superseded version into the Undo log.
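That pattern can be mimicked with a tiny model (purely illustrative; record counts and version labels are invented): from the second pass onwards the in-memory undo log holds exactly one full version per record and never grows further, matching the flat memory figures in Table 1:

```python
# Illustrative model of Table 1's memory pattern (not engine code).
# Pass 1 only records affected record numbers in a bitmap; pass 2 moves a
# full superseded version into the in-memory undo log; passes 3+ overwrite
# the same slot, so undo-log memory stops growing after the second pass.
RECORDS = 1000
undo_bitmap = set()        # pass 1: cheap, just record numbers
undo_log_versions = {}     # pass 2+: one full version kept per record
full_versions_per_pass = []

for pass_no in range(1, 6):
    for rec_no in range(RECORDS):
        if pass_no == 1:
            undo_bitmap.add(rec_no)
        else:
            # the superseded full version replaces any earlier copy, so no
            # new in-memory allocation happens beyond pass 2
            undo_log_versions[rec_no] = ("full version", pass_no - 1)
    full_versions_per_pass.append(len(undo_log_versions))

print(full_versions_per_pass)   # [0, 1000, 1000, 1000, 1000]
```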
A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected the third and following UPDATEs to do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN...END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in figure 7. In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I have missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any inconvenience and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proven data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you can see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines that is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS. They are unusual and
Author: Vladimir Kotlyarevsky, vlad@contek.ru
slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs); in [2] they are UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP: not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex). Nobody can change their date of birth, at least not their de facto date of birth; but birth dates are not unique anyway. :)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm... but let's not talk about such gloomy things :). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

What is more, nobody requires run-time pointers to contain any additional information about an object except a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the list of known types, which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object; the physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus each object has to have the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central, well-known entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
)

The ClassId attribute (the object type identifier) refers to the OBJECTS table: that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those already in OBJECTS. Usually that is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? Likely not less than half. Thus you may consider that you have implemented half of these dictionaries already; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId
This works if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
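The two dictionary queries can be tried in any SQL engine. Here is a self-contained sketch using Python's sqlite3 (the TOID domain becomes a plain integer, the sample "Currency" class and its rows are invented for illustration, and an ORDER BY is added for deterministic output):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table OBJECTS (
    OID integer primary key,    -- TOID
    ClassId integer not null,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint default 0)""")

# ClassId = -100 marks type-description objects, as in the article
con.execute("insert into OBJECTS values (1, -100, 'Currency', 'Currency dictionary', 0)")
con.execute("insert into OBJECTS values (2, 1, 'USD', 'US Dollar', 0)")
con.execute("insert into OBJECTS values (3, 1, 'EUR', 'Euro', 0)")

# dictionary elements when only the type name is known
rows = con.execute("""select OID, Name, Description from OBJECTS
                      where ClassId = (select OID from OBJECTS
                                       where ClassId = -100
                                         and Name = :LookupName)
                      order by OID""",
                   {"LookupName": "Currency"}).fetchall()
print(rows)   # [(2, 'USD', 'US Dollar'), (3, 'EUR', 'Euro')]
```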
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing of more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2))

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.

If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object - 1:M relation */
  item_id TOID,   /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost))

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
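As a sketch of Method 1, here is the same join run in Python's sqlite3 in place of InterBase (the sample data, including the partner object and its ClassId, is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table OBJECTS (
    OID integer primary key, ClassId integer,
    Name varchar(32), Description varchar(128));
create table Orders (
    OID integer primary key,        -- 1:1 with OBJECTS
    customer integer,               -- link to a partner object
    sum_total numeric(15,2));
insert into OBJECTS values (10, -100, 'Order', 'Order document type');
insert into OBJECTS values (20, 5, 'Acme Ltd', 'a Partners dictionary entry');
insert into OBJECTS values (30, 10, 'ORD-001', 'urgent');   -- ClassId 10 = Order
insert into Orders  values (30, 20, 99.50);
""")

row = con.execute("""select o.OID, o.Name as Number, o.Description as Comment,
                            ord.customer, ord.sum_total
                     from OBJECTS o, Orders ord
                     where o.OID = ord.OID and ord.OID = :id""",
                  {"id": 30}).fetchone()
print(row)   # (30, 'ORD-001', 'urgent', 20, 99.5)
```

Note how the order number and comment really do come from OBJECTS, while the type-specific fields come from Orders.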
Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id))

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id))

which describes the type metadata: an attribute set (names and types) for each object type.

All attributes for a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
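A minimal sqlite3 sketch of Method 2 (the sample metadata and values are invented, and an ORDER BY is added for deterministic output):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table objects (
    OID integer primary key, ClassId integer,
    Name varchar(32), Description varchar(128));
create table attributes (
    OID integer, attribute_id integer not null, value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id));
create table class_attributes (
    OID integer, attribute_id integer not null,
    attribute_name varchar(32), attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id));

insert into objects values (10, -100, 'Order', 'Order document type');
insert into objects values (30, 10, 'ORD-001', 'urgent');
insert into class_attributes values (10, 1, 'customer', 0);
insert into class_attributes values (10, 2, 'sum_total', 1);
insert into attributes values (30, 1, '20');
insert into attributes values (30, 2, '99.50');
""")

rows = con.execute("""select a.attribute_id, ca.attribute_name, a.value
                      from attributes a, class_attributes ca, objects o
                      where a.OID = :oid and a.OID = o.OID
                        and o.ClassId = ca.OID
                        and a.attribute_id = ca.attribute_id
                      order by a.attribute_id""",
                   {"oid": 30}).fetchall()
print(rows)   # [(1, 'customer', '20'), (2, 'sum_total', '99.50')]
```

Notice that every value comes back as a string: typing the `value` column is one of the compromises this method makes.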
In the context of this method, you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure. The last is a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field – and you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more importantly, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to be extended in the future to use certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object store, and all operations are performed with run-time instances of objects without using native database tools like SQL (except for work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.
To be continued…
Development area
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
Replicating and synchronizing
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a one-off synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are addressed in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning each record that is changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)
When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent; and any change made to the parent node will be logged for the other sub-nodes, but not for the originating node itself.
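Putting these two ideas together – one log row per sub-node, skipping the node whose user name made the change – a logging trigger might look roughly like this in Firebird 1.5 syntax. This is a sketch only: all the metadata names (RPL$LOG, RPL$USERS, NODE_NAME, the CUSTOMER table and its ID key) are invented for illustration, and CopyCat's actual generated triggers will differ.

```sql
SET TERM ^ ;

/* Hypothetical per-table logging trigger: writes one log row per
   sub-node, except for the sub-node that originated the change
   (identified by the user name it logged in with). */
CREATE TRIGGER CUSTOMER_RPL FOR CUSTOMER
AFTER INSERT OR UPDATE OR DELETE
AS
  DECLARE VARIABLE OP CHAR(1);
  DECLARE VARIABLE PK INTEGER;
BEGIN
  IF (DELETING) THEN
    BEGIN OP = 'D'; PK = OLD.ID; END
  ELSE IF (UPDATING) THEN
    BEGIN OP = 'U'; PK = NEW.ID; END
  ELSE
    BEGIN OP = 'I'; PK = NEW.ID; END

  /* one row per sub-node, skipping the originator */
  INSERT INTO RPL$LOG (TABLE_NAME, PK_VALUE, OPERATION, SUB_NODE)
  SELECT 'CUSTOMER', :PK, :OP, u.NODE_NAME
  FROM RPL$USERS u
  WHERE u.NODE_NAME <> CURRENT_USER;
END^

SET TERM ; ^
```

Because the replicator connects as its parent's node name, changes it applies are automatically excluded from the log rows destined for that parent, which is what prevents the ping-pong effect described above.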
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is executed on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
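For instance, if the synchronization method configured for a key is a generator, the clause executed on the parent server might be as simple as the following (the generator name here is an invented example, not part of CopyCat):

```sql
/* Hypothetical generator used as a primary key synchronization method. */
CREATE GENERATOR GEN_ORDER_ID;

/* Clause run on the server side to obtain a key value that is
   unique across all nodes, before the record is replicated. */
SELECT GEN_ID(GEN_ORDER_ID, 1) FROM RDB$DATABASE;
```

Because the value is drawn from a single generator on the parent server, two off-line nodes can never be handed the same key.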
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record; then, automatically, upon the next replication, the record will be replicated and the conflict resolved.
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible that two nodes (having the same parent) simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than requiring the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication" – that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
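Continuing the STOCK example, such a procedure might be sketched as follows. The table, column and procedure names are invented for illustration; the point is that the call with its delta parameter is what gets logged and replicated, not the resulting field value.

```sql
SET TERM ^ ;

/* Hypothetical procedure whose calls, rather than the resulting row
   values, are replicated: each node records its change as a delta,
   so concurrent changes merge instead of overwriting each other. */
CREATE PROCEDURE ADD_TO_STOCK (PRODUCT_ID INTEGER, DELTA INTEGER)
AS
BEGIN
  UPDATE STOCK
  SET STOCK_VALUE = STOCK_VALUE + :DELTA
  WHERE PRODUCT_ID = :PRODUCT_ID;
END^

SET TERM ; ^
```

Replaying node A's call (+1) and node B's call (+2) in either order on every node yields 48 everywhere, which no value-based replication of the row itself could achieve.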
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each of these products.
1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide to getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare the databases:

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that are to be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate:

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs, but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server administrator tool (CTAdmin)

• Configuration wizard for setting up links to master/slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple and intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community obtains a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replication between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.
Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code: ibd-852-150
Using IBAnalyst

Understanding Your Database with IBAnalyst

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read about it in opguide.pdf, you will know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour to hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available while browsing the statistics, and it also includes hint comments and recommendation reports.
Author: Dmitri Kouzmenko, kdv@ib-aid.com

Figure 1: Summary of database statistics

The summary shown in Figure 1 provides general information about your database. The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Footnotes:
1 – InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 – "Oldest transaction" is the same "oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".
3 – Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be considered the currently oldest active transaction.
Note: All figures in this article contain gstat statistics taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, no warning is displayed: this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write-cache method: when ON, changed data is written immediately to disk, whereas OFF means that writes will be held for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative – which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated as Next transaction divided by the number of days since the creation of the database up to the point when the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
As you have already learned, if there are any warnings they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during certain maintenance and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly – not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If the statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of the four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster – but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing (using triggers) deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on foreign keys every time you view statistics.
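If you want to inspect the raw data behind such a warning yourself, a query along these lines shows the duplicate chain length for each distinct value of an indexed column. The table and column names here are invented, since the article does not show the definition behind ADDR_ADDRESS_IDX6:

```sql
/* Hypothetical selectivity check for an indexed column:
   one row per distinct value, longest duplicate chains first. */
SELECT ADDRESS_TYPE, COUNT(*) AS DUP_CHAIN
FROM ADDRESSES
GROUP BY ADDRESS_TYPE
ORDER BY 2 DESC;
```

With only four distinct values in 34999 rows, the first row of such a result would show a chain comparable to the MaxDup figure IBAnalyst reports.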
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports | View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big – 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run a sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amounts of record versions usually slow down performance. If there are a lot of record versions in a table, then either garbage collection is not working or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with a version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944       92
DICT_PRICE        30       1992       45
DOCS               9       2225       64
N_PART         13835      72594       83
REGISTR_NC       241       4085       56
SKL_NC          1640       7736      170
STAT_QUICK     17649      85062      110
UO_LOCK          283       8490      144
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
The InterBase and Firebird Developer Magazine, 2005, Issue 2

www.ibdeveloper.com © 2005 www.ibdeveloper.com. All rights reserved.
Comments to "Temporary tables"
One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
Miscellaneous
Invitations, Subscriptions, Donations, Sponsorships
This is the first official book on Firebird, the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now! To receive future issue notifications, send email to subscribe@ibdeveloper.com
Editor's note
Rock around the blog
...ed to InterBase and Firebird, and offer very attractive discounts for dozens of the most popular products.

The price for the CD is USD$9.99 plus shipping. For details, see www.ibdeveloper.com/magazine-cd

Paper version of the magazine

In 2006 we intend to issue the first paper version of "The InterBase and Firebird Developer Magazine". The paper version cannot be free, because of high production costs.

We intend to publish 4 issues per year, with a subscription price of around USD$49. The issue volume will be approximately 64 pages.

In the paper version we plan to include the best articles from the online issues and, of course, exclusive articles and materials.

To be or not to be - this is the question that only you can answer. If the idea of subscribing to the paper version appeals to you, please place a comment in our blog (http://ibdeveloper.com/paper-version) or send email to readers@ibdeveloper.com with your pros and cons. We look forward to your feedback.
Firebird conference

This issue of our magazine almost coincides with the third annual Firebird World Conference, starting November 13 in Prague, Czech Republic. This year's conference spans three nights, with a tight programme of back-to-back and parallel presentations over two days. With speakers in attendance from all across Europe and the Americas, presenting topics that range from specialised application development techniques to first-hand accounts of Firebird internals from the core developers, this one promises to be the best ever.

Registration has been well under way for some weeks now, although the time is now past for early bird discounts on registration fees. At the time of publishing, bookings were still being taken for accommodation in the conference venue itself, Hotel Olsanka in downtown Prague. The hotel offers a variety of room configurations, including bed and breakfast if wanted, at modest tariffs. Registration includes lunches on both conference days, as well as mid-morning and mid-afternoon refreshments.

Presentations include one on Firebird's future development from the Firebird Project Coordinator, Dmitry Yemanov, and another from Alex Peshkov, the architect of the security changes coming in Firebird 2. Jim Starkey will be there, talking about Firebird Vulcan, and members of the SAS Institute team will talk about aspects of SAS's taking on Firebird Vulcan as the back-end to its illustrious statistical software. Most of the interface development tools are on the menu, including Oracle-mode Firebird, PHP, Delphi, Python and Java (Jaybird).

It is a don't-miss for any Firebird developer who can make it to Prague. Link to details and programme either at http://firebird.sourceforge.net/index.php?op=konferenz or at the IBPhoenix website, http://www.ibphoenix.com
Invitation

As a footnote, I'd like to invite all people who are in touch with Firebird and InterBase to participate in the life of the community. The importance of a large, active community to any public project - and a DBMS with hundreds of thousands of users is certainly public - cannot be emphasised enough. It is the key to survival for such projects. Read our magazine, ask your questions on forums and leave comments in the blog - just rock and roll with us.
Sincerely,

Alexey Kovyazin,
Chief Editor
editor@ibdeveloper.com
Oldest Active
On Silly Questions and Idiotic Outcomes
Firebird owes its existence to the community's support lists. Though it's all ancient history now, if it hadn't been for the esprit de corps among the regulars of the old InterBase support list hosted by mers.com in the nineties, Borland's shock-horror decision to kill off InterBase development at the end of 1999 would have been the end of it, for the English-speaking users at least.

Firebird's support lists on Yahoo and Sourceforge were born out of the same cataclysm of fatal fire and regeneration that caused the phoenix, drawn from the fire against extreme adversity, to take flight as Firebird in July 2000. Existing and new users, both of Firebird and of the rich selection of drivers and tools that spun off from and for it, depend heavily on the lists to get up to speed with the evolution of all of these software products.

Newbies often observe that there is just nothing like our lists for any other software, be it open source or not. One of the things that I think makes the Firebird lists stand out from the crowd is that, once gripped by the power of the software, our users never leave the lists. Instead, they stick around, learn hard, and stay ready and willing to impart to others what they have learnt. After nearly six years of this, across a dozen lists, our concentrated mix of expertise and fidelity is hard to beat.

Now, doesn't all this make us all feel warm and fuzzy? For those of us in the hot core of this voluntary support infrastructure, the answer has to be, unfortunately, "Not all the time." There are some bugbears that can make the most patient person see red. I'm using this column to draw attention to some of the worst.

First and worst is silly questions. You have all seen them. Perhaps you even posted them. Subject: "Help!" Body: "Every time I try to connect to Firebird I get the message 'Connection refused'. What am I doing wrong?" What follows from that is a frustrating and tedious game for responders. What version of Firebird are you using? Which server model? Which platform? Which tool? What connection path? (By "connection path" we mean...)

Everyone's time is valuable. Nobody has the luxury of being able to sit around all day playing this game. If we weren't willing to help, we wouldn't be there. But you make it hard, or even impossible, when you post problems with no descriptions. You waste our time. You waste your time. We get frustrated because you don't provide the facts; you get frustrated because we can't read your mind. And the list gets filled with noise.

If there's something I've learnt in more than 12 years of on-line software support, it is that a problem well described is a problem solved. If a person applies time and thought to presenting a good description of the problem, chances are the solution will hop up and punch him on the nose. Even if it doesn't, good descriptions produce the fastest right answers from list contributors. Furthermore, those good descriptions and their solutions form a powerful resource in the archives for others coming along behind.

Off-topic questions are another source of noise and irritation in the lists. At the Firebird website we have bent over backwards to make it very clear which lists are appropriate for which areas of interest. An off-topic posting is easily forgiven if the poster politely complies with a request from a list regular to move the question to x list. When these events instead become flame threads, or when the same people persistently reoffend, they are an extreme burden on everyone.

In the "highly tedious" category, we have the kind of question that goes like this. Subject: "Bug in Firebird SQL". Body: "This statement works in MSSQL/Access/MySQL/insert any non-standard DBMS name here. It doesn't work in Firebird. Where should I report this bug?" To be fair, George Bernard Shaw wasn't totally right when he said "Ignorance is a sin." However, you do at least owe it to yourself to know what you are talking about, and not to present your problem as an attack on our software. Whether intended or not, it comes over as a troll and you come across as a fool. There's a strong chance that your attitude will discourage anyone from bothering with you at all. It's no-win, sure, but it's also a reflection of human nature. You reap what you sow.

In closing this little tirade, I just want to say that I hope I've struck a chord with those of our readership who struggle to get satisfaction from their list postings. It will be a rare question indeed that has no answer. Those rarities, well-presented, become excellent bug reports. If you're not getting an answer, there's a high chance that you asked a silly question.
Helen E. M. Borrie
helebor@tpg.com.au
Cover story
Locking, Firebird, and the Lock Table
News & Events

Firebird Worldwide Conference 2005

The Firebird conference will take place at the Hotel Olsanka in Prague, Czech Republic, from the evening of Sunday the 13th of November (opening session) until the evening of Tuesday the 15th of November (closing session).
One of the best-known facts about Firebird is that it uses multi-version concurrency control instead of record locking. For databases that manage concurrency through record locks, understanding the mechanisms of record locking is critical to designing a high performance, high concurrency application. In general, record locking is irrelevant to Firebird. However, Firebird uses locks to maintain internal consistency and for interprocess coordination. Understanding how Firebird does (and does not) use locks helps with system tuning and anticipating how Firebird will behave under load.

I'm going to start by describing locking abstractly, then Firebird locking more specifically, then the controls you have over the Firebird lock table and how they can affect your database performance. So if you just want to learn about tuning, skip to the end of the article.
Concurrency control

Concurrency control in database systems has three basic functions: preventing concurrent transactions from overwriting each others' changes, preventing readers from seeing uncommitted changes, and giving a consistent view of the database to running transactions. Those three functions can be implemented in various ways. The simplest is to serialize transactions - allowing each transaction exclusive access to the database until it finishes. That solution is neither interesting nor desirable.

A common solution is to allow each transaction to lock the data it uses, keeping other transactions from reading data it changes and from changing data it reads. Modern databases generally lock records. Firebird provides concurrency control without record locks by keeping multiple versions of records, each marked with the identifier of the transaction that created it. Concurrent transactions don't overwrite each other's changes, because the system will not allow a transaction to change a record if the most recent version was created by a concurrent transaction. Readers never see uncommitted data, because the system will not return record versions created by concurrent transactions. Readers see a consistent version of the database, because the system returns only the data committed when they start - allowing other transactions to create newer versions. Readers don't block writers.
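The version-visibility rules just described can be sketched in miniature. This is an illustrative model only - the class and field names are invented for the sketch, not Firebird internals:

```python
# Illustrative sketch (not Firebird source): record versions tagged with the
# id of the transaction that created them, and the visibility/update rules.

class Record:
    def __init__(self):
        self.versions = []  # list of (txn_id, value), newest last

    def newest(self):
        return self.versions[-1] if self.versions else None

class Transaction:
    def __init__(self, txn_id, concurrent):
        self.id = txn_id
        self.concurrent = set(concurrent)  # txns still active when we started

    def read(self, record):
        # Skip versions created by concurrent transactions: readers never
        # see uncommitted data, and they get a consistent snapshot.
        for txn_id, value in reversed(record.versions):
            if txn_id not in self.concurrent:
                return value
        return None

    def update(self, record, value):
        newest = record.newest()
        # Refuse the write if the most recent version was created by a
        # concurrent transaction: no overwriting each other's changes.
        if newest and newest[0] in self.concurrent:
            raise Exception("update conflict")
        record.versions.append((self.id, value))

rec = Record()
Transaction(10, []).update(rec, "v1")   # committed long ago
t20 = Transaction(20, [])               # starts, sees v1
t30 = Transaction(30, [20])             # concurrent with t20
t20.update(rec, "v2")                   # t20 creates a newer version
print(t30.read(rec))                    # t30 still sees "v1"
```

Note how the writer t20 did not block the reader t30; t30 simply keeps seeing the version that was committed when it started.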
A lock sounds like a very solid object, but in database systems a lock is anything but solid. "Locking" is a shorthand description of a protocol for reserving resources. Databases use locks to maintain consistency while allowing concurrent, independent transactions to update distinct parts of the database. Each transaction reserves the resources - tables, records, data pages - that it needs to do its work. Typically, the reservation is made in memory to control a resource on disk, so the cost of reserving the resource is not significant compared with the cost of reading or changing the resource.
Locking example

"Resource" is a very abstract term. Let's start by talking about locking tables. Firebird does lock tables, but normally it locks them only to prevent catastrophes, like having one transaction drop a table that another transaction is using.

Before a Firebird transaction can access a table, it must get a lock on the table. The lock prevents other transactions from dropping the table while it is in use. When a transaction gets a lock on a table, Firebird makes an entry in its table of locks, indicating the identity of the transaction that has the lock, the identity of the table being locked, and a function to call if another transaction wants an incompatible lock on the table. The normal table lock is shared - other transactions can lock the same table in the same way.

Before a transaction can delete a table, it must get an exclusive lock on the table. An exclusive lock is incompatible with other locks. If any transaction has a lock on the table, the request for an exclusive lock is denied and the drop table statement fails. Returning an immediate error is one way of dealing with conflicting lock requests. The other is to put the conflicting request on a list of unfulfilled requests and let it wait for the resource to become available.

In its simplest form, that is how locking works. All transactions follow the formal protocol of requesting a lock on a resource before using it. Firebird maintains a list of locked resources, a list of requests for locks on resources - satisfied or waiting - and a list of the owners of lock requests. When a transaction requests a lock that is incompatible with existing locks on a resource,
Author: Ann W. Harrison, aharrison@ibphoenix.com
Firebird either denies the new request or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for locks that it acquires on tables, etc.
Lock modes

For concurrency and read-committed transactions, Firebird locks tables for shared read or shared write. Either mode says: "I'm using this table, but you are free to use it too." Consistency mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say: "I'm using the table and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and other protected read transactions. Protected write is only compatible with shared read.

The important concept about lock modes is that locks are more subtle than mutexes - locks allow resource sharing, as well as protecting resources from incompatible use.
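The compatibility rules above can be collected into a small matrix. The entries not stated explicitly in the text are inferred from the stated rules, so treat this as a sketch rather than the engine's actual table:

```python
# Assumed compatibility matrix for the table-lock modes described above.
SR, SW, PR, PW, EX = "shared_read", "shared_write", "prot_read", "prot_write", "exclusive"

COMPATIBLE = {
    SR: {SR, SW, PR, PW},   # shared read tolerates everything but exclusive
    SW: {SR, SW},           # shared write tolerates other sharers
    PR: {SR, PR},           # protected read: readers only
    PW: {SR},               # protected write: only shared read may coexist
    EX: set(),              # drop table: conflicts with every granted lock
}

def can_grant(requested, granted_modes):
    """A request is granted only if it is compatible with every lock
    already granted on the table."""
    return all(g in COMPATIBLE[requested] for g in granted_modes)

print(can_grant(PW, [SR]))   # True: protected write tolerates shared read
print(can_grant(PW, [SW]))   # False: a shared-write transaction blocks it
print(can_grant(EX, [SR]))   # False: drop table must wait or fail
```

The matrix is symmetric - if mode A tolerates mode B, then B tolerates A - which is what makes it a compatibility relation rather than a priority scheme.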
Two-phase locking vs. transient locking

The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release. They cannot overlap.

When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.
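The two-phase discipline can be sketched as a lock owner that refuses new acquisitions once it has released anything. This is illustrative only, not Firebird code:

```python
# Sketch of the two-phase locking discipline: an acquisition (growing)
# phase, then a release (shrinking) phase, which may not overlap.
class TwoPhaseOwner:
    def __init__(self):
        self.held = set()
        self.shrinking = False   # set after the first release

    def acquire(self, resource):
        if self.shrinking:
            raise Exception("two-phase violation: acquire after release")
        self.held.add(resource)

    def release(self, resource):
        self.shrinking = True    # from now on, no new locks
        self.held.discard(resource)

t = TwoPhaseOwner()
t.acquire("TABLE_A")
t.acquire("TABLE_B")
t.release("TABLE_A")
try:
    t.acquire("TABLE_C")
except Exception as e:
    print(e)                     # two-phase violation: acquire after release
```

In practice a Firebird transaction simply holds its table locks until it ends, which trivially satisfies the discipline; the transient page locks described next deliberately do not follow it.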
Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.
Firebird page locks

One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.
In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.
In Firebird, internally, each time a transaction changes a page, it changes that page - and the physical structure of the database as a whole - from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.
The Firebird lock table

When all access to a database is done in a single process - as is the case with most database systems - locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.
The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.
Conflicting lock requests

When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens. Either the requesting transaction gets an immediate error, or the request is
Registering for the Conference
Call for papers
Sponsoring the Firebird Conference
http://firebird-conference.com
put on a list of waiting requests, and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released, or require the new request to wait.

Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.
Locks as interprocess communication

Lock management requires a high speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be - and is - useful for a number of purposes outside the area that's normally considered database locking.

For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important, since as soon as an index becomes active, every transaction must help maintain it - making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.
Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.
Lock table specifics

The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests - generally, lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource - whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose, the lock table includes a hash table for access to lock blocks, based on the name of the resource being locked.
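The block types and their linkage might be modelled roughly like this. The field names are assumptions for illustration, not the engine's actual structures:

```python
# Rough sketch (invented field names) of three of the block types described
# above, and how a request block links an owner to a lock.
from dataclasses import dataclass, field

@dataclass
class LockBlock:             # one per locked resource
    name: str
    requests: list = field(default_factory=list)

@dataclass
class OwnerBlock:            # a transaction, connection, or the server
    owner_id: int
    requests: list = field(default_factory=list)

@dataclass
class RequestBlock:          # relates an owner to a lock
    owner: OwnerBlock
    lock: LockBlock
    mode: str
    granted: bool = False

def enqueue(owner, lock, mode):
    req = RequestBlock(owner, lock, mode)
    lock.requests.append(req)    # owners holding conflicting locks are
    owner.requests.append(req)   # found by walking lock.requests
    return req

tbl = LockBlock("RDB$RELATIONS")
txn = OwnerBlock(owner_id=42)
req = enqueue(txn, tbl, "shared_read")
print(len(tbl.requests), len(txn.requests))   # 1 1
```

The double linkage is the point: a lock's request list answers "who holds or wants this resource?", while an owner's request list answers "what does this transaction hold?" - both questions the engine needs cheaply.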
A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function, called the hash function, into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.
A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
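A minimal sketch of the lookup path described above. The hash function here is a stand-in (Firebird's real function differs), and the duplicate chain is folded into the collision chain for brevity:

```python
# Sketch of hash lookup with collision chains. Note that the slot count is
# baked into the hash function, which is why it cannot be resized on the fly.
class HashBlock:
    def __init__(self, name, lock_block):
        self.name = name
        self.lock_block = lock_block
        self.collision = None    # next block whose name hashed to this slot

class LockHashTable:
    def __init__(self, slots):
        self.slots = [None] * slots

    def _hash(self, name):
        return sum(name) % len(self.slots)   # stand-in hash function

    def insert(self, name, lock_block):
        i = self._hash(name)
        blk = HashBlock(name, lock_block)
        blk.collision = self.slots[i]        # push onto the collision chain
        self.slots[i] = blk

    def lookup(self, name):
        blk = self.slots[self._hash(name)]
        while blk is not None:               # each collision adds a hop
            if blk.name == name:
                return blk.lock_block
            blk = blk.collision
        return None

table = LockHashTable(slots=101)
table.insert(b"page:4711", "lock-A")
table.insert(b"page:1174", "lock-B")   # collides here, but is still found
print(table.lookup(b"page:4711"))      # lock-A
```

The chain walk in `lookup` is exactly the cost the article's tuning advice targets: more slots mean shorter chains, so fewer name comparisons per lookup.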
Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table - normally all databases on the machine - before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.

The tool for checking the lock table is fb_lock_print, which is a command line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:
LOCK_HEADER BLOCK
    Version: 114, Active owner: 0, Length: 262144, Used: 85740
    Semmask: 0x0, Flags: 0x0001
    Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
    Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
    Acquires: 21048, Acquire blocks: 0, Spin count: 0
    Mutex wait: 10.3%
    Hash slots: 101, Hash lengths (min/avg/max): 3/15/30
    ...

The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.
If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick, but not terribly efficient. It works best if the number of hash slots is prime.
Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine shut down.
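Since the slot count should be a prime below 2048, a small helper can pick one. This is a convenience sketch for choosing the value, not part of Firebird:

```python
# Pick the nearest prime at or below the desired slot count, capped at the
# 2048 limit mentioned above.
def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def pick_hash_slots(desired, limit=2048):
    n = min(desired, limit)
    while not is_prime(n):      # walk down to the nearest prime
        n -= 1
    return n

print(pick_hash_slots(500))     # 499, the value used in the example
```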
If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading # and increase the value:

#LockMemSize = 262144

to this:

LockMemSize = 1048576

The value is bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.
News & Events

PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP.

One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in.

PHPServer shares all the qualities of Firebird: it is a capable, compact, easy to install and easy to manage solution.

PHPServer is a free download. Read more at www.fyracle.org/phpserver.html
Server internals

The InterBase and Firebird Developer Magazine, 2005, Issue 2
www.ibdeveloper.com ©2005 www.ibdeveloper.com. All rights reserved.
Inside BLOBs

This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for the storage of data that cannot be placed in fields of other types: for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as using other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and, at the same time, has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.
Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.
Authors: Dmitri Kouzmenko, kdv@ib-aid.com, and Alexey Kovyazin, ak@ib-aid.com
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to the pages containing the actual BLOB data are stored in the quasi-record, so a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used: the quasi-record stores references to BLOB pointer pages, which in turn contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type: the BLOB page. The different kinds of BLOB pages differ from each other in a flag (value 0 or 1) defining how the server should interpret the given page.
BLOB page

The BLOB page consists of the following parts. The special header contains the following information:

• The number of the first BLOB page in this BLOB. It is used to check that pages belong to one BLOB.
• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.
• The length of data on the page. As a page may or may not be filled to its full extent, the length of the actual data is indicated in the header.
Maximum BLOB size

As the internal structure for storing BLOB data can have only three levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), and in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. The internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.
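The theoretical three-level limit can be sketched as a back-of-the-envelope calculation. In this sketch the per-page overhead (32 bytes) and the 4-byte size of a page number are illustrative assumptions, not the exact on-disk figures, and the quasi-record is assumed to hold as many pointers as a full page:

```python
def max_blob_size(page_size: int, page_header: int = 32) -> int:
    """Rough theoretical capacity of a three-level BLOB structure.

    page_header is an assumed per-page overhead; page numbers are
    assumed to be 4 bytes each.
    """
    data_per_page = page_size - page_header       # BLOB bytes on one blob page
    pointers_per_page = data_per_page // 4        # page numbers on a pointer page
    # quasi-record -> pointer pages -> data pages
    return pointers_per_page * pointers_per_page * data_per_page

print(max_blob_size(4096) / 2 ** 30)   # about 3.9 GiB for a 4K page
print(max_blob_size(8192) / 2 ** 30)   # far larger for an 8K page
```

Even with these rough assumptions, a 4K page size already puts the theoretical figure near the 4 GB ULONG cap described above, and larger page sizes push it well beyond it.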
The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value; it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
TestBed
Testing the NO SAVEPOINT feature in InterBase 7.5.1
Authors: Vlad Horsun, hvlad@users.sourceforge.net, and Alexey Kovyazin, ak@ib-aid.com

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators. For the test we will use only one table, with the following structure:
CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.
Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well described in the InterBase documentation.

You can download a backup of the test database ready to use from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.
After that, perform a backup of this database and you will be in the same position as if you had downloaded the ready-to-use backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, 512 MB RAM and an 80 GB Samsung HDD.
Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts

The first script tests a regular one-pass update without the NO SAVEPOINT option. For convenience, the important commands are clarified with comments:

connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;  /* drop table TEST2DROP to free database pages */
commit;
select count(*) from test;  /* walk down all records in TEST to place them into cache */
commit;
set time on;  /* enable time statistics for performed statements */
set stat on;  /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:
connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT;  /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements; we recommend using five. The script for testing without NO SAVEPOINT would be:
connect "E:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
News & Events

Fyracle 0.8.9
Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble. Read more at: www.janus-software.com

IBAnalyst 1.9
IBSurgeon has issued a new version of IBAnalyst. It can now better analyze InterBase or Firebird database statistics using metadata information (in this case a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance, and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance. www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test

The easiest way to perform the test is to use isql's INPUT command. Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;
Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed. This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes 942
Fetches = 3
News & Events

Fast Report 3.19
The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf output; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0..360 degrees; a memo object supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.
Trial / Buy Now
News & Events

TECT Software presents
New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found at:
http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found at:
http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled:
CopyTiger 1.00
CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger see www.microtec.fr/copycat/ct
Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs
News & Events

SQLHammer 1.5 Enterprise Edition
The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com
For convenience, the results are tabulated above (Table 1).

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time (47 seconds) compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth of the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions
You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately but is retained as a backversion.
In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
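The idea of a delta backversion can be illustrated with a toy byte-level diff; this is a sketch of the concept only, not InterBase's actual on-disk delta format (record versions are assumed to be of equal length for simplicity):

```python
def make_delta(old: bytes, new: bytes) -> dict:
    """Record only the byte positions where the two versions differ."""
    assert len(old) == len(new)
    return {i: o for i, (o, n) in enumerate(zip(old, new)) if o != n}

def apply_delta(new: bytes, delta: dict) -> bytes:
    """Rebuild the full old version from the new version plus the delta."""
    buf = bytearray(new)
    for pos, old_byte in delta.items():
        buf[pos] = old_byte
    return bytes(buf)

old = b"ID=1 NAME=Ann"
new = b"ID=2 NAME=Ann"
delta = make_delta(old, new)            # only one byte differs -> tiny delta
assert apply_delta(new, delta) == old   # the old version is fully recoverable
```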
The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo Log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)

Besides ISQL, the option has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
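At the API level a TPB is just a byte string: a version byte followed by option bytes. A minimal Python sketch of assembling one follows. The numeric values mirror the isc_tpb_* defines in ibase.h, but the value used here for isc_tpb_no_savepoint is an assumption for illustration; take the real constant from the InterBase 7.5.1 header:

```python
# Standard TPB item codes as defined in ibase.h; the value of
# isc_tpb_no_savepoint below is assumed for illustration only.
isc_tpb_version3     = 3
isc_tpb_concurrency  = 2
isc_tpb_wait         = 6
isc_tpb_write        = 9
isc_tpb_no_savepoint = 21   # assumption -- check ibase.h from InterBase 7.5.1

def build_tpb(*items: int) -> bytes:
    """A TPB is a version byte followed by one byte per simple option."""
    return bytes([isc_tpb_version3, *items])

tpb = build_tpb(isc_tpb_write, isc_tpb_concurrency,
                isc_tpb_wait, isc_tpb_no_savepoint)
print(tpb.hex())   # 0309020615 -- this buffer and its length are what
                   # a client would pass to isc_start_transaction()
```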
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation. Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE - only one record version exists, and the Undo log is empty.
The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE or DELETE statement inside the transaction: the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
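The bookkeeping just described can be modelled in a few lines; here a Python set stands in for the engine's bitmap of affected record numbers (all names are ours, and the real engine restores versions from disk, not from a dictionary):

```python
class SavepointLog:
    """Toy model of a savepoint's undo log: just the numbers of the
    records the transaction has touched."""
    def __init__(self):
        self.touched = set()               # stands in for the bitmap structure

    def record_change(self, rec_no: int):
        self.touched.add(rec_no)

    def rollback(self, table: dict, backversions: dict):
        """Walk the stored record numbers and restore each backversion."""
        for rec_no in self.touched:
            table[rec_no] = backversions[rec_no]
        self.touched.clear()

table = {1: "v1", 2: "v1"}
backversions = dict(table)                 # conceptually, the deltas on disk
log = SavepointLog()
table[1] = "v2"                            # the UPDATE changes record 1...
log.record_change(1)                       # ...and its number goes into the log
log.rollback(table, backversions)
assert table == {1: "v1", 2: "v1"}         # the change has been undone
```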
Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, in which only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.
Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.
Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

The original design of InterBase implemented the second UPDATE in another way, but sometime after InterBase 6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :-)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces the record data on disk with the newest version and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The preceding figure (Figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7. In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area

Object-Oriented Development in RDBMS, Part 1
Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any inconvenience and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development, if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines that is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers at run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number, and even sex. Nobody can change their date of birth - at least, not their de facto date of birth - but birth dates are not unique anyway).
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least not in terms of dependency on the changing world.
What is more, nobody requires run-time pointers to contain any additional information about an object other than a memory address. However, there are some people who vigorously reject the use of surrogates. The most brilliant argument against surrogates I've ever heard is that they "conflict with the relational theory". This statement is quite arguable, since surrogate keys are, in some sense, much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest implementation of an OID in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
Development area
The InterBase and Firebird Developer Magazine, 2005, Issue 2
www.ibdeveloper.com, © 2005. All rights reserved.
OOD and RDBMS, Part 1
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known-types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information - from just a simple type name to detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus, each object has to have the mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So, let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
    OID TOID primary key,
    ClassId TOID,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date timestamp default CURRENT_TIMESTAMP,
    Owner TOID)

The ClassId attribute (the object type identifier) refers to the OBJECTS table: the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100 (you need to add a description of this known OID). The list of all persistent object types known to the system is then retrieved by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to some unidentified object, this enables us to find out easily, by a standard method, what that object is.

Quite often, only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types - simple lookup dictionaries, for example - which do not have any attributes other than those already present in OBJECTS. Usually these are a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries there are in your accounting system? Likely not less than half of all tables. Thus, you may consider half of these dictionaries already implemented - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple, plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
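Since the dictionary queries above are plain SQL, the idea is easy to try outside InterBase as well. Below is a minimal sketch using Python's sqlite3 module in place of InterBase/Firebird: the TOID domain and the OID generator are simulated with a plain INTEGER column and hand-assigned values, and the 'Color' type with its elements is an invented example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE OBJECTS (
        OID INTEGER NOT NULL PRIMARY KEY,   -- stands in for the TOID domain
        ClassId INTEGER NOT NULL,
        Name VARCHAR(32),
        Description VARCHAR(128))
""")

# A type-description object: ClassId = -100 marks "this object is a type".
con.execute("INSERT INTO OBJECTS VALUES (1, -100, 'Color', 'lookup type')")
# Elements of the 'Color' dictionary: their ClassId is the OID of the type object.
con.executemany("INSERT INTO OBJECTS VALUES (?, ?, ?, ?)",
                [(2, 1, 'R', 'Red'), (3, 1, 'G', 'Green'), (4, 1, 'B', 'Blue')])

# Retrieval by known ClassId (the LookupId parameter of the article's query):
rows = con.execute("SELECT OID, Name, Description FROM OBJECTS "
                   "WHERE ClassId = ? ORDER BY OID", (1,)).fetchall()

# Retrieval when only the type name is known (the LookupName parameter):
rows2 = con.execute("""
    SELECT OID, Name, Description FROM OBJECTS
    WHERE ClassId = (SELECT OID FROM OBJECTS
                     WHERE ClassId = -100 AND Name = ?)
    ORDER BY OID""", ('Color',)).fetchall()

assert rows == rows2
print(rows)   # → [(2, 'R', 'Red'), (3, 'G', 'Green'), (4, 'B', 'Blue')]
```

Both queries return the same element set, which is exactly the point: one table and one standard query shape serve every simple dictionary.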
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. Objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with attributes such as order number, comments, customer and order amount, are stored in the table Orders:
create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total NUMERIC(15,2))
which relates one-to-one to the OBJECTS table via the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:
select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you can see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
If an order has a "lines" section - and a real-world order should definitely have one - we can create a separate table for it, named e.g. "order_lines", and relate it to the "Orders" table with a 1:M relation:
create table order_lines (
    id integer not null primary key,
    object_id TOID,  /* reference to the order object - 1:M relation */
    item_id TOID,    /* reference to the ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    sum numeric(15,4) computed by (amount*cost))
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: implementing object-relational mapping for this method is not simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]. These articles also describe how to implement type inheritance in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:
create table attributes (
    OID TOID,  /* link to the master-object of this attribute */
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id))
connected 1:M with OBJECTS by OID. There is also a table
create table class_attributes (
    OID TOID,  /* link to a description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id))
which describes the type metadata - the attribute set (names and types) for each object type.
All attributes of a particular object whose OID is known are retrieved by the query:
select attribute_id, value from attributes
where OID = :oid
or, with attribute names:
select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
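The attribute-with-names query above can be tried end to end with a small sketch, again using Python's sqlite3 as a stand-in engine (the 'Order' type, its two attributes and all OID values are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE objects (OID INTEGER PRIMARY KEY, ClassId INTEGER,
                      Name TEXT, Description TEXT);
CREATE TABLE class_attributes (
    OID INTEGER, attribute_id INTEGER NOT NULL,
    attribute_name TEXT, attribute_type INTEGER,
    CONSTRAINT class_attributes_pk PRIMARY KEY (OID, attribute_id));
CREATE TABLE attributes (
    OID INTEGER, attribute_id INTEGER NOT NULL, value TEXT,
    CONSTRAINT attributes_pk PRIMARY KEY (OID, attribute_id));
""")

# Type object for 'Order' (OID = 10) and its declared attributes.
con.execute("INSERT INTO objects VALUES (10, -100, 'Order', 'document type')")
con.executemany("INSERT INTO class_attributes VALUES (?, ?, ?, ?)",
                [(10, 1, 'customer', 0), (10, 2, 'sum_total', 1)])

# One order instance (OID = 20) and its attribute records.
con.execute("INSERT INTO objects VALUES (20, 10, '2005-17', 'urgent order')")
con.executemany("INSERT INTO attributes VALUES (?, ?, ?)",
                [(20, 1, '77'), (20, 2, '149.90')])

# The "attributes with names" query from the text, for one known OID:
rows = con.execute("""
    SELECT a.attribute_id, ca.attribute_name, a.value
    FROM attributes a, class_attributes ca, objects o
    WHERE a.OID = ? AND a.OID = o.OID
      AND o.ClassId = ca.OID AND a.attribute_id = ca.attribute_id
    ORDER BY a.attribute_id""", (20,)).fetchall()
print(rows)   # → [(1, 'customer', '77'), (2, 'sum_total', '149.90')]
```

Note how every attribute comes back as a separate record, one join away from its name - the shape that the next paragraphs pivot back into a single row.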
In the context of this method, you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all attributes in a single record by joining, or by using subqueries:
select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example of the "Order" document: the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database - in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of the single field used in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied - a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary - just a single BLOB field - and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
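A minimal sketch of Method 3 (the article mentions dfm and XML; JSON is used here purely for brevity, and the OID/ClassId values are invented). Note how both the speed advantage and the editor's objection fall straight out of the shape of the data:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE objects (OID INTEGER PRIMARY KEY, "
            "ClassId INTEGER, data BLOB)")

# Persist an 'Order' object as one serialized blob.
order_obj = {"number": "2005-17", "customer": 77, "sum_total": 149.9}
con.execute("INSERT INTO objects VALUES (?, ?, ?)",
            (20, 10, json.dumps(order_obj).encode("utf-8")))

# Loading is one fetch plus one parse: simple and fast...
blob, = con.execute("SELECT data FROM objects WHERE OID = 20").fetchone()
restored = json.loads(blob)
assert restored == order_obj

# ...but, as the editor's note says, the engine cannot see inside 'data':
# a condition like "WHERE sum_total > 100" is impossible in SQL here and
# would require unpacking every blob on the client side.
print(restored["sum_total"])   # → 149.9
```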
It was not difficult to come to the following conclusions (well, I know - everyone already knows them). All three methods described are undoubtedly useful and have a right to live; moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these methods, take into account the advantages and disadvantages listed for each. If your application involves massive relational data processing, like searching or grouping, on certain object attributes - or is likely to in the future - it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools such as SQL (except for work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.
To be continued...
FastReport 3 - the new generation of reporting tools

Visual report designer with rulers, guides and zooming; wizards for base report types; export filters to html, tiff, bmp, jpg, xls, pdf; dot-matrix report support; supports most popular DB engines. Full WYSIWYG, text rotation 0..360 degrees, memo object supports simple html-tags (font color, b, i, u, sub, sup), improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, anchors.

http://www.fast-report.com
Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
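The trigger-based logging idea can be sketched with SQLite triggers driven from Python. This is SQLite syntax rather than InterBase PSQL, and the table and column names below are illustrative, not CopyCat's actual log structures:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

-- one line per change: which table, which record, what happened
CREATE TABLE rpl_log (
    table_name TEXT, pk_value TEXT, operation TEXT,
    logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);

CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO rpl_log (table_name, pk_value, operation)
    VALUES ('customers', NEW.id, 'I');
END;

CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT INTO rpl_log (table_name, pk_value, operation)
    VALUES ('customers', NEW.id, 'U');
END;
""")

con.execute("INSERT INTO customers (id, name) VALUES (1, 'ACME')")
con.execute("UPDATE customers SET name = 'ACME Ltd' WHERE id = 1")

log = con.execute(
    "SELECT table_name, pk_value, operation FROM rpl_log").fetchall()
print(log)   # → [('customers', '1', 'I'), ('customers', '1', 'U')]
```

Every data change leaves a row in the log table; a replicator then only has to read that log to know what to ship, instead of diffing whole tables.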
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)
When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent; and any change made to the parent node will be logged for the other sub-nodes, but not for the originating node itself.
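The echo-suppression rule described in the last two paragraphs can be restated in a few lines of plain Python. This is only a sketch of the rule, not CopyCat's implementation, and the node names are invented:

```python
# When a change is made by connection 'current_user', the triggers
# generate one log line per sub-node, skipping the user who made the
# change -- so a replicated change is never sent back to its originator.

def log_lines(sub_nodes, current_user, record):
    """One (target node, record) log line per sub-node except the author."""
    return [(node, record) for node in sub_nodes if node != current_user]

sub_nodes = ["laptop_a", "laptop_b", "laptop_c"]

# An ordinary local edit: every sub-node gets a log line.
assert log_lines(sub_nodes, "office_user", "rec1") == [
    ("laptop_a", "rec1"), ("laptop_b", "rec1"), ("laptop_c", "rec1")]

# laptop_a replicates its change up to the parent: the triggers run as
# user 'laptop_a', so every sub-node except laptop_a gets the change.
assert log_lines(sub_nodes, "laptop_a", "rec2") == [
    ("laptop_b", "rec2"), ("laptop_c", "rec2")]
print("ok")
```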
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible for two nodes (having the same parent) to simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication": a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like the one in the example above), one solution is to create a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
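The difference between replicating values and replicating operations can be shown with a toy model in plain Python (numbers taken from the STOCK example above; CopyCat's real mechanism logs the stored procedure calls themselves):

```python
# STOCK example from the text: base value 45, node A adds 1, node B adds 2.
base = 45
local_ops = {"central": [], "node_a": [+1], "node_b": [+2]}

# Before replication, each node only reflects its own changes:
value = {node: base + sum(ops) for node, ops in local_ops.items()}
assert value == {"central": 45, "node_a": 46, "node_b": 47}

# Value replication would overwrite central with 46 or 47 -- both wrong.
# Operation replication instead replays every node's logged calls
# (its "stored procedure calls") on all the other nodes:
for src, ops in local_ops.items():
    for dst in value:
        if dst != src:
            for delta in ops:
                value[dst] += delta

# All nodes converge on the correct merged value, 45 + 1 + 2:
assert value == {"central": 48, "node_a": 48, "node_b": 48}
print(value["central"])   # → 48
```

Because additions commute, replaying the deltas in any order gives the same result, which is why the operation log merges cleanly where the raw values cannot.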
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components.

2. As a standalone Win32 replication tool.

We will now present each of these products.
1. The CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide for getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.
2) Open and run the "Configuration" example project (requires the IBX provider).
3) On the "General" tab, fill in the connection parameters and press "Connect to Database".
4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.
7) Apply all the generated SQL to all databases that should be replicated.
8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.
2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.
3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.
4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).
5. Compile and run the example.
6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool
For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include
• Easy-to-use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version:
http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:
1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.
2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community - that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.
CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replication between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.
Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:
ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read about it in opguide.pdf, you will know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least in identifying the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour to hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both the positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, available while you browse the statistics, and it also includes hints, comments and recommendation reports.

Notes:
1. InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages even if Forced Writes is Off.
2. The "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere; gstat output does not label this transaction as "interesting".
3. Strictly speaking, it is the oldest transaction that was active when the oldest currently active transaction started, because only the start of a new transaction moves the "Oldest active" forward. In production systems with regular transactions, it can be regarded as the currently oldest active transaction.

Figure 1. Summary of database statistics

The summary shown in Figure 1 provides general information about
your database. The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: all figures in this article contain gstat statistics taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the figure above? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, no warning is displayed; this page size is okay.
Next we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when it is ON, changed data is written to disk immediately, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: it is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off [1].
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database:
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers belong to committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.
• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.
• The Oldest active [3] – the oldest currently active transaction.
• The Next transaction – the transaction number that will be assigned to the next new transaction.
• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day – this is calculated as Next transaction divided by the number of days between the creation of the database and the point when the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
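These marker-based checks are easy to reproduce outside the tool. Below is a minimal sketch of how the counters from a gstat -h header can be turned into warnings; this is not IBAnalyst's actual code, and the function name and the 30% threshold are assumptions for illustration:

```python
# Illustrative interpretation of "gstat -h" transaction counters.
# Not IBAnalyst's real logic; the thresholds are assumptions.

def analyze_markers(oldest, oldest_active, oldest_snapshot, next_tx,
                    sweep_interval, days_since_creation):
    warnings = []

    # Sweep gap: distance between the oldest (interesting) and the
    # oldest snapshot transaction; automatic sweeping is triggered
    # when it exceeds a positive sweep interval.
    sweep_gap = oldest_snapshot - oldest
    if sweep_interval > 0 and sweep_gap > sweep_interval:
        warnings.append("sweep gap exceeds sweep interval")

    # Rough daily transaction rate since database creation.
    per_day = next_tx / max(days_since_creation, 1)

    # A stuck oldest-active transaction blocks garbage collection.
    if next_tx - oldest_active > 0.3 * per_day:
        warnings.append("oldest active transaction lags far behind")

    return sweep_gap, per_day, warnings

gap, per_day, warns = analyze_markers(
    oldest=100, oldest_active=120, oldest_snapshot=150,
    next_tx=9000, sweep_interval=20000, days_since_creation=30)
print(gap, per_day, warns)   # 50 300.0 ['oldest active transaction lags far behind']
```

Feeding in the markers from your own header page is a quick way to see whether the gap between oldest active and next transaction is growing faster than your daily transaction rate.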
As you have already learned, any warnings are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics gathered during certain work and housekeeping operations can be meaningless.
Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)
Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications are making less database load than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, which tables are fragmented (whether the fragmentation is caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. Together, both indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken via the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056, i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.
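The duplicate figures quoted for ADDR_ADDRESS_IDX6 can be reproduced from raw key data. Here is a sketch under the convention the statistics appear to use (TotalDup as keys minus unique values, MaxDup as the longest duplicate chain, i.e. the largest group minus one); the key data below is synthetic, shaped to match the article's numbers:

```python
from collections import Counter

def index_stats(keys):
    # keys: the indexed column value of every row in the table
    groups = Counter(keys)
    total = len(keys)
    uniques = len(groups)
    total_dup = total - uniques          # keys repeating an earlier value
    max_dup = max(groups.values()) - 1   # longest duplicate chain
    return total, uniques, total_dup, max_dup

# Synthetic data shaped like the article's index: 4 values, 34999 keys.
keys = ["A"] * 25057 + ["B"] * 9000 + ["C"] * 900 + ["D"] * 42
print(index_stats(keys))   # (34999, 4, 34995, 25056)
```

With only four distinct values, almost any search through this index has to wade through one huge duplicate chain, which is exactly why IBAnalyst marks both duplicate columns in red.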
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on foreign keys every time you view statistics.
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table         Records   Versions   RecVers size
CLIENTS_PR       3388      10944    92
DICT_PRICE         30       1992    45
DOCS                9       2225    64
N_PART          13835      72594    83
REGISTR_NC        241       4085    56
SKL_NC           1640       7736   170
STAT_QUICK      17649      85062   110
UO_LOCK           283       8490   144
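The 3:1 rule from that report is trivial to apply yourself. A short sketch follows; the record and version counts are taken from the report above, but the function itself is illustrative, not part of IBAnalyst:

```python
# Flag tables whose version/record ratio exceeds the report's 3:1 threshold.
def versioned_tables(stats, threshold=3.0):
    # stats maps table name -> (records, versions)
    return {name: round(versions / records, 1)
            for name, (records, versions) in stats.items()
            if versions / records > threshold}

stats = {
    "CLIENTS_PR": (3388, 10944),
    "DICT_PRICE": (30, 1992),
    "N_PART":     (13835, 72594),
    "HEALTHY":    (1000, 500),   # hypothetical table below the threshold
}
print(versioned_tables(stats))
# {'CLIENTS_PR': 3.2, 'DICT_PRICE': 66.4, 'N_PART': 5.2}
```

DICT_PRICE stands out immediately: sixty-six versions per record on a 30-row table suggests it is being rewritten constantly while something holds garbage collection back.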
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to "Temporary tables"
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence: "Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TTs) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature; if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat-file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago, and you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only possible through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
Oldest Active
On Silly Questions and Idiotic Outcomes
Firebird owes its existence to the community's support lists. Though it's all ancient history now, if it hadn't been for the esprit de corps among the regulars of the old InterBase support list hosted by mers.com in the nineties, Borland's shock-horror decision to kill off InterBase development at the end of 1999 would have been the end of it, for the English-speaking users at least.
Firebird's support lists on Yahoo and SourceForge were born out of the same cataclysm of fatal fire and regeneration that caused the phoenix, drawn from the fire against extreme adversity, to take flight as Firebird in July 2000. Existing and new users, both of Firebird and of the rich selection of drivers and tools that spun off from and for it, depend heavily on the lists to get up to speed with the evolution of all of these software products.
Newbies often observe that there is just nothing like our lists for any other software, be it open source or not. One of the things that I think makes the Firebird lists stand out from the crowd is that, once gripped by the power of the software, our users never leave the lists. Instead, they stick around, learn hard, and stay ready and willing to impart to others what they have learnt. After nearly six years of this, across a dozen lists, our concentrated mix of expertise and fidelity is hard to beat.
Now, doesn't all this make us all feel warm and fuzzy? For those of us in the hot core of this voluntary support infrastructure, the answer has to be, unfortunately, "not all the time". There are some bugbears that can make the most patient person see red. I'm using this column to draw attention to some of the worst.
First and worst is silly questions. You have all seen them. Perhaps you even posted them. Subject: "Help!" Body: "Every time I try to connect to Firebird I get the message 'Connection refused'. What am I doing wrong?" What follows from that is a frustrating and tedious game for responders. What version of Firebird are you using? Which server model? Which platform? Which tool? What connection path? (By connection path we mean...)
Everyone's time is valuable. Nobody has the luxury of being able to sit around all day playing this game. If we weren't willing to help, we wouldn't be there. But you make it hard, or even impossible, when you post problems with no descriptions. You waste our time. You waste your time. We get frustrated because you don't provide the facts; you get frustrated because we can't read your mind. And the list gets filled with noise.
If there's something I've learnt in more than 12 years of on-line software support, it is that a problem well described is a problem solved. If a person applies time and thought to presenting a good description of the problem, chances are the solution will hop up and punch him on the nose. Even if it doesn't, good descriptions produce the fastest right answers from list contributors. Furthermore, those good descriptions and their solutions form a powerful resource in the archives for others coming along behind.
Off-topic questions are another source of noise and irritation in the lists. At the Firebird website we have bent over backwards to make it very clear which lists are appropriate for which areas of interest. An off-topic posting is easily forgiven if the poster politely complies with a request from a list regular to move the question to the appropriate list. When these events instead become flame threads, or when the same people persistently reoffend, they are an extreme burden on everyone.
In the highly tedious category we have the kind of question that goes like this. Subject: "Bug in Firebird SQL". Body: "This statement works in MSSQL/Access/MySQL/[insert any non-standard DBMS name here]. It doesn't work in Firebird. Where should I report this bug?" To be fair, George Bernard Shaw wasn't totally right when he said "Ignorance is a sin." However, you do at least owe it to yourself to know what you are talking about, and not to present your problem as an attack on our software. Whether intended or not, it comes over as a troll, and you come across as a fool. There's a strong chance that your attitude will discourage anyone from bothering with you at all. It's no-win, sure, but it's also a reflection of human nature. You reap what you sow.
In closing this little tirade, I just want to say that I hope I've struck a chord with those of our readership who struggle to get satisfaction from their list postings. It will be a rare question indeed that has no answer. Those rarities, well presented, become excellent bug reports. If you're not getting an answer, there's a high chance that you asked a silly question.
Helen E. M. Borrie
helebor@tpg.com.au
Cover story
Locking Firebird and the Lock Table
News & Events

Firebird Worldwide Conference 2005

The Firebird conference will take place at the Hotel Olsanka in Prague, Czech Republic, from the evening of Sunday the 13th of November (opening session) until the evening of Tuesday the 15th of November (closing session).
One of the best-known facts about Firebird is that it uses multi-version concurrency control instead of record locking. For databases that manage concurrency through record locks, understanding the mechanisms of record locking is critical to designing a high-performance, high-concurrency application. In general, record locking is irrelevant to Firebird. However, Firebird uses locks to maintain internal consistency and for interprocess coordination. Understanding how Firebird does (and does not) use locks helps with system tuning and with anticipating how Firebird will behave under load.
I'm going to start by describing locking abstractly, then Firebird locking more specifically, and then the controls you have over the Firebird lock table and how they can affect your database performance. So if you just want to learn about tuning, skip to the end of the article.
Concurrency control
Concurrency control in database systems has three basic functions: preventing concurrent transactions from overwriting each others' changes, preventing readers from seeing uncommitted changes, and giving a consistent view of the database to running transactions. Those three functions can be implemented in various ways. The simplest is to serialize transactions, allowing each transaction exclusive access to the database until it finishes. That solution is neither interesting nor desirable.
A common solution is to allow each transaction to lock the data it uses, keeping other transactions from reading data it changes and from changing data it reads. Modern databases generally lock records. Firebird provides concurrency control without record locks by keeping multiple versions of records, each marked with the identifier of the transaction that created it. Concurrent transactions don't overwrite each other's changes, because the system will not allow a transaction to change a record if the most recent version was created by a concurrent transaction. Readers never see uncommitted data, because the system will not return record versions created by concurrent transactions. Readers see a consistent version of the database, because the system returns only the data committed when they start, allowing other transactions to create newer versions. Readers don't block writers.
A lock sounds like a very solid object, but in database systems a lock is anything but solid. "Locking" is a shorthand description of a protocol for reserving resources. Databases use locks to maintain consistency while allowing concurrent, independent transactions to update distinct parts of the database. Each transaction reserves the resources – tables, records, data pages – that it needs to do its work. Typically the reservation is made in memory to control a resource on disk, so the cost of reserving the resource is not significant compared with the cost of reading or changing the resource.
Locking example

"Resource" is a very abstract term. Let's start by talking about locking tables. Firebird does lock tables, but normally it locks them only to prevent catastrophes, like having one transaction drop a table that another transaction is using.
Before a Firebird transaction can access a table, it must get a lock on the table. The lock prevents other transactions from dropping the table while it is in use. When a transaction gets a lock on a table, Firebird makes an entry in its table of locks, indicating the identity of the transaction that has the lock, the identity of the table being locked, and a function to call if another transaction wants an incompatible lock on the table. The normal table lock is shared – other transactions can lock the same table in the same way.
Before a transaction can delete a table, it must get an exclusive lock on the table. An exclusive lock is incompatible with other locks. If any transaction has a lock on the table, the request for an exclusive lock is denied and the drop table statement fails. Returning an immediate error is one way of dealing with conflicting lock requests. The other is to put the conflicting request on a list of unfulfilled requests and let it wait for the resource to become available.
Author: Ann W. Harrison, aharrison@ibphoenix.com

In its simplest form, that is how locking works. All transactions follow the formal protocol of requesting a lock on a resource before using it. Firebird maintains a list of locked resources, a list of requests for locks on resources (satisfied or waiting), and a list of the owners of lock requests. When a transaction requests a lock that is incompatible with existing locks on a resource,
Firebird either denies the new request or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for the locks that it acquires on tables, etc.
Lock modes

For concurrency and read-committed transactions, Firebird locks tables for shared read or shared write. Either mode says, "I'm using this table, but you are free to use it too." Consistency-mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say, "I'm using the table, and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and with other protected read transactions. Protected write is only compatible with shared read.
The important concept about lock modes is that locks are more subtle than mutexes: locks allow resource sharing, as well as protecting resources from incompatible use.
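The mode rules described here can be written down as a small compatibility table. The sketch below is built only from the statements in this article; it is not the engine's code, and it covers just the table-lock modes mentioned:

```python
# Table-lock compatibility as described in the article.
SR, SW, PR, PW, EX = ("shared read", "shared write", "protected read",
                      "protected write", "exclusive")

COMPATIBLE = {
    SR: {SR, SW, PR, PW},   # shared read coexists with everything but exclusive
    SW: {SR, SW},           # shared write coexists with the shared modes
    PR: {SR, PR},           # protected read: no one may write
    PW: {SR},               # protected write: only shared readers allowed
    EX: set(),              # exclusive (e.g. drop table) conflicts with all
}

def compatible(held, requested):
    return requested in COMPATIBLE[held]

print(compatible(PW, SR))   # True
print(compatible(PW, SW))   # False
print(compatible(EX, SR))   # False
```

Note that the matrix is symmetric, and a new request is granted only if it is compatible with every lock already held on the resource.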
Two-phase locking vs. transient locking

The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of the locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release. They cannot overlap.
When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency-mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.
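The two-phase discipline is simple to state in code. Here is a minimal, illustrative sketch (not Firebird source) of a transaction object that refuses to acquire once it has started releasing:

```python
class TwoPhaseTransaction:
    """Enforce two-phase locking: the acquisition and release
    phases must not overlap."""

    def __init__(self):
        self.held = set()
        self.releasing = False   # becomes True once any lock is released

    def acquire(self, resource):
        if self.releasing:
            raise RuntimeError("two-phase violation: acquire after release")
        self.held.add(resource)

    def release_all(self):
        # Firebird holds table locks until the transaction ends,
        # then releases them together.
        self.releasing = True
        self.held.clear()

tx = TwoPhaseTransaction()
tx.acquire("CUSTOMERS")
tx.acquire("ORDERS")
tx.release_all()
try:
    tx.acquire("INVOICES")
except RuntimeError as err:
    print(err)   # two-phase violation: acquire after release
```

Holding every lock until the end is what makes the protocol serializable-friendly, and it is also why a stuck transaction can block others for so long.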
Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.
Firebird page locks
One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.
In general database theory, a transaction is a set of steps that transforms the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.
In Firebird, internally, each time a transaction changes a page, it changes that page – and the physical structure of the database as a whole – from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page-level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.
The Firebird lock table

When all access to a database is done in a single process – as is the case with most database systems – locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.
The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and to one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.
Conflicting lock requests
When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens: either the requesting transaction gets an immediate error, or the request is
Registering for the Conference
Call for papers
Sponsoring the Firebird Conference
http://firebird-conference.com
Cover story
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
www.ibdeveloper.com © 2005 www.ibdeveloper.com. All rights reserved.
Locking Firebird and the Lock Table
put on a list of waiting requests, and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.
Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.
Locks as interprocess communication
Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform but, for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that's normally considered database locking.
For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important since, as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.
When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition and begin maintaining the index.
Firebird locking summary
Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.
Lock table specifics
The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally lock owners are transactions, connections or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.
To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose, the lock table includes a hash table for access to lock blocks based on the name of the resource being locked.
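The block relationships just described can be sketched in a few lines of Python. The class names, fields and grant rule below are illustrative inventions for the sketch, not the engine's actual structures:

```python
# Illustrative sketch of the lock-table blocks described above:
# lock blocks for resources, owner blocks for requestors, and request
# blocks tying an owner to a lock with a mode and granted/pending state.

class LockBlock:
    """Describes one lockable resource; holds the list of requests on it."""
    def __init__(self, name):
        self.name = name
        self.requests = []            # requests in arrival order

class OwnerBlock:
    """Describes one lock owner (transaction, connection or server)."""
    def __init__(self, owner_id):
        self.owner_id = owner_id
        self.requests = []            # every request this owner has made

class RequestBlock:
    """Ties an owner to a lock: mode plus granted/pending state."""
    def __init__(self, owner, lock, mode):
        self.owner, self.lock, self.mode = owner, lock, mode
        self.granted = False

def request_lock(owner, lock, mode, compatible):
    """Append a request at the end of the lock's list; grant it only if the
    mode is compatible with every already-granted request. Returns the new
    request and the conflicting requests whose owners would be notified."""
    req = RequestBlock(owner, lock, mode)
    conflicts = [r for r in lock.requests
                 if r.granted and not compatible(r.mode, mode)]
    req.granted = not conflicts
    lock.requests.append(req)
    owner.requests.append(req)
    return req, conflicts
```

With a rule where only shared/shared is compatible, a first shared request is granted at once, while a following exclusive request is queued and returns the granted requests whose owners would be notified of the conflict.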
A quick refresher on hashing
A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function, called the hash function, into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.
In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.
A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and another name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
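A minimal sketch of this lookup scheme follows. The hash function is a stand-in, not Firebird's actual one, but like the real one its result depends on the number of slots, which is why resizing the array invalidates every existing entry:

```python
# Sketch of the hash lookup described above: a resource name hashes to a
# slot, collisions chain off that slot, and duplicates (same name) simply
# collide every time.

def hash_slot(name: bytes, slots: int) -> int:
    # Stand-in hash function; the number of slots is part of the function,
    # so changing it moves entries to different slots.
    return sum(name) % slots

class HashTable:
    def __init__(self, slots: int):
        self.slots = slots
        self.array = [[] for _ in range(slots)]   # one collision chain per slot

    def insert(self, name: bytes, lock_block):
        self.array[hash_slot(name, self.slots)].append((name, lock_block))

    def lookup(self, name: bytes):
        # Each collision in the chain adds one more name comparison.
        chain = self.array[hash_slot(name, self.slots)]
        return [lb for n, lb in chain if n == name]
```

Two locks with the same name always land in the same chain; and the same name hashed against a different slot count generally lands in a different slot, illustrating why the width cannot be changed in place.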
Adjusting the lock table to improve performance
The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.
The tool for checking the lock table is fb_lock_print, which is a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:
LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/ 15/ 30
…

The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:

Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/ 15/ 30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.
If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.
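The rule of thumb above can be captured in a short check. This sketch assumes the usual "Mutex wait: … %" and "Hash slots: …, Hash lengths (min/avg/max): …" layout of the lock print header; it is an illustration, not a full fb_lock_print parser:

```python
# Sanity check for the lock-print numbers discussed above, using the
# thresholds from the text: any non-zero mutex wait, or hash lengths
# beyond min 5 / avg 10 / max 30, suggest too few hash slots.
import re

def lock_table_overloaded(header_text: str) -> bool:
    wait = float(re.search(r"Mutex wait:\s*([\d.]+)%", header_text).group(1))
    mn, avg, mx = (int(g) for g in re.search(
        r"Hash lengths \(min/avg/max\):\s*(\d+)/\s*(\d+)/\s*(\d+)",
        header_text).groups())
    return wait > 0 or mn > 5 or avg > 10 or mx > 30
```

Feeding it the example values above reports a problem; a table with no mutex waits and short chains passes.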
Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499
The change will not take effect until all connections to all databases on the server machine shut down.
If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:
Version: 114, Active owner: 0, Length: 262144, Used: 85740
tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table; Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading # and increase the value:

#LockMemSize = 262144

to this:

LockMemSize = 1048576
The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.
News & Events

PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP.

One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in.

PHPServer shares all the qualities of Firebird: it is a capable, compact, easy-to-install and easy-to-manage solution.

PHPServer is a free download.

Read more at: www.fyracle.org/phpserver.html
Server internals
Inside BLOBs
How the server works with BLOBs
The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types – for example, pictures, audio files, video fragments, etc.
From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.
Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and, at the same time, has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.
This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.
Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB record will be one of three types.
The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.
Author: Dmitri Kouzmenko, kdv@ib-aid.com
Author: Alexey Kovyazin, ak@ib-aid.com
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus a two-level structure of BLOB-field data is used.
If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.
The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.
BLOB Page
The BLOB page consists of the following parts.
The special header contains the following information:
• The number of the first BLOB page in this BLOB. It is used to check that pages belong to one BLOB.
• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.
• The length of data on the page. As a page may or may not be filled to its full extent, the length of actual data is indicated in the header.
Maximum BLOB size
As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.
However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.
Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.
The segment size mystery
Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.
In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.
Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
TestBed
Testing the NO SAVEPOINT
In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.
Testing NO SAVEPOINT
In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.
Database for testing
The test database file was created in InterBase 7.5.1; page size = 4096; character encoding is NONE. It contains two tables, one stored procedure and three generators.
For the test we will use only one
Testing the NO SAVEPOINT feature in InterBase 7.5.1

Author: Alexey Kovyazin, ak@ib-aid.com
Author: Vlad Horsun, hvlad@users.sourceforge.net
table, with the following structure:
CREATE TABLE TEST (
ID NUMERIC(18,2),
NAME VARCHAR(120),
DESCRIPTION VARCHAR(250),
CNT INTEGER,
QRT DOUBLE PRECISION,
TS_CHANGE TIMESTAMP,
TS_CREATE TIMESTAMP,
NOTES BLOB
)

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:
SELECT * FROM INSERTRECS(1000);
The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.
Setting up the test environment
All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.
Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well described in the InterBase documentation.
You can download a backup of the test database ready to use from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).
If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.
If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.
After that, perform a backup of this database and you will be in the same position as if you had downloaded the "ready for use" backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, 512 MB RAM and an 80GB Samsung HDD.
Preparing to test
A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts
The first script tests a regular one-pass update without the NO SAVEPOINT option. For convenience, the important commands are clarified with comments.
connect "C:\TEST\testsp1.ib" USER 'SYSDBA' PASSWORD 'masterkey'; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enables writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE, with the NO SAVEPOINT option:
connect "C:\testsp2.ib" USER 'SYSDBA' PASSWORD 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.
To test sequential UPDATEs, we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:
connect "E:\testsp3.ib" USER 'SYSDBA' PASSWORD 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com
News & Events

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics using metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test
The easiest way to perform the test is to use isql's INPUT command.
Suppose you have the scripts located in c:\test\scripts:
>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;
Test results

The single bulk UPDATE
First, let's perform the test where the single-pass bulk UPDATE is performed.
This is an excerpt from the one-pass script with default transaction settings:
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3
Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain:

visual report designer with rulers, guides and zooming; wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for most popular DB-engines; full WYSIWYG; text rotation 0..360 degrees; memo object supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; Anchors.

Trial / Buy Now
News & Events
TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a Stored Procedure Generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found here:

http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet.

Full details can be found here:

http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled:
CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger, see www.microtec.fr/copycat/ct

Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not
Sequential bulk UPDATE statements
With the multi-pass script (sequential UPDATEs), the raw test results are rather large.
Table 1: Test results for 5 sequential UPDATEs
News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com
For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE, we start to see the difference. With default transaction settings this UPDATE takes a very long time (47 seconds), compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth of the writes parameter.
Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage, we observe that time and memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?
A few words about versions
You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately but is retained as a backversion.
In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
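The delta idea can be illustrated with a toy model. A dict of changed fields stands in here for the real byte-level delta structures; the point is only that the old version is recoverable from the new version plus the delta:

```python
# Toy model of the delta backversions described above: store only the
# fields that changed, and rebuild the old record from the new record
# plus the delta.

def make_delta(old: dict, new: dict) -> dict:
    """Keep only the fields whose values differ (the old values win)."""
    return {k: v for k, v in old.items() if new.get(k) != v}

def rebuild_old(new: dict, delta: dict) -> dict:
    """Recover the full old version from the new version and the delta."""
    restored = dict(new)
    restored.update(delta)
    return restored
```

For a 100,000-record bulk UPDATE that touches a couple of columns, each delta stays far smaller than the full record, which is why the first pass is cheap on both disk and memory.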
The UNDO log concept
You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.
A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.
So the Undo Log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. :-)
Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.
First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).
Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.
Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation
Consider the implementation details of the undo log. Figure 3 shows the initial situation.
Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.
Figure 3: Initial situation before any UPDATE – only the one record version exists; the undo log is empty
The first UPDATE
The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.
It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
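That first-update bookkeeping can be modelled in a few lines. A set of record numbers stands in here for the real bitmap, and plain dicts stand in for the table and the on-disk backversions; the names are illustrative, not engine structures:

```python
# Toy model of the savepoint undo log for a first UPDATE: the savepoint
# remembers only which record numbers were touched; rollback walks that
# set and restores each record from its stored backversion.

class Savepoint:
    def __init__(self):
        self.touched = set()          # stands in for the bitmap of record numbers

def update(table, backversions, sp, rec_no, new_row):
    backversions[rec_no] = table[rec_no]   # backversion (delta) goes to disk
    sp.touched.add(rec_no)                 # record number goes into the undo log
    table[rec_no] = new_row

def rollback(table, backversions, sp):
    for rec_no in sp.touched:
        table[rec_no] = backversions.pop(rec_no)
    sp.touched.clear()
```

The savepoint itself holds almost nothing, which matches the observation that the first bulk UPDATE costs about the same with or without NO SAVEPOINT.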
Figure 4: UPDATE 1 creates a small delta version on disk and puts the record number into the UNDO log
This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE
When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, in which only the one global (transaction-level) savepoint exists. We will also look at the difference in the undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2) and store it somewhere on disk, fetch the current full version (by transaction 2, update 1) and put it into the in-memory undo log, replace it with the new full version (with back-pointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.
Figure 5
The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the undo log.
This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE
The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.
Figure 6
The third UPDATE overwrites the UPDATE1 version in the undo log with the UPDATE2 version, and its own version is written to disk as the latest one.
Figure 7
If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the undo log of that savepoint.
The original design of InterBase implemented the second UPDATE differently, but some time after InterBase 6 Borland changed the original behavior to what we see now. But that is a theme for another article.
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, each update just replaces the record data on disk with the newest version and shifts the superseded version into the undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected the third and following UPDATEs to perform essentially equal numbers of reads and writes: writing the newest versions and moving the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the undo log during the sequential updates. When NO SAVEPOINT is enabled, the only operations we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should of course be used with caution. The developer will need to take extra care with database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well known, though it often turned out that I had independently reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any inconvenience and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The database structures described have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.
Mixing object-oriented programming with RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described here is not appropriate for every task. The same is true of OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, and the like.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?
Present-day relational databases were developed in times when the sun shone brighter, computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:
1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.
As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines that is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there have been several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS. They are unusual and
Author: Vladimir Kotlyarevsky (vlad@contek.ru)
slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?

What we need is simple: to develop a set of standard methods that will help us to tailor the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID keys from a single set (similar to object pointers at run time). These identifiers are used to link to an object, to load an object into a run-time environment, and so on. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness within a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change, including the vehicle engine number, network card number, name, passport number, social security card number and even sex. (Nobody can change their date of birth, at least not their de facto date of birth, but birth dates are not unique anyway.)
Remember the maxim: everything that can go bad will go bad (and consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

What is more, nobody requires run-time pointers to contain any additional information about an object apart from a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I have ever heard is that they "conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
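A minimal sketch of such a generating function in InterBase/Firebird terms is a single generator serving the whole database (the generator name here is illustrative, not prescribed by the article):

```sql
/* One database-wide source of surrogate OIDs:
   values are unique across all tables, not just one */
create generator gen_oid;

/* fetch the next unique OID value */
select gen_id(gen_oid, 1) from rdb$database;
```

Because every table draws from the same generator, an OID found anywhere in the database can never collide with an OID of another object, which is exactly the database-wide uniqueness discussed above.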
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. This objective may be accomplished by a ClassId object attribute, which refers to the list of known types, which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object: the physical deletion of records in a database full of direct and indirect links is often, to put it mildly, a very complicated task.

Thus, each object has the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, well-known entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
    OID TOID primary key,
    ClassId TOID,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date timestamp default CURRENT_TIMESTAMP,
    Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily, using a standard method, what that object is.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types (simple lookup dictionaries, for example) which do not have any attributes other than those already in OBJECTS. Usually these are a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? Likely no less than half. Thus you may consider that you have implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId
if you know the ClassId for this dictionary's elements. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with attributes such as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total NUMERIC(15,2)
);
which relates one-to-one to the OBJECTS table by the OID field. The order number and comments attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, part of the Partners dictionary. You can retrieve all the attributes of the "Order" type with the following query:
select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
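Such a view might be sketched like this; the column list is an assumption based on the query above, not a definition from the article:

```sql
/* hide the OBJECTS/Orders split behind a conventional-looking relation */
create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o join Orders ord on ord.OID = o.OID;
```

Client code can then select from orders_view exactly as it would from a classic Orders table, unaware that half of the columns live in OBJECTS.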
If an order has a "lines" section, and a real-world order definitely should have such a section, we can create a separate table for it, named e.g. "order_lines", and relate it to the "Orders" table with a 1:M relation:

create table order_lines (
    id integer not null primary key,
    object_id TOID,   /* reference to the order object: 1:M relation */
    item_id TOID,     /* reference to the ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    sum numeric(15,4) computed by (amount * cost)
);
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: the implementation of the object-relational mapping system for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes, for any type, are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
    OID TOID,                  /* link to the master-object of this attribute */
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
);
connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
    OID TOID,                  /* link to a description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata: the attribute set (names and types) for each object type.
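As a hypothetical illustration, the "Order" type from Method 1 could be described in class_attributes like this; the OID of the type object and the attribute_type codes are invented for the example:

```sql
/* suppose the "Order" type object has OID = 10, and we use
   attribute_type 2 for object references, 3 for numerics */
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (10, 1, 'customer', 2);

insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (10, 2, 'sum_total', 3);
```

Each order instance then stores its customer and sum_total as two rows in the attributes table, keyed by the order's OID and these attribute_id values.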
All attributes for a particular object whose OID is known are retrieved by the query:

select attribute_id, value
from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid
  and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure. A significant benefit is that with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") apply to every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, using one of the persistent formats: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know, everyone already knows them). All three described methods are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to have it added in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.
To be continued...
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve (jonathan@microtec.fr)

www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.
Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database; therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication

One obvious difficulty involved in two-way replication is how to keep changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent; any change made on the parent node will be logged for the other sub-nodes, but not for the originating node itself.
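CopyCat generates its log tables and triggers itself, and their real layout is not documented in this article; still, the mechanism can be sketched roughly as follows, with every table, column, and trigger name invented for illustration:

```sql
set term ^;

/* one log line per sub-node, except the node (user) that performed
   the change, so a replicated change never bounces back */
create trigger customers_repl for customers
active after update position 0
as
begin
  insert into repl_log (table_name, pk_value, sub_node)
  select 'CUSTOMERS', new.id, s.node_name
  from sub_nodes s
  where s.node_name <> user;
end^

set term ;^
```

Because the replicator connects under the name of the originating node (or, locally, under the parent's name), the `where s.node_name <> user` filter is all that is needed to exclude the one database that already has the change.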
Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
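Assuming the conflict table looks roughly like the description above (its actual name and key columns in CopyCat may differ; everything here except CHOSEN_USER is invented), resolution could be as simple as one statement:

```sql
/* declare that NODE_A holds the correct version of the disputed record;
   CONFLICTS and CONFLICT_ID are hypothetical names */
update CONFLICTS
set CHOSEN_USER = 'NODE_A'
where CONFLICT_ID = 42;
```

On the next replication pass, the record is taken from the designated node and the conflict entry is cleared.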
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node might have a conflicting update for a certain record. Furthermore, it is entirely possible for two nodes (having the same parent) to simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database archi-tectures that are difficult to repli-cate Consider for example aldquoSTOCKrdquo table containing one lineper product and a field holding thecurrent stock value Suppose thatfor a certain product the currentstock value being 45 node A adds1 item to stock setting the stockvalue to 46 Simultaneously nodeB adds 2 items to stock thereby set-ting the current stock value to 47How can such a table then be repli-cated Neither A nor B have thecorrect value for the field since nei-ther take into consideration thechanges from the other node
Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an
unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
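As a sketch of the idea (not CopyCat's actual API), the STOCK example could be handled by routing every stock movement through a procedure that applies a delta rather than writing an absolute value; replicating the calls then merges correctly. Table and procedure names here are invented for illustration:

```sql
/* Illustrative only: with stored procedure replication, each node replays
   ADD_STOCK(product, delta) calls instead of copying STOCK.AMOUNT values. */
SET TERM ^ ;
CREATE PROCEDURE ADD_STOCK (PRODUCT_ID INTEGER, DELTA INTEGER)
AS
BEGIN
  UPDATE STOCK
  SET AMOUNT = AMOUNT + :DELTA
  WHERE PRODUCT_ID = :PRODUCT_ID;
END ^
SET TERM ; ^
```

Replaying node A's ADD_STOCK(1, 1) and node B's ADD_STOCK(1, 2) in any order yields 48 on every node, which neither absolute value (46 or 47) would.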
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:
1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.
1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main Interbase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator"
columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the Interbase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of Interbase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replication between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...) as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you will know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available while browsing the statistics, and it also includes hint comments and recommendation reports.

Figure 1: Summary of database statistics

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] The Oldest transaction is the same Oldest Interesting Transaction mentioned everywhere. Gstat output does not show this transaction as "interesting".
[3] Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest Active forward. In production systems with regular transactions it can be considered the currently oldest active transaction.

The summary shown in Figure 1 provides general information about
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.
• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications generate less database load than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination, you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.
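The typical case is easy to reproduce: a foreign key to a small lookup table automatically creates an index whose keys have only as many distinct values as the lookup table has rows. A minimal sketch, with invented table and column names:

```sql
/* A four-row lookup table: every row of ADDRESS then carries one of only
   four ADDR_TYPE_ID values, so the automatically created FK index has
   Uniques = 4 and very long duplicate chains. */
CREATE TABLE ADDR_TYPE (
  ID   INTEGER NOT NULL PRIMARY KEY,
  NAME VARCHAR(20)
);

CREATE TABLE ADDRESS (
  ID           INTEGER NOT NULL PRIMARY KEY,
  ADDR_TYPE_ID INTEGER REFERENCES ADDR_TYPE (ID)  /* creates the low-selectivity index */
);
```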
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, than garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144
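The report's suggestion can be tried per table: a full-scan read gives the engine a chance to collect garbage versions as it visits each record. A sketch, using one of the table names from the list above:

```sql
/* Reading every record of a heavily versioned table lets the engine
   garbage-collect old versions as it goes - provided no active transaction
   still needs them. This can run for a long time on such tables. */
SELECT COUNT(*) FROM STAT_QUICK;
```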
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback

Comments to the "Temporary tables" article
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner, I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
Miscellaneous

Invitations, Subscriptions, Donations, Sponsorships
This is the first official book on Firebird — the free, independent, open source relational database server that emerged in 2000. Based on the actual Firebird Project, this book will provide you with all you need to know about Firebird database development: installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now!

To receive notifications of future issues, send email to subscribe@ibdeveloper.com
Cover story

Locking, Firebird, and the Lock Table
News & Events

Firebird Worldwide Conference 2005

The Firebird conference will take place at the Hotel Olsanka in Prague, Czech Republic, from the evening of Sunday the 13th of November (opening session) until the evening of Tuesday the 15th of November (closing session).
One of the best-known facts about Firebird is that it uses multi-version concurrency control instead of record locking. For databases that manage concurrency through record locks, understanding the mechanisms of record locking is critical to designing a high performance, high concurrency application. In general, record locking is irrelevant to Firebird. However, Firebird uses locks to maintain internal consistency and for interprocess coordination. Understanding how Firebird does (and does not) use locks helps with system tuning and with anticipating how Firebird will behave under load.

I'm going to start by describing locking abstractly, then Firebird locking more specifically, then the controls you have over the Firebird lock table and how they can affect your database performance. So if you just want to learn about tuning, skip to the end of the article.
Concurrency control
Concurrency control in database systems has three basic functions: preventing concurrent transactions from overwriting each others' changes, preventing readers from seeing uncommitted changes, and giving a consistent view of the database to running transactions. Those three functions can be implemented in various ways. The simplest is to serialize transactions – allowing each transaction exclusive access to the database until it finishes. That solution is neither interesting nor desirable.

A common solution is to allow each transaction to lock the data it uses, keeping other transactions from reading data it changes and from changing data it reads. Modern databases generally lock records. Firebird provides concurrency control without record locks by keeping multiple versions of records, each marked with the identifier of the transaction that created it. Concurrent transactions don't overwrite each other's changes, because the system will not allow a transaction to change a record if the most recent version was created by a concurrent transaction. Readers never see uncommitted data, because the system will not return record versions created by concurrent transactions. Readers see a consistent version of the database, because the system returns only the data committed when they start – allowing other transactions to create newer versions. Readers don't block writers.
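A two-session sketch, in isql terms, illustrates the "no overwriting" rule (the table and values are invented for illustration):

```sql
-- Illustration: two concurrent SNAPSHOT transactions on an invented table.
-- Session A:
UPDATE ACCOUNTS SET BALANCE = 100 WHERE ID = 1;  -- creates a new record version
-- Session B (started before A committed):
UPDATE ACCOUNTS SET BALANCE = 200 WHERE ID = 1;
-- B fails (immediately, or after waiting, depending on its WAIT mode) with an
-- update conflict: the most recent version of the record was created by a
-- concurrent transaction. B also never *sees* A's uncommitted version; its
-- reads return the version committed before B began.
```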
A lock sounds like a very solid object, but in database systems a lock is anything but solid. "Locking" is a shorthand description of a protocol for reserving resources. Databases use locks to maintain consistency while allowing concurrent, independent transactions to update distinct parts of the database. Each transaction reserves the resources – tables, records, data pages – that it needs to do its work. Typically the reservation is made in memory to control a resource on disk, so the cost of reserving the resource is not significant compared with the cost of reading or changing the resource.
Locking example
"Resource" is a very abstract term. Let's start by talking about locking tables. Firebird does lock tables, but normally it locks them only to prevent catastrophes, like having one transaction drop a table that another transaction is using.
Before a Firebird transaction can access a table, it must get a lock on the table. The lock prevents other transactions from dropping the table while it is in use. When a transaction gets a lock on a table, Firebird makes an entry in its table of locks, indicating the identity of the transaction that has the lock, the identity of the table being locked, and a function to call if another transaction wants an incompatible lock on the table. The normal table lock is shared – other transactions can lock the same table in the same way.

Before a transaction can delete a table, it must get an exclusive lock on the table. An exclusive lock is incompatible with other locks. If any transaction has a lock on the table, the request for an exclusive lock is denied and the drop table statement fails. Returning an immediate error is one way of dealing with conflicting lock requests. The other is to put the conflicting request on a list of unfulfilled requests and let it wait for the resource to become available.
Author: Ann W. Harrison, aharrison@ibphoenix.com

In its simplest form, that is how locking works. All transactions follow the formal protocol of requesting a lock on a resource before using it. Firebird maintains a list of locked resources, a list of requests for locks on resources – satisfied or waiting – and a list of the owners of lock requests. When a transaction requests a lock that is incompatible with existing locks on a resource,
Firebird either denies the newrequest or puts it on a list to waituntil the resource is available Inter-nal lock requests specify whetherthey wait or receive an immediateerror on a case-by-case basisWhen a transaction starts it speci-fies whether it will wait for locks thatit acquires on tables etc
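The request/grant/deny-or-wait protocol described above can be sketched in a few lines of Python. This is purely an illustration of the idea, not Firebird's implementation; all class and function names here are invented.

```python
# Toy model of a lock-request protocol: shared locks coexist,
# anything else conflicts; a conflicting request is denied or queued.

SHARED, EXCLUSIVE = "shared", "exclusive"

def compatible(held, requested):
    """Two shared locks coexist; an exclusive lock conflicts with any other."""
    return held == SHARED and requested == SHARED

class LockTable:
    def __init__(self):
        self.granted = {}   # resource -> list of (owner, mode) already granted
        self.waiting = {}   # resource -> list of (owner, mode) queued requests

    def request(self, owner, resource, mode, wait=False):
        holders = self.granted.setdefault(resource, [])
        if all(compatible(m, mode) for _, m in holders):
            holders.append((owner, mode))
            return "granted"
        if wait:                       # queue the request instead of failing
            self.waiting.setdefault(resource, []).append((owner, mode))
            return "waiting"
        return "denied"

    def release(self, owner, resource):
        self.granted[resource] = [(o, m) for o, m in self.granted[resource]
                                  if o != owner]
```

A real engine would also notify the current holders and promote waiting requests on release; the sketch only shows the compatibility decision.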
Lock modes

For concurrency and read-committed transactions, Firebird locks tables for shared read or shared write. Either mode says "I'm using this table, but you are free to use it too." Consistency mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say "I'm using the table and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and other protected read transactions. Protected write is only compatible with shared read.

The important concept about lock modes is that locks are more subtle than mutexes: locks allow resource sharing, as well as protecting resources from incompatible use.
Two-phase locking vs. transient locking

The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release. They cannot overlap.

When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.

Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.
Firebird page locks

One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.
In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.
In Firebird, internally, each time a transaction changes a page it changes that page (and the physical structure of the database as a whole) from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page-level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.
The Firebird lock table

When all access to a database is done in a single process, as is the case with most database systems, locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.
The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser, and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and to one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.
Conflicting lock requests

When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens. Either the requesting transaction gets an immediate error, or the request is put on a list of waiting requests, and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.
Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.
Locks as interprocess communication

Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be, and is, useful for a number of purposes outside the area that's normally considered database locking.
For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important since, as soon as an index becomes active, every transaction must help maintain it: making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.
Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.
Lock table specifics

The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests; generally, lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource: whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose, the lock table includes a hash table for access to lock blocks, based on the name of the resource being locked.
A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed, by a function called the hash function, into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.
In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.

A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and another name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
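A toy model of this scheme, with invented names, might look like the following Python. Note how the slot count participates in the hash function itself, which is exactly why it cannot be changed without invalidating every existing entry.

```python
# Toy hash table: one bucket per slot; entries with a different name in
# the same bucket are collisions, entries with the same name are duplicates.

class ToyHashTable:
    def __init__(self, slots=101):
        self.slots = slots                         # part of the hash function!
        self.buckets = [[] for _ in range(slots)]

    def _hash(self, name: str) -> int:
        # Stand-in hash function; the slot count is baked into the result,
        # so resizing would invalidate all existing placements.
        return sum(name.encode()) % self.slots

    def insert(self, name, lock_block):
        self.buckets[self._hash(name)].append((name, lock_block))

    def lookup(self, name):
        # Walk the chain: skip collisions, collect duplicates.
        return [lb for n, lb in self.buckets[self._hash(name)] if n == name]
```

The longer the chains get, the more pointers must be followed per lookup, which is the degradation the next section shows how to measure and fix.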
Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table, normally all databases on the machine, before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically, and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.
The tool for checking the lock table is fb_lock_print, which is a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:

LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30
...

The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:

Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.

If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick, but not terribly efficient. It works best if the number of hash slots is prime.
Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine shut down.
If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading #, and increase the value from this:

#LockMemSize = 262144

to this:

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.
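The tuning rule above (keep the average chain length around 10, use a prime slot count, stay under 2048) can be expressed as a small helper. This is our own back-of-the-envelope aid for picking a LockHashSlots value, not a Firebird utility:

```python
# Suggest a prime LockHashSlots value for a given number of locks,
# following the article's guideline of roughly 10 locks per slot.

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def suggest_hash_slots(lock_count: int, target_avg: float = 10.0) -> int:
    """Smallest prime slot count (at least the default 101, at most 2048)
    that keeps the average chain length near target_avg."""
    needed = max(101, int(lock_count / target_avg))
    candidate = min(needed, 2048)
    while candidate < 2048 and not is_prime(candidate):
        candidate += 1
    return candidate
```

For example, around 15,000 concurrent locks would call for roughly 1,500 slots, rounded up to the next prime.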
News & Events

PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP. One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in. PHPServer shares all the qualities of Firebird: it is a capable, compact, easy to install and easy to manage solution. PHPServer is a free download. Read more at www.fyracle.org/phpserver.html
Server internals
Inside BLOBs

This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types: for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and, at the same time, has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.
Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB-record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.

Authors: Dmitri Kouzmenko (kdv@ib-aid.com) and Alexey Kovyazin (ak@ib-aid.com)
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus, a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used: a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type: the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.
BLOB Page
The BLOB page consists of the following parts.

The special header contains the following information:

• The number of the first BLOB page in this BLOB. It is used to check that pages belong to one BLOB.
• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page, it is equal to zero.
• The length of data on a page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.
Maximum BLOB size
As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximal size will be equal to 4 gigabytes.

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.
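For the curious, here is one way to do the calculation the text alludes to, under simplifying assumptions of our own: each page reference costs 4 bytes, and per-page headers are ignored. Real on-disk numbers differ, but the order of magnitude holds.

```python
# Rough upper bound for a three-level BLOB: one page of references to
# pointer pages, each pointer page full of references to data pages.

def max_blob_size(page_size: int, levels: int = 3, ref_size: int = 4) -> int:
    refs_per_page = page_size // ref_size
    # (levels - 1) levels of references, then a full data page per leaf.
    return refs_per_page ** (levels - 1) * page_size
```

With the 4096-byte pages used in the test database this gives about 4 GiB, already at the ULONG cap mentioned above; with larger page sizes the theoretical figure grows far beyond the practical 4 GB (or 2 GB with UDFs) limit.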
The segment size mystery
Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
TestBed
Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Alexey Kovyazin (ak@ib-aid.com) and Vlad Horsun (hvlad@users.sourceforge.net)

In issue 1, we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding is NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one
table, with the following structure:

CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first, and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick, we avoid the impact of database file growth on the test results.
Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database, and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records, and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.
After that, perform a backup of this database, and you will be in the same position as if you had downloaded the "ready for use" backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, with 512 MB RAM and an 80GB Samsung HDD.
Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts

The first script tests a regular one-pass update, without the NO SAVEPOINT option. For convenience, the important commands are clarified with comments:

connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey"; /* connect */
drop table TEST2DROP; /* drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option.
connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements; we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "E:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries, and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble. Read more at www.janus-software.com
News & Events

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics using metadata information (in this case, a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance. www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test
The easiest way to perform the test is to use isql's INPUT command. Suppose you have the scripts located in C:\TEST\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input C:\TEST\scripts\script01.sql;
Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3
News & Events

Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport provides all the tools you need to develop reports. All variants of FastReport 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0..360 degrees; a memo object supporting simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.
News & Events

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found here: http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

FBMail, Firebird EMail, is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here: http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled
News & Events

CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger, see www.microtec.fr/copycat/ct

Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not
Sequential bulk UPDATE statements
With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs
News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com
For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference With default transaction settings thisUPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVE-POINT enabled With default transaction settings we can see that memory usage is signifi-cant wherease with NO SAVEPOINT no additional memory is used
The third and all following UPDATE statements with default settings show equal time and memory usage values, and a growing count of writes.
Figure 2
Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log
So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?
A few words about versions
You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed a new version of that record is produced. The
old version does not disappear immediately, but is retained as a backversion.
In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
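A minimal Python sketch of this delta idea (an illustrative model only, using dicts of field values; the real engine stores binary difference records on disk):

```python
def make_delta(old, new):
    """Keep only the fields that differ between two record versions."""
    return {k: v for k, v in old.items() if new.get(k) != v}

def rebuild_old(newest, deltas):
    """Rebuild an old version from the newest one plus a chain of deltas."""
    version = dict(newest)
    for delta in deltas:               # most recent delta first
        version.update(delta)
    return version

v1 = {"ID": 1, "QRT": 10, "NAME": "abc"}   # committed version
v2 = {"ID": 2, "QRT": 11, "NAME": "abc"}   # after an UPDATE
delta = make_delta(v1, v2)                  # only ID and QRT changed
assert delta == {"ID": 1, "QRT": 10}
assert rebuild_old(v2, [delta]) == v1
```

Note how the delta stays small when only a few fields change, which is exactly why the first UPDATE in the test is cheap.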
The UNDO log concept
You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.
A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.
So the Undo Log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option
The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints, no problem. :-)
Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.
First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).
Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.
Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation
Consider the implementation details of the undo log. Figure 3 shows the initial situation.
Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.
Figure 3
Initial situation before any UPDATE: only one record version exists, and the Undo log is empty.
The first UPDATE
The first UPDATE statement creates delta backversions on disk (see figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.
It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction: the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
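The bitmap-plus-backversion scheme just described can be modeled in a few lines of Python (an illustration of the idea, not InterBase internals; the Table, Savepoint, update and rollback names are invented for this sketch):

```python
class Table:
    def __init__(self, records):
        self.records = dict(records)   # rec_no -> current version
        self.backversions = {}         # rec_no -> backversion on disk

class Savepoint:
    def __init__(self):
        self.affected = set()          # "bitmap" of affected record numbers

def update(table, sp, rec_no, new_version):
    if rec_no not in sp.affected:      # first change under this savepoint
        table.backversions[rec_no] = table.records[rec_no]
        sp.affected.add(rec_no)
    table.records[rec_no] = new_version

def rollback(table, sp):
    # walk the stored record numbers and restore each backversion
    for rec_no in sp.affected:
        table.records[rec_no] = table.backversions.pop(rec_no)
    sp.affected.clear()

t = Table({1: "v1", 2: "v1"})
sp = Savepoint()
update(t, sp, 1, "v2")
update(t, sp, 1, "v3")                 # second change: bitmap bit already set
rollback(t, sp)
assert t.records == {1: "v1", 2: "v1"}
```

The cheap part is that the savepoint itself stores only record numbers; the record versions needed for the rollback already exist as part of the normal multi-versioning machinery.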
Figure 4
UPDATE 1 creates a small delta version on disk and puts the record number into the UNDO log.
This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log: in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE
When the second UPDATE statement is performed on the same set of records, we have a different situation.
Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN … END) savepoint is used.
To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.
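The sequence of steps above can be sketched as follows (a conceptual Python model; all names are invented for illustration, and "disk" is just a dict):

```python
def diff(old, new):
    """Fields of `old` that differ from `new` (a delta, as a dict)."""
    return {k: v for k, v in old.items() if new.get(k) != v}

def second_update(committed, current, new, disk_deltas, undo_log, rec_no):
    # 1. compute a new delta between the pre-transaction version and the
    #    newest version, and store it on disk (replacing the old delta)
    disk_deltas[rec_no] = diff(committed, new)
    # 2. move the current full version (written by UPDATE 1) into the
    #    in-memory undo log
    undo_log[rec_no] = current
    # 3. the new full version becomes the record's on-disk version
    return new

committed = {"QRT": 10}    # version visible to other transactions
after_upd1 = {"QRT": 11}   # full version written by the first UPDATE
after_upd2 = {"QRT": 12}   # full version written by the second UPDATE

disk_deltas, undo_log = {}, {}
current = second_update(committed, after_upd1, after_upd2,
                        disk_deltas, undo_log, rec_no=1)
assert disk_deltas[1] == {"QRT": 10} and undo_log[1] == after_upd1
```

Unlike the first UPDATE, every record touched now costs a delta recomputation, a disk rewrite, and an in-memory copy, which matches the jump in time and memory seen in the test.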
The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.
Figure 5
The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE 1 into the Undo log.
This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.
When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE
The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.
Figure 6
The third UPDATE overwrites the UPDATE 1 version in the Undo log with the UPDATE 2 version, and its own version is written to disk as the latest one.
Figure 7
If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.
The original design of InterBase implemented the second UPDATE another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But that is a theme for another article. :)
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.
A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.
The following figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT
When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN … END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.
For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE 2 we would have the situation illustrated in figure 7.
In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
Summary
The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1
Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies
This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience, and promise to add any necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.
Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.
I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.
And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement
What is the problem?
Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:
1. Everyone got used to them and felt comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development, if you develop your applications just like 20-30 years ago.
As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know, all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS. They are unusual and
slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?
The thing we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database design and access: speed, flexibility, and the power of relation processing.
RDBMS as an object storage
First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex.
Nobody can change their date of birth, at least not their de facto date of birth. But birth dates are not unique anyway.)
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hmm. But let's not talk about such gloomy things. :) ) Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
And what is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
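As an illustration of the "combination of several integers" idea, one hypothetical 64-bit layout packs a node id into the high half and a locally generated sequence into the low half (this exact layout is my assumption for the sketch, not something the article prescribes):

```python
import itertools

def oid_generator(node_id):
    """Return a function producing OIDs unique across nodes:
    high 32 bits = node id, low 32 bits = local sequence."""
    counter = itertools.count(1)
    return lambda: (node_id << 32) | next(counter)

next_oid = oid_generator(node_id=7)
a, b = next_oid(), next_oid()
assert a != b and (a >> 32) == 7
```

In InterBase/Firebird itself the local sequence would naturally come from a generator, so no two calls can ever return the same value even under concurrent transactions.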
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information: from just a simple type name to the detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;
create table OBJECTS(
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID)

The ClassId attribute (the object type identifier) refers to the OBJECTS table: that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact about all objects enables us to find out easily what that object is, using a standard method.

Quite often, only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus you may consider that you have implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS where ClassId =
  (select OID from OBJECTS where ClassId = -100 and Name = LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
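The OBJECTS-as-dictionary pattern can be exercised end to end. The sketch below uses SQLite merely as a stand-in for InterBase (no TOID domain, simplified types), with invented sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table OBJECTS(
    OID integer primary key, ClassId integer,
    Name varchar(32), Description varchar(128))""")

# ClassId = -100 marks "type description" objects; OID 1 is a
# hypothetical 'Currency' dictionary type, OIDs 2-3 are its elements.
con.execute("insert into OBJECTS values (1, -100, 'Currency', 'Currency dictionary')")
con.execute("insert into OBJECTS values (2, 1, 'USD', 'US Dollar')")
con.execute("insert into OBJECTS values (3, 1, 'EUR', 'Euro')")

rows = con.execute("""select OID, Name, Description from OBJECTS
                      where ClassId = ? order by OID""", (1,)).fetchall()
print(rows)   # [(2, 'USD', 'US Dollar'), (3, 'EUR', 'Euro')]
```

The same single table thus serves both as the type registry and as the storage for every plain dictionary.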
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2))
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object (1:M relation) */
  item_id TOID, /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost))
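A quick way to see Method 1 in action is to run the master tables and the join query against SQLite as a stand-in for InterBase (types simplified, the computed-by column omitted; the sample data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table OBJECTS(OID integer primary key, ClassId integer,
                     Name varchar(32), Description varchar(128));
create table Orders(OID integer primary key, customer integer,
                    sum_total numeric);
insert into OBJECTS values (10, 5, '2005-17', 'urgent order');
insert into Orders  values (10, 42, 150.5);
""")

# the 1:1 join from the article: common attributes come from OBJECTS,
# type-specific attributes from Orders
row = con.execute("""
    select o.OID, o.Name as Number, o.Description as Comment,
           ord.customer, ord.sum_total
    from OBJECTS o, Orders ord
    where o.OID = ord.OID and ord.OID = ?""", (10,)).fetchone()
print(row)   # (10, '2005-17', 'urgent order', 42, 150.5)
```

Because the result is an ordinary relational row, views, grouping and searching all keep working exactly as they would without the object layer.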
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: the implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id))
connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id))
which describes the type metadata: an attribute set (names and types) for each object type.

All attributes of a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id
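Method 2 can be exercised the same way; below, SQLite stands in for InterBase and the sample order data is invented (an ORDER BY is added so the result order is deterministic):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table objects(OID integer primary key, ClassId integer,
                     Name varchar(32), Description varchar(128));
create table attributes(OID integer, attribute_id integer,
                        value varchar(256),
                        primary key (OID, attribute_id));
create table class_attributes(OID integer, attribute_id integer,
                              attribute_name varchar(32),
                              attribute_type integer,
                              primary key (OID, attribute_id));
insert into objects values (100, -100, 'Order', 'Order document type');
insert into objects values (10, 100, '2005-17', 'urgent order');
insert into class_attributes values (100, 1, 'customer', 0);
insert into class_attributes values (100, 2, 'sum_total', 1);
insert into attributes values (10, 1, '42');
insert into attributes values (10, 2, '150.5');
""")

# one result row per attribute, with names resolved through the metadata
rows = con.execute("""
    select a.attribute_id, ca.attribute_name, a.value
    from attributes a, class_attributes ca, objects o
    where a.OID = ? and a.OID = o.OID and o.ClassId = ca.OID
      and a.attribute_id = ca.attribute_id
    order by a.attribute_id""", (10,)).fetchall()
print(rows)   # [(1, 'customer', '42'), (2, 'sum_total', '150.5')]
```

Note how adding a new attribute to the type needs only new rows in class_attributes and attributes, with no schema change at all.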
In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer, a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure; a significant benefit, since with this approach the number of tables would not increase no matter how many different types were stored in a database.
The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or for example dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know, everyone already knows them :) ). All three methods described are undoubtedly useful, and each has a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to be extended in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued…
FastReport 3 - new generation of the reporting tools.
Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; support for most popular DB-engines.
Full WYSIWYG, text rotation 0..360 degrees, memo object supports simple html-tags (font color, b, i, u, sub, sup), improved stretching (StretchMode, ShiftMode properties).
Access to DB fields, styles, text flow, URLs, Anchors.
http://www.fast-report.com
Replicating and synchronizing InterBase/FireBird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION
A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and FireBird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
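As an illustration, a sub-node's fetch might look like the following query. The log table name RPL$LOG and its columns are assumptions for this sketch; only RPL$USERS is actually named later in the article:

```sql
/* Fetch only the log lines addressed to this sub-node;
   'NODE_B' stands for the connected sub-node's name. */
SELECT LOG_ID, TABLE_NAME, PK_VALUE, OPERATION
  FROM RPL$LOG
 WHERE LOGIN = 'NODE_B'
 ORDER BY LOG_ID;
```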
Two-way replication
One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and will therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
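A hedged sketch of what such a replication trigger could look like in InterBase/Firebird PSQL. The exclusion of the connected user is the idea described above; all object names except RPL$USERS are invented for the example:

```sql
SET TERM ^ ;
CREATE TRIGGER CUSTOMER_RPL_AU FOR CUSTOMER
AFTER UPDATE AS
  DECLARE VARIABLE SUBNODE VARCHAR(31);
BEGIN
  /* Log the change for every sub-node except the connected user,
     so a replicated change never bounces back to its originator. */
  FOR SELECT LOGIN FROM RPL$USERS
       WHERE LOGIN <> USER
        INTO :SUBNODE
  DO
    INSERT INTO RPL$LOG (LOG_ID, TABLE_NAME, PK_VALUE, OPERATION, LOGIN)
    VALUES (GEN_ID(GEN_RPL$LOG, 1), 'CUSTOMER',
            CAST(OLD.CUSTOMER_ID AS VARCHAR(100)), 'U', :SUBNODE);
END ^
SET TERM ; ^
```

Because the replicator connects under a node name, the `LOGIN <> USER` test is what keeps a change from being re-logged for the node it just came from.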
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
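For instance, if the synchronization method configured for a key is a generator, the replicator presumably executes something like the following on the parent server (the generator name is invented for this sketch):

```sql
/* Obtain a key value that is unique across the whole node tree;
   the locally generated key is replaced by this value before the
   record is sent to the parent. */
SELECT GEN_ID(GEN_CUSTOMER_ID, 1) FROM RDB$DATABASE;
```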
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
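The exact shape of the conflicts table is not given in the article; a plausible sketch, with the documented CHOSEN_USER field, might be:

```sql
CREATE TABLE RPL$CONFLICTS (
  CONFLICT_ID INTEGER NOT NULL PRIMARY KEY,
  TABLE_NAME  VARCHAR(31),
  PK_VALUE    VARCHAR(100),
  USER_A      VARCHAR(31),  /* one node involved in the conflict */
  USER_B      VARCHAR(31),  /* the other node involved           */
  CHOSEN_USER VARCHAR(31)   /* filled in manually to resolve     */
);

/* Resolution: declare which node holds the correct record version;
   the next replication cycle then applies that version. */
UPDATE RPL$CONFLICTS
   SET CHOSEN_USER = 'HEAD_OFFICE'
 WHERE CONFLICT_ID = 1;
```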
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider for example a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an
unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
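Continuing the STOCK example, the point is that nodes replicate the *calls* rather than the resulting values. A minimal sketch (procedure, table and column names are assumptions):

```sql
SET TERM ^ ;
/* Applications call this procedure instead of updating STOCK
   directly; procedure replication would replay the call (with
   its arguments) on every other node, so concurrent deltas
   merge instead of overwriting each other. */
CREATE PROCEDURE ADD_STOCK (PRODUCT_ID INTEGER, DELTA INTEGER)
AS
BEGIN
  UPDATE STOCK SET QUANTITY = QUANTITY + :DELTA
   WHERE PRODUCT_ID = :PRODUCT_ID;
END ^
SET TERM ; ^
```

With this pattern, node A's call ADD_STOCK(..., 1) and node B's ADD_STOCK(..., 2) can be replayed in any order: starting from 45, both nodes converge on 48 rather than one node's value overwriting the other's.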
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.
1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases:

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator"
columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate:

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Using IBAnalyst

Understanding Your Database with IBAnalyst

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigured hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked in gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database or at least to identify the cause of performance problems.

Long story short, the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hints, comments and recommendation reports.
Figure 1. Summary of database statistics

Author: Dmitri Kouzmenko, kdv@ib-aid.com

1 – InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 – "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".
3 – Actually, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

The summary shown in Figure 1 provides general information about
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: all figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: it is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the "sweep gap" is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell if there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• just restored your database;

• performed a backup (gbak -b db.gdb) without the -g switch;

• recently performed a manual sweep (gfix -sweep).

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications make less database load than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs) and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2. Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you "bad" indices on Foreign Keys every time you view statistics.
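To check for yourself how selective an indexed column really is, a query along the following lines can be run against the table in question; the table and column names are placeholders:

```sql
/* Few distinct values relative to the row count means long
   duplicate chains in the index - the situation IBAnalyst
   flags in red. */
SELECT COUNT(*)                     AS TOTAL_KEYS,
       COUNT(DISTINCT ADDRESS_TYPE) AS UNIQUE_VALUES
  FROM ADDR;
```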
Reports
There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports | View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in a table, then garbage collection does not work or records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR     3388      10944       92
DICT_PRICE       30       1992       45
DOCS              9       2225       64
N_PART        13835      72594       83
REGISTR_NC      241       4085       56
SKL_NC         1640       7736      170
STAT_QUICK    17649      85062      110
UO_LOCK         283       8490      144
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments on the "Temporary tables" article

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only available through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
©2005 www.ibdeveloper.com. All rights reserved.
Cover story

Locking, Firebird, and the Lock Table
Firebird either denies the new request or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for locks that it acquires on tables, etc.
Lock modes

For concurrency and read-committed transactions, Firebird locks tables for shared read or shared write. Either mode says "I'm using this table, but you are free to use it too." Consistency-mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say "I'm using the table and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and other protected read transactions. Protected write is only compatible with shared read.
The important concept about lock modes is that locks are more subtle than mutexes: locks allow resource sharing, as well as protecting resources from incompatible use.
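As an illustration, the compatibility rules described above can be written out as a small matrix. This is our own sketch, not Firebird code; the mode names SR/SW/PR/PW are shorthand for shared read/write and protected read/write:

```python
# Illustrative sketch of the table-lock compatibility rules described in
# the text (not Firebird source). SR/SW = shared read/write,
# PR/PW = protected read/write.
COMPATIBLE = {
    ("SR", "SR"): True,  ("SR", "SW"): True,
    ("SR", "PR"): True,  ("SR", "PW"): True,   # protected write tolerates shared read
    ("SW", "SW"): True,  ("SW", "PR"): False,
    ("SW", "PW"): False,
    ("PR", "PR"): True,  ("PR", "PW"): False,
    ("PW", "PW"): False,
}

def compatible(held: str, requested: str) -> bool:
    """True if a request in `requested` mode can be granted while another
    transaction holds the same lock in `held` mode (order-insensitive)."""
    return bool(COMPATIBLE.get((held, requested)) or
                COMPATIBLE.get((requested, held), False))
```

For example, `compatible("PW", "SR")` is true, while any other combination involving a protected mode and a write is refused, exactly as the paragraph above states.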
Two-phase locking vs. transient locking

The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release. They cannot overlap.

When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency-mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.

Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.
Firebird page locks

One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.
In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.
Internally, each time a Firebird transaction changes a page, it changes that page (and the physical structure of the database as a whole) from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page-level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.
The Firebird lock table

When all access to a database is done in a single process (as is the case with most database systems) locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.
The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.
Conflicting lock requests

When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens. Either the requesting transaction gets an immediate error, or the request is put on a list of waiting requests and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.
Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.
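A toy model may help picture this notification protocol. The following is an assumption-level Python sketch, not Firebird's implementation: every granted request carries a routine that is called when it blocks a newer request, and a transient-style holder responds by releasing its lock at once:

```python
# Toy, single-threaded model of the conflict-notification scheme described
# above (an illustration only, not Firebird's real lock manager).

class Lock:
    def __init__(self, name):
        self.name = name
        self.holders = []          # list of (owner, mode, on_conflict) tuples

    def request(self, owner, mode, on_conflict, compatible):
        # Notify every current holder whose mode conflicts with this request;
        # the holder's routine decides whether to release or keep the lock.
        for held in list(self.holders):
            if not compatible(held[1], mode):
                held[2](self, held)
        if all(compatible(h[1], mode) for h in self.holders):
            self.holders.append((owner, mode, on_conflict))
            return True            # granted
        return False               # caller must wait or take an error

    def release(self, entry):
        self.holders.remove(entry)

def release_immediately(lock, entry):
    # Transient-lock style routine: finish up and release at once.
    lock.release(entry)
```

In this sketch a page-style (transient) holder always yields, so a second writer is granted the lock immediately; a table-style (two-phase) holder would instead keep the lock until commit, leaving the requester on the wait list.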
Locks as interprocess communication

Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be (and is) useful for a number of purposes outside the area that's normally considered database locking.
For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important, since as soon as an index becomes active, every transaction must help maintain it: making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.
Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.
Lock table specifics

The Firebird lock table is an in-memory data area that consists of four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests; generally lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource: whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose, the lock table includes a hash table for access to lock blocks based on the name of the resource being locked.
A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function, called the hash function, into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.
A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
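To make the collision and duplicate chains concrete, here is a simplified Python model of the lookup structure just described. It is illustrative only: Firebird's real code differs, and the hash function here is simply Python's own:

```python
# Simplified model of the lock-table lookup structure described above:
# an array of slots, each holding a chain of hash blocks linked by
# collision pointers; blocks with the same name hang off duplicate pointers.

class HashBlock:
    def __init__(self, name, lock_block):
        self.name = name
        self.lock_block = lock_block
        self.collision = None   # next block whose name hashed to this slot
        self.duplicate = None   # next block with exactly the same name

class LockHashTable:
    def __init__(self, slots=101):
        self.slots = [None] * slots   # the width is fixed once chosen

    def _hash(self, name):
        return hash(name) % len(self.slots)

    def insert(self, name, lock_block):
        block = HashBlock(name, lock_block)
        cur = self.slots[self._hash(name)]
        while cur is not None:
            if cur.name == name:            # same name: chain as a duplicate
                block.duplicate = cur.duplicate
                cur.duplicate = block
                return
            cur = cur.collision
        i = self._hash(name)                # new name: prepend to the slot chain
        block.collision = self.slots[i]
        self.slots[i] = block

    def find(self, name):
        cur = self.slots[self._hash(name)]
        while cur is not None and cur.name != name:
            cur = cur.collision             # each collision is one more hop
        return cur
```

Note how `find` pays one extra pointer-follow per collision, which is exactly why the slot count matters for performance, and why `__init__` fixing the width up front mirrors the limitation described above.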
Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table (normally all databases on the machine) before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.
The tool for checking the lock table is fb_lock_print, which is a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:

LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30
...

The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.
If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick, but not terribly efficient. It works best if the number of hash slots is prime.
Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine shut down.
If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading # and increase the value:
#LockMemSize = 262144

to this:

LockMemSize = 1048576
The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.
News & Events

PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP. One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in. PHPServer shares all the qualities of Firebird: it is a capable, compact, easy to install and easy to manage solution. PHPServer is a free download. Read more at: www.fyracle.org/phpserver.html
Server internals

Inside BLOBs
How the server works with BLOBs
The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types: for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and, at the same time, has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.
This article is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.
Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.
Author: Dmitri Kouzmenko, kdv@ib-aid.com
Author: Alexey Kovyazin, ak@ib-aid.com
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus, a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used: a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type: the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.
BLOB page

The BLOB page consists of the following parts. The special header contains the following information:

• The number of the first BLOB page in this BLOB. It is used to check that pages belong to one BLOB.
• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.
• The length of data on the page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.
Maximum BLOB size

As the internal structure for storing BLOB data can have only three levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.
The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
TestBed

Testing the NO SAVEPOINT feature in InterBase 7.5.1
In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it, to discover what it is all about.
Testing NO SAVEPOINT
In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.
Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators.
Authors: Alexey Kovyazin, ak@ib-aid.com, and Vlad Horsun, hvlad@users.sourceforge.net

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
)

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure that inserts them:
SELECT * FROM INSERTRECS(1000);
The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connecting. We are just using it as a cheap way to increase database size: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.
Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is straightforward and well-described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 MB) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 KB).
If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.
After that, perform a backup of this database and you will be in the same position as if you had downloaded the "ready for use" backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, 512 MB RAM and an 80 GB Samsung HDD.
Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts

The first script tests a regular one-pass update, without the NO SAVEPOINT option. For convenience, the important commands are annotated with comments:

connect "C:\TEST\testsp1.ib" USER "SYSDBA" PASSWORD "masterkey";
drop table TEST2DROP; /* drop TEST2DROP to free database pages */
commit;
select count(*) from test; /* walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enable writes/fetches/memory statistics */
commit;
/* perform a bulk update of all records in the TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance of the same UPDATE with the NO SAVEPOINT option.
connect "C:\TEST\testsp2.ib" USER "SYSDBA" PASSWORD "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements; we recommend using five. The script for testing without NO SAVEPOINT would be:
connect "C:\TEST\testsp3.ib" USER "SYSDBA" PASSWORD "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble. Read more at: www.janus-software.com

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics using metadata information (in this case a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance, and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance. www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test

The easiest way to perform the test is to use isql's INPUT command. Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;
Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed. This is an excerpt from the one-pass script with default transaction settings:
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3
News & Events

FastReport 3.19

The new version of the famous FastReport is now out. FastReport 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently, and provides all the tools you need to develop reports. All variants of FastReport 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0-360 degrees; a memo object supporting simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors. Trial/Buy Now

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net. SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found at: http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx. And FBMail, Firebird EMail, is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found at: http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled:
News & Events

CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger, and to download an evaluation version, see: www.microtec.fr/copycat/ct
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs
News amp Events
SQLHammer 15Enterprise Edition
The interestingdeveloper toolSQLHammer
now hasan Enterprise Edition
SQLHammer isa high-performance tool
for rapiddatabase development
and administration
It includes a commonintegrated development
environmentregistered custom
componentsbased on the
Borland Package Librarymechanism
and an Open Tools APIwritten in
DelphiObject Pascal
wwwmetadataforgecom
For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1. Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.
The third and all following UPDATE statements with default settings show equal time and memory usage values, along with growth of the writes parameter.
Figure 2. Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?
A few words about versions
You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The
old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is written as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and a chain of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
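To make the delta idea concrete, here is a toy sketch in Python - field-level dictionaries standing in for physical record images, with all names invented for illustration (the real engine works on record images, not named fields):

```python
def make_delta(old: dict, new: dict) -> dict:
    # Keep only the fields whose values differ between the two versions.
    return {k: v for k, v in old.items() if new.get(k) != v}

def rebuild_old(newest: dict, delta_chain: list) -> dict:
    # Walk from the newest full version back through the chain of deltas.
    version = dict(newest)
    for delta in delta_chain:
        version.update(delta)
    return version

v1 = {"ID": 1, "QRT": 10, "NAME": "a", "COMMENT": "x"}  # committed version
v2 = {"ID": 1, "QRT": 11, "NAME": "a", "COMMENT": "x"}  # after one UPDATE
delta = make_delta(v1, v2)           # {"QRT": 10} - much smaller than v1
restored = rebuild_old(v2, [delta])  # identical to v1 again
```

The point of the sketch is only that a delta is small when few fields change, and that the old version is fully recoverable from the new version plus the delta chain.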
The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if using implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.
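Such a sweep can be triggered manually with the standard gfix utility; for example (database path and credentials are placeholders):

```
gfix -sweep -user SYSDBA -password masterkey mydatabase.gdb
```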
Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.
Figure 3. Initial situation before any UPDATE - only one record version exists, and the Undo log is empty
The first UPDATE

The first UPDATE statement creates delta backversions on disk (see figure 4). Since deltas store only the differences between the old and the new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction - the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
Figure 4. UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log
This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log - in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, in which only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.
To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.
Figure 5. The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log
This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception - memory usage does not grow any further.
Figure 6. The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one

Figure 7. If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint
The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :)
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE, no new record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The preceding figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
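In ISQL, the scenario just described might be sketched like this (the table and savepoint names are illustrative, not taken from the test scripts):

```sql
SET TRANSACTION;                   /* default: undo logging enabled */
UPDATE TEST SET QRT = QRT + 1;     /* UPDATE1 */
SAVEPOINT SAVEPOINT1;
UPDATE TEST SET QRT = QRT + 1;     /* UPDATE2: backversions go to SAVEPOINT1's undo log */
ROLLBACK TO SAVEPOINT SAVEPOINT1;  /* undoes UPDATE2 only, using that undo log */
```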
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should of course be used with caution. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications known to me that discuss the problem. However, if I have missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience, and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all of my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.

I should make it clear that the object approach to database design as described here is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?
Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development - if you develop your applications just like 20-30 years ago.
As you see, almost all of these characteristics sound good - except, probably, the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines that is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there have been several attempts to develop an object-oriented database system, and as far as I know, all those attempts were rather far from successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

Author: Vladimir Kotlyarevsky, vlad@contek.ru
As a result, everyone continues using RDBMS, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?

What we need is simple - to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP - not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex).

Nobody can change their date of birth - at least not their de facto date of birth. But birth dates are not unique anyway.
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm. But let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

What is more, nobody requires run-time pointers to contain any additional information about an object beyond a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys are in some sense much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
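In InterBase/Firebird, such a generating function is usually a generator; a minimal sketch (the generator name G_OID is an invented example, not from the article):

```sql
CREATE GENERATOR G_OID;

/* Each call returns the next database-wide unique OID value */
SELECT GEN_ID(G_OID, 1) FROM RDB$DATABASE;
```

Because a generator lives outside transaction control, values it hands out are never reused, which suits a purely surrogate key well.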
ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known-types list - basically a table (let us call it "CLASSES"). This table may include any kind of information - from just a simple type name to the detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100 (you need to add a description of the known OID). Then the list of all persistent object types known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, the fact that this holds for all objects enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types - simple lookup dictionaries, for example - which do not have any attributes other than those already in OBJECTS. Usually these are a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? Likely not less than half. Thus you may consider half of these dictionaries already implemented - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and on the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2))

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object - being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment, ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
If an order has a "lines" section - and a real-world order should definitely have such a section - we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID,  /* reference to the order object - 1:M relation */
  item_id TOID,    /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount * cost))
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID,  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id))
connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID,  /* here: a link to the description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id))
which describes the type metadata - an attribute set (names and types) for each object type.

All attributes of a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all the attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease of implementing object inheritance; and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database - in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied - a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast; no extra database structures are necessary - just a single BLOB field; and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
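Purely as an illustration of this method (the format and field names are invented, not CopyCat's or any library's actual persistence code), an object could be flattened to XML in the application and the resulting bytes stored in a single BLOB column:

```python
import xml.etree.ElementTree as ET

def to_blob(class_id: int, attrs: dict) -> bytes:
    # Serialize ClassId and the attribute set to XML, ready for a BLOB column.
    root = ET.Element("object", ClassId=str(class_id))
    for name, value in attrs.items():
        ET.SubElement(root, "attr", name=name).text = str(value)
    return ET.tostring(root, encoding="utf-8")

def from_blob(blob: bytes):
    # Restore (class_id, attributes) from the stored XML bytes.
    root = ET.fromstring(blob)
    attrs = {a.get("name"): a.text for a in root.findall("attr")}
    return int(root.get("ClassId")), attrs

blob = to_blob(42, {"customer": "1001", "sum_total": "150.00"})
class_id, attrs = from_blob(blob)
```

The editor's note above applies directly to this sketch: the server sees only opaque bytes, so any search by attribute value has to happen in the application.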
It was not difficult to come to the following conclusions (well, I know - everyone already knows them). All three methods described are undoubtedly useful, and each has a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely in the future to use certain object attributes that way, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and the data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except for work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.
To be continuedhellip
FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot-matrix reports support; supports most popular DB engines. Full WYSIWYG; text rotation 0-360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com
Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table, and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database; therefore, no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and will therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
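The logging scheme described in the last three sections can be sketched as a small simulation. This is a conceptual model only: the article does not give CopyCat's real log schema or trigger code, so every name below is hypothetical.

```python
# Conceptual sketch of CopyCat-style change logging (all names hypothetical;
# the real schema and trigger logic are not shown in the article).

def log_change(log, table, pk, sub_nodes, current_user):
    """Simulate a replication trigger firing on a data change.

    One log line is generated per sub-node, except for the sub-node whose
    name matches the user that performed the change. Skipping the current
    user is what stops a replicated change from bouncing back to its origin.
    """
    for node in sub_nodes:
        if node != current_user:
            log.append({"table": table, "pk": pk, "target": node})

log = []
sub_nodes = ["LAPTOP_A", "LAPTOP_B"]

# A local user edits a record: both sub-nodes must receive it.
log_change(log, "CUSTOMERS", 42, sub_nodes, current_user="LOCAL_USER")

# Sub-node LAPTOP_A replicates one of its own changes up to this node:
# only LAPTOP_B gets a log line, so the change never returns to LAPTOP_A.
log_change(log, "CUSTOMERS", 43, sub_nodes, current_user="LAPTOP_A")

targets = [line["target"] for line in log]
print(targets)  # ['LAPTOP_A', 'LAPTOP_B', 'LAPTOP_B']
```

Each sub-node then fetches only the log lines whose `target` field names it, which mirrors the "one log line per sub-node" rule above.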
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
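The re-keying step can be sketched as follows. This is a simulation under stated assumptions: the generator stands in for a server-side statement such as a call to a Firebird generator, and all names are hypothetical, not CopyCat's actual API.

```python
# Conceptual sketch of primary-key synchronization (hypothetical names).
# Before a locally inserted record is replicated, a server-side statement
# (here simulated by a counter) hands out a key unique across all nodes.

import itertools

server_gen = itertools.count(1000)  # stands in for a generator on the server

def synchronize_key(local_records, local_pk):
    """Replace a provisional local key with a server-issued unique key."""
    new_pk = next(server_gen)              # executed on the server side
    record = local_records.pop(local_pk)   # re-key the local record first...
    local_records[new_pk] = record
    return new_pk                          # ...only then is it replicated

local_records = {1: {"NAME": "offline customer"}}  # key 1 was assigned off-line
new_key = synchronize_key(local_records, 1)
print(new_key, local_records)  # 1000 {1000: {'NAME': 'offline customer'}}
```

The important ordering, as in the text, is that the local database is updated with the server-issued key before the record itself travels to the server.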
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible for two nodes (having the same parent) to simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
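The detect-then-choose workflow can be sketched like this (a conceptual model; the conflicts-table columns shown are hypothetical, apart from CHOSEN_USER, which the text names):

```python
# Conceptual sketch of CopyCat-style conflict handling (hypothetical names,
# except CHOSEN_USER, which the article mentions).

def detect_conflicts(local_edits, parent_edits, conflicts):
    """A record edited on both sides in the same period becomes a conflict."""
    for pk in set(local_edits) & set(parent_edits):
        conflicts[pk] = {"USER1": "NODE_A", "USER2": "PARENT",
                         "CHOSEN_USER": None}

def resolve(pk, conflicts, chosen):
    """The user names the winning node; the next replication pass
    then applies that node's version of the record."""
    conflicts[pk]["CHOSEN_USER"] = chosen

local_edits = {7: "local version"}
parent_edits = {7: "parent version"}
conflicts = {}

detect_conflicts(local_edits, parent_edits, conflicts)
replicable = not conflicts       # False: replication of record 7 is suspended

resolve(7, conflicts, "PARENT")
print(conflicts[7]["CHOSEN_USER"])  # PARENT
```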
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product, and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than requiring the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
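A toy model makes the difference concrete: replaying each node's procedure calls merges correctly, while copying final field values cannot. This is a hypothetical illustration, not CopyCat code.

```python
# Toy model: why replicating stored-procedure *calls* merges correctly
# where replicating the final STOCK value cannot (hypothetical code).

stock = {"WIDGET": 45}   # central value before the nodes go off-line

def add_to_stock(db, product, qty):
    """Stand-in for a stored procedure whose calls are logged and replicated."""
    db[product] = db.get(product, 0) + qty

# Off-line, node A adds 1 item and node B adds 2; each logs its *call*,
# not the resulting value (46 on A, 47 on B - both wrong as final states).
logged_calls = [("WIDGET", 1), ("WIDGET", 2)]

# Call replication replays every logged call against the central database:
for product, qty in logged_calls:
    add_to_stock(stock, product, qty)

print(stock["WIDGET"])  # 48 - both changes merged correctly
```

Value replication would have forced a choice between 46 and 47; replaying the calls yields 48 regardless of the order in which the nodes reconnect.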
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:
1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each of these products.
1. The CopyCat Delphi / C++ Builder component set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy-to-use installer
• Independent server administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple and intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:
http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux/Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold. Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training to InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database or at least in identifying the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database, and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

Figure 1: Summary of database statistics

The summary shown in Figure 1 provides general information about
[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] The Oldest transaction is the same Oldest interesting transaction mentioned everywhere. Gstat output does not mark this transaction as interesting.
[3] Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the details of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.[1]
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest[2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance-loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
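The rule just described can be paraphrased in a short sketch. This is an interpretation of the behaviour as described above, not IBAnalyst's actual code:

```python
# Sketch of the sweep-gap rule as described in the text (not IBAnalyst code).

def sweep_warning(oldest, oldest_snapshot, sweep_interval):
    """Return a warning level for the sweep-related statistics."""
    gap = oldest_snapshot - oldest
    if gap < 0:
        return "yellow"   # negative gap: possible in IB 6.0/Firebird/Yaffil stats
    if sweep_interval != 0 and gap > sweep_interval:
        return "red"      # automatic-sweep threshold exceeded
    return "ok"

print(sweep_warning(oldest=100, oldest_snapshot=90, sweep_interval=20000))     # yellow
print(sweep_warning(oldest=100, oldest_snapshot=30000, sweep_interval=20000))  # red
print(sweep_warning(oldest=100, oldest_snapshot=150, sweep_interval=20000))    # ok
```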
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active[3] is the oldest currently active transaction.

• The Next transaction is the transaction number that will be assigned to the next new transaction.
• Active transactions: IBAnalyst will give a warning if the oldest active transaction number lags 30% or more behind the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day: this is calculated from Next transaction, divided by the number of days passed since the creation of the database up to the point when the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
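These two derived metrics can be sketched as follows. The 30% rule here is one interpretation of the warning described above, and none of this is IBAnalyst's actual code:

```python
# Sketch of the derived transaction metrics described above
# (a paraphrase of the rules, not IBAnalyst's implementation).

def transactions_per_day(next_transaction, days_since_creation):
    """Next transaction divided by the age of the database in days."""
    return next_transaction / days_since_creation

def oldest_active_warning(next_transaction, oldest_active, per_day):
    """Warn when the oldest active transaction is 'stuck': here, when it
    lags behind Next by more than ~30% of a day's worth of transactions."""
    return (next_transaction - oldest_active) > 0.3 * per_day

per_day = transactions_per_day(next_transaction=150_000, days_since_creation=100)
print(per_day)  # 1500.0
print(oldest_active_warning(150_000, 149_000, per_day))  # True: lags by 1000 > 450
```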
As you have already learned, if there are any warnings, they are shown as colored lines with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.
Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications put less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications update these tables so frequently is by design, or is the result of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056, i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query: searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
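The relationship between the quoted figures (Uniques, TotalDup, MaxDup) can be reproduced from a key list. The formulas below are inferred from the numbers in the text; this is a sketch, not IBAnalyst's implementation:

```python
# Sketch: deriving Uniques, TotalDup and MaxDup from an index's key values
# (formulas inferred from the figures quoted in the text).

from collections import Counter

def index_stats(keys):
    counts = Counter(keys)
    uniques = len(counts)               # distinct key values
    total_dup = len(keys) - uniques     # every key beyond the first of its value
    max_dup = max(counts.values()) - 1  # longest duplicate chain
    return uniques, total_dup, max_dup

# Tiny analogue of ADDR_ADDRESS_IDX6: few distinct values, long chains.
keys = ["A"] * 6 + ["B"] * 2 + ["C"] + ["D"]
print(index_stats(keys))  # (4, 6, 5)
```

Applied to the article's numbers, the same formulas hold: 34999 keys with 4 unique values give TotalDup = 34999 - 4 = 34995.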
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically, to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the tables/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, than garbage collection does not work or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to ldquoTemporary tablesrdquo
One thing Id like to ask you tochange is re temp tables I suggestyou even create another myth boxfor it It is the sentence
Surely most often temporarytables were necessary to thosedevelopers who had been workingwith MS SQL before they started touse InterBaseFirebird
This myth does not want to dieTemporary tables (TT) are not ameans for underskilled DB kidswho cannot write any complex SQLstatement They are the means ofdealing with data that is structural-ly dynamic but still needs to beprocessed like data with a fixedstructure So all OLAP systemsbased on RDBMS are heavilydependent on this feature - or if it isnot present it requires a whole lotof unnecessary and complicatedworkarounds Im talking out ofexperience
Then there are situations where theoptimizer simply loses the plot
because of the complexity of astatement If developers have a fall-back method to reduce complexityin those cases thats an advantageMuch better than asking developersto supply their own query plans
Also serious RDBMS like Informixhad them at least 15 years agowhen MSs database expertise didnot go further than MS Access Cer-tainly those MS database develop-ers who need TTs to be able to dotheir job would not have managedto deal with an Informix server ifcomplexity was their main problem
The two preceding paragraphswere about local temp tables Thereare also global temporary tables(GTTs) A point in favour of GTTs isthat one can give users their ownworkspace within a database with-out having to setup some clumsyadministration for it No user idscattered around in tables wherethey dont belong no explicitcleanup no demanding role man-agement Just setup tables as temp
and from then on it is transparent tousersapplications that the datainside is usersession-specific Webapplications would be a goodexample
The argument reminds me a bit ofMySQL reasoning when it comes tofeatures their RDBMS doesdidnot have Transactions ForeignKeys Triggers etc were all badand unnecessary because they didnot have them (officially they slowdown the whole system and intro-duce dependencies) Of coursethey do To call a flat file system aRDBMS is obviously good for mar-keting Now they are putting in allthose essential database featureswhich were declared crap by themnot long ago And you can seealready that their marketing nowtells us how important those fea-tures are I bet we wont see MySQLbenchmarks for a while -)
We should not make a similar mis-take Temporary tables are impor-tant if the nature of a system is
dynamic either re usersessiondata isolation or re data where thestructure is unknown in advancebut needs to be processed like DBdata with a fixed structure ThatFirebird does not have them isplainly a lack of an important fea-ture in the same category as crossDB operations (only through qliwhich means unusable) Both fea-tures could make Firebird muchmore suitable as the basis for OLAPsystems an area where Firebird islacking considerably
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volker.rehn@bigpond.com
We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
www.ibdeveloper.com ©2005 www.ibdeveloper.com. All rights reserved.
Cover story
Locking Firebird and the Lock Table
action is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.
In Firebird, internally, each time a transaction changes a page it changes that page – and the physical structure of the database as a whole – from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.
The Firebird lock table
When all access to a database is done in a single process – as is the case with most database systems – locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.
The Firebird lock manager
We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.
Conflicting lock requests
When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens. Either the requesting transaction gets an immediate error, or the request is put on a list of waiting requests and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.

Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.
Locks as interprocess communication
Lock management requires a high speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that's normally considered database locking.
For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important, since as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.
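The notification sequence described above is triggered by nothing more exotic than ordinary DDL; a minimal sketch (the index name is invented for illustration):

```sql
/* Transaction A: create a new index while other transactions
   are concurrently reading and writing the table */
CREATE INDEX IDX_TEST_NAME ON TEST (NAME);
COMMIT;

/* After the commit, concurrent transactions release and re-acquire
   their "index existence" locks, see the new definition, and begin
   maintaining IDX_TEST_NAME on every insert, update and delete */
```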
Firebird locking summary
Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.
Lock table specifics
The Firebird lock table is an in-memory data area that consists of four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose the lock table includes a hash table for access to lock blocks based on the name of the resource being locked.
A quick refresher on hashing
A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function, called the hash function, into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.
A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
Adjusting the lock table to improve performance
The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.
The tool for checking the lock table is fb_lock_print, which is a command line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:

LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30
...

The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.
If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.
Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499
The change will not take effect until all connections to all databases on the server machine shut down.
If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading # and increase the value:
#LockMemSize = 262144

to this:

LockMemSize = 1048576

The value is bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.
News & Events

PHP Server
One of the more interesting recent developments in information technology has been the rise of browser based applications, often referred to by the acronym LAMP. One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in. PHPServer shares all the qualities of Firebird: it is a capable, compact, easy to install and easy to manage solution. PHPServer is a free download.

Read more at: www.fyracle.org/phpserver.html
Server internals
How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types – for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and at the same time has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.
Inside BLOBs
This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.
Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB-record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.
Authors: Dmitri Kouzmenko, kdv@ib-aid.com, and Alexey Kovyazin, ak@ib-aid.com
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus, a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.
The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB-pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.
BLOB Page

The BLOB page consists of the following parts.

The special header contains the following information:
• The number of the first blob page in this blob. It is used to check that pages belong to one blob.
• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.
• The length of data on a page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.
Maximum BLOB size

As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximal size will be equal to 4 gigabytes.
Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.
The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
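For reference, segment size is simply part of the BLOB column declaration; a minimal sketch (the table and column names are invented, and, as noted above, the value chosen makes no difference to on-disk storage or performance):

```sql
/* SEGMENT SIZE is accepted and recorded in the metadata,
   but only affects buffers generated by GPRE for Embedded SQL */
CREATE TABLE DOCS (
  ID   INTEGER NOT NULL,
  BODY BLOB SUB_TYPE TEXT SEGMENT SIZE 80
);
```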
TestBed
In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.
Testing NO SAVEPOINT
In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.
Database for testing
The test database file was created in InterBase 7.5.1, page size = 4096, character encoding is NONE. It contains two tables, one stored procedure and three generators.
Testing the NO SAVEPOINT feature in InterBase 7.5.1
Authors: Alexey Kovyazin, ak@ib-aid.com, and Vlad Horsun, hvlad@users.sourceforge.net

For the test we will use only one table, with the following structure:
CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
)

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);
The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.
Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.
Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.
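The population steps just described can be sketched as an isql session (this assumes the INSERTRECS procedure from the script accepts the record count as its parameter; adjust to the actual script if it differs):

```sql
/* Populate TEST with 100,000 records via the stored procedure */
SELECT * FROM INSERTRECS(100000);
COMMIT;

/* Copy them into TEST2DROP several times to inflate the database */
INSERT INTO TEST2DROP SELECT * FROM TEST;
INSERT INTO TEST2DROP SELECT * FROM TEST;
INSERT INTO TEST2DROP SELECT * FROM TEST;
COMMIT;
```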
After that, perform a backup of this database and you will be in the same position as if you had downloaded the "ready for use" backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, 512 MB RAM and an 80GB Samsung HDD.
Preparing to test
A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts

The first script tests a regular one-pass update without the NO SAVEPOINT option. For convenience, the important commands are clarified with comments:

connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey"; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enables writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option.
connect "C:\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of the single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:
connect "E:\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
News & Events

Fyracle 0.8.9
Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer 2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com
News amp Events
IBAnalyst 19
IBSurgeon has issued newversion of IBAnalyst
Now it can better analyzeInterBase or Firebird data-base statistics with usingmetadata information (in thiscase connection to databaseis required)
IBAnalyst is a tool that assistsa user to analyze in detailFirebird or InterBase data-base statistics and identifypossible problems with data-base performance mainte-nance and how an applica-tion interacts with the data-base
It graphically displays data-base statistics and can thenautomatically make intelli-gent suggestions aboutimproving database per-formance and databasemaintenance
wwwibsurgeoncomnewshtml
News amp Events
You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test
The easiest way to perform the test is to use isql's INPUT command. Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;
Test results

The single bulk UPDATE
First, let's perform the test where the single-pass bulk UPDATE is performed. This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3
News & Events

Fast Report 3.19
The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf output; dot matrix report support; support for most popular DB-engines; full WYSIWYG; text rotation 0..360 degrees; a memo object supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.

Trial / Buy Now
News & Events

TECT Software presents
New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a Stored Procedure Generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found here:
http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here:
http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled
CopyTiger 1.00
CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger see www.microtec.fr/copycat/ct. Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.
Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs
News & Events

SQLHammer 1.5 Enterprise Edition
The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment; registered custom components based on the Borland Package Library mechanism; and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com
For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time – 47 seconds – compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings, we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.
Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?
A few words about versions
You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.
In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
The UNDO log concept
You may recall from Dmitrirsquos articlehow each transaction is implicitlyenclosed in a frame of savepointseach having its own undo log Thislog stores a record of each changein sequence ready for the possibili-ty that a rollback will be requested
A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.
So the Undo Log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)
Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
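As an illustrative sketch, the ISQL/DSQL flavor might look like this; the table name is invented for the example, and the exact SET TRANSACTION clause ordering should be checked against the 7.5.1 release notes:

```sql
/* start a transaction that keeps no undo log */
SET TRANSACTION NO SAVEPOINT;
UPDATE TEST SET NAME = NAME || '+';  /* bulk update, no savepoint overhead */
COMMIT;
```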
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.
First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).
Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.
Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation
Consider the implementation details of the undo log. Figure 3 shows the initial situation.
Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.
Figure 3
Initial situation before any UPDATE: only one record version exists; the Undo log is empty.
The first UPDATE
The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.
It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction - the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the versions of each and restore it from the updated version and the backversion stored on disk.
Figure 4
UPDATE 1 creates a small delta version on disk and puts the record number into the UNDO log.
This approach is very fast and economical on memory usage. The engine does not waste too many resources handling this undo log - in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE
When the second UPDATE statement is performed on the same set of records, we have a different situation.
Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.
To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.
The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.
Figure 5
The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE 1 into the Undo log.
This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.
When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE
The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.
Figure 6
The third UPDATE overwrites the UPDATE 1 version in the Undo log with the UPDATE 2 version, and its own version is written to disk as the latest one.
Figure 7
If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.
The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6, Borland changed the original behavior, and we see what we see now. But this theme is for another article. :)
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE, no new record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.
A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected the third and following UPDATEs to do essentially equal numbers of reads and writes - to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.
The following figure (Figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.
For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE 2 we would have the situation illustrated in Figure 7.
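A minimal ISQL illustration of that scenario follows; the table and column names are invented for the example:

```sql
SET TRANSACTION;
UPDATE TEST SET F = F + 1;          /* UPDATE1: logged in the transaction-level savepoint */
SAVEPOINT Savepoint1;
UPDATE TEST SET F = F + 1;          /* UPDATE2: backversions kept in Savepoint1's undo log */
ROLLBACK TO SAVEPOINT Savepoint1;   /* undoes only UPDATE2 */
COMMIT;
```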
In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience and promise to add any necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.
Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.
I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true of OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.
And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?
Present-day relational databases were developed in times when the sun shone brighter, computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:
1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.
As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS. They are unusual and
Author: Vladimir Kotlyarevsky, vlad@contek.ru
slow; there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?
The thing we need is simple - to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relational processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2] - UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP - not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex).

Nobody can change their date of birth - at least not their de facto date of birth. But birth dates are not unique anyway. :)
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things :). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
And what is more, nobody requires run-time pointers to contain any additional information about an object except a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
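In InterBase/Firebird, the natural way to implement such a function is a generator; a minimal sketch (the generator name is invented for the example):

```sql
create generator g_oid;

/* obtain the next unique OID, e.g. before inserting a new object */
select gen_id(g_oid, 1) from RDB$DATABASE;
```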
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information - from just a simple type name to detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus each object has mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table
create domain TOID as integer not null;

create table OBJECTS (
    OID TOID primary key,
    ClassId TOID,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date timestamp default CURRENT_TIMESTAMP,
    Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OIDs.) Then the list of all persistent object types known to the system is sampled by the query: select OID, Name, Description from OBJECTS where ClassId = -100.

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is - using a standard method.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types - simple lookup dictionaries, for example - which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? Likely not less than a half. Thus you may consider that you have already implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:
create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total NUMERIC(15,2)
);
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object - being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
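As a sketch, such a view could be declared like this (the column list is illustrative, not taken from the original article):

```sql
create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID;
```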
If an order has a "lines" section - and a real-world order should definitely have such a section - we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:
create table order_lines (
    id integer not null primary key,
    object_id TOID,  /* reference to the order object - 1:M relation */
    item_id TOID,    /* reference to the ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    sum numeric(15,4) computed by (amount * cost)
);
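Assuming the item_id values point at OBJECTS rows (as with the Partners dictionary above), the lines of one order, with item names resolved through the standard OBJECTS lookup, could be fetched like this (:order_id is a hypothetical parameter):

```sql
select ol.id, item.Name as item_name, ol.amount, ol.cost
from order_lines ol, OBJECTS item
where item.OID = ol.item_id
  and ol.object_id = :order_id
```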
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: implementation of the object-relational mapping system for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:
create table attributes (
    OID TOID,  /* link to the master-object of this attribute */
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
);
connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
    OID TOID,  /* link to a description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata - the attribute set (names and types) for each object type.
All attributes for a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid
or, with the names of attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
In the context of this method, you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease of implementing object inheritance; a very simple database structure. A significant benefit is that with this approach the number of tables does not increase, no matter how many different types are stored in the database.
The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database - in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are stored for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied - a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast; no extra database structures are necessary - just a single BLOB field; you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know - everyone already knows them :). All three described methods are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to be extended in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.
To be continued...
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with punctual synchronization upon reconnection to the main database; it can be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)
When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
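The echo-avoidance rule can be loosely modeled in a few lines of Python. The node names are hypothetical; the point is only that the triggers log a change for every target except the user that made it:

```python
# Sketch: log a change for all replication targets EXCEPT the connected
# user. When the parent replicates in under its own node name, nothing
# is logged back toward the parent, so changes never bounce.

def log_for_subnodes(log, targets, current_user, change):
    for node in targets:
        if node != current_user:          # skip the originator
            log.append((node, change))

log = []
targets = ["PARENT", "CHILD_1", "CHILD_2"]   # hypothetical node names

# A local application user makes a change: logged for every target.
log_for_subnodes(log, targets, "APP_USER", "update #1")

# The parent replicates a change in: logged for the children only.
log_for_subnodes(log, targets, "PARENT", "update #2")

assert ("PARENT", "update #2") not in log    # no echo back to the parent
```

The same exclusion applied on the parent side keeps a change from being sent back to the child that originated it.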
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
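Here is a rough Python model of that key-synchronization step, with a simple counter standing in for the server-side generator; all names are hypothetical:

```python
import itertools

# Stand-in for a server-side key source (e.g. a generator on the parent).
server_gen = itertools.count(1000)

def sync_primary_key(local_record, local_table):
    """Before pushing a record, fetch a server-unique key and apply it
    locally first, so the locally-generated key never reaches the parent."""
    new_key = next(server_gen)            # "SQL statement called on the server side"
    old_key = local_record["id"]
    local_record["id"] = new_key          # applied to the local database
    local_table[new_key] = local_table.pop(old_key)
    return local_record

table = {1: {"id": 1, "name": "offline row"}}   # key 1 was assigned off-line
rec = sync_primary_key(table[1], table)
print(rec["id"])   # a server-allocated, guaranteed-unique key
```

Only once the local copy carries the server-issued key is the record pushed, which is the ordering the article describes.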
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record; automatically, upon the next replication, the record will be replicated and the conflict resolved.
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
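The detection rule can be sketched as follows: a record modified on both sides since the last replication is flagged and frozen until CHOSEN_USER is filled in. The CHOSEN_USER field name comes from the article; everything else is a hypothetical simplification:

```python
# Sketch of CopyCat-style conflict handling between a node and its parent.

def replicate_record(pk, local_ver, parent_ver, conflicts):
    """If both sides changed the same record since the last sync, flag a
    conflict and replicate nothing; otherwise return the changed side."""
    if local_ver["changed"] and parent_ver["changed"]:
        conflicts[pk] = {"local": "NODE_A", "remote": "PARENT",
                         "CHOSEN_USER": None}     # replication disabled
        return None
    return local_ver if local_ver["changed"] else parent_ver

def resolve(pk, conflicts, chosen):
    """The user names the node holding the correct version; the record
    is unfrozen and wins on the next replication pass."""
    conflicts[pk]["CHOSEN_USER"] = chosen
    return conflicts.pop(pk)["CHOSEN_USER"]

conflicts = {}
result = replicate_record(42, {"changed": True}, {"changed": True}, conflicts)
assert result is None and 42 in conflicts         # record frozen
assert resolve(42, conflicts, "PARENT") == "PARENT" and not conflicts
```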
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication", that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an
unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
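Continuing the STOCK example, a toy Python model shows why replaying procedure calls (deltas) merges correctly where value-based replication would lose one of the updates; all names are hypothetical:

```python
# Replicating procedure calls instead of record values: both nodes'
# deltas survive, where value-based replication would keep only one.

def add_to_stock(db, product, qty):
    """The 'replicated' stored procedure: apply a delta to the stock."""
    db[product] = db.get(product, 0) + qty

def replay(db, call_log):
    """Replay procedure calls that were logged on another node."""
    for product, qty in call_log:
        add_to_stock(db, product, qty)

central = {"WIDGET": 45}                 # current stock value is 45
add_to_stock(central, "WIDGET", 1)       # node A's call, replicated here
replay(central, [("WIDGET", 2)])         # node B's call, replicated here
print(central["WIDGET"])                 # 48: both changes merged
```

Shipping the call `("WIDGET", 2)` rather than the value 47 is what makes the merge commutative.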
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:
1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool
We will now present each of these products.
1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main Interbase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide for getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator"
columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).
5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool
For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:
• Easy to use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master/slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the Interbase/Firebird community is obtaining a two-fold benefit:
1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.
2. By providing a standalone tool for the replication of Interbase/Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.
CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.
Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:
ibd-852-150
Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.
It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.
Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour to hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should, at the very least, take a peek at the database statistics from time to time.
Understanding Your Database with IBAnalyst

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.
Four years ago, our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.
Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?
Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available while browsing the statistics, and it also includes hints, comments and recommendation reports.
The summary shown in Figure 1 provides general information about your database.

Figure 1: Summary of database statistics
Author: Dmitri Kouzmenko, kdv@ib-aid.com
[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] The Oldest transaction is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
[3] Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions, it can be considered as the currently oldest active transaction.
Development area
32
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
www.ibdeveloper.com © 2005 www.ibdeveloper.com. All rights reserved.
Using IBAnalyst
The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: all figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago, developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: it is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. [1]
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with a rollback and the server can not undo its changes at that moment.
• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.
• The Oldest active [3] – the oldest currently active transaction.
• The Next transaction – the transaction number that will be assigned to the next new transaction.
• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day – this is calculated as Next transaction divided by the number of days passed since the creation of the database up to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
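As a rough illustration of that arithmetic, with made-up numbers:

```python
from datetime import date

# Transactions per day = Next transaction / days since database creation.
# The figures below are hypothetical.
next_transaction = 150_000
created = date(2005, 1, 1)
stats_taken = date(2005, 4, 11)

days = (stats_taken - created).days      # 100 days
per_day = next_transaction / days
print(per_day)                           # 1500.0 transactions per day
```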
As you have already learned, if there are any warnings, they are shown as colored lines with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during maintenance and housekeeping operations can be meaningless.
Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)
Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information, you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.
The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).
The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.
Table information

Let's take a look at another sample output from IBAnalyst.
The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2: Table statistics
In this database example, there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.
The index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination, you can see that it has 34,999 keys, TotalDup is 34,995 and MaxDup is 25,056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the longest duplicate chain (keys pointing to records with the same column value) is 25,056, i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing (using triggers) deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.
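The statistics discussed in this section (Uniques, TotalDup, MaxDup) can be recomputed from a list of index keys; here is a sketch with made-up data:

```python
from collections import Counter

def index_stats(keys):
    """Compute the duplicate statistics IBAnalyst reports for an index:
    distinct values, total duplicate keys, and the longest duplicate chain."""
    counts = Counter(keys)
    uniques = len(counts)                # distinct key values
    total_dup = len(keys) - uniques      # keys duplicating another key
    max_dup = max(counts.values()) - 1   # longest chain of equal keys
    return uniques, total_dup, max_dup

# A low-selectivity column: 1000 rows, only three distinct values.
keys = ["A"] * 700 + ["B"] * 250 + ["C"] * 50
print(index_stats(keys))   # (3, 997, 699)
```

Note that TotalDup is always keys minus Uniques, which matches the figures quoted above (34,999 - 4 = 34,995).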
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amounts of record versions usually slow down performance. If there are a lot of record versions in a table, then garbage collection does not work, or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944   92
DICT_PRICE        30       1992   45
DOCS               9       2225   64
N_PART         13835      72594   83
REGISTR_NC       241       4085   56
SKL_NC          1640       7736   170
STAT_QUICK     17649      85062   110
UO_LOCK          283       8490   144
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments on the "Temporary tables" article
One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:
"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."
This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.
Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only possible through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.
Alexey Kovyazin, Chief Editor
Cover story
Locking Firebird and the Lock Table
put on a list of waiting requests, and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released, or require the new request to wait.
Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.
Locks as interprocess communication
Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that's normally considered database locking.
For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That’s important since, as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.
When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.
Firebird locking summary
Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.
Lock table specifics
The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.
To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose the lock table includes a hash table for access to lock blocks based on the name of the resource being locked.
A quick refresher on hashing
A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function called the hash function into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.
In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.
A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
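The structure described above can be sketched in a few lines of code. This is only an illustrative model of a hash array with collision and duplicate chains; the class and field names are invented for the example, and Firebird’s real lock manager is, of course, implemented quite differently in C.

```python
# Illustrative model of a lock-table hash array with collision and
# duplicate chains. All names here are invented for the example.

class HashBlock:
    def __init__(self, name, lock_block):
        self.name = name             # original resource name
        self.lock_block = lock_block # address of the corresponding lock block
        self.collision = None        # next block whose name hashed to the same slot
        self.duplicate = None        # next block with exactly the same name

class LockHashTable:
    def __init__(self, slots=101):   # works best when the slot count is prime
        self.slots = [None] * slots

    def _hash(self, name):
        # The slot count is part of the hash function, which is why
        # changing the width invalidates all existing entries.
        return hash(name) % len(self.slots)

    def insert(self, name, lock_block):
        block = HashBlock(name, lock_block)
        head = self.slots[self._hash(name)]
        if head is None:
            self.slots[self._hash(name)] = block
            return block
        while True:                      # walk the collision chain
            if head.name == name:        # same name: chain as a duplicate
                block.duplicate = head.duplicate
                head.duplicate = block
                return block
            if head.collision is None:   # different name, same slot
                head.collision = block
                return block
            head = head.collision

    def lookup(self, name):
        head = self.slots[self._hash(name)]
        while head is not None and head.name != name:
            head = head.collision        # each collision adds a pointer to follow
        return head
```

Note how `lookup` gets slower as collision chains grow; that is exactly the effect the hash-length statistics discussed below measure.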
Adjusting the lock table to improve performance
The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.
The tool for checking the lock table is fb_lock_print, which is a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You’ll see output that starts something like this:
LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%, Hash slots: 101, Hash lengths (min/avg/max): 3/15/30
…The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:
Mutex wait: 10.3%, Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table’s mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.
If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.
Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading “#”. Choose a value that is a prime number less than 2048:

LockHashSlots = 499
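Since the hash function works best with a prime number of slots, a small helper like the following can round a desired slot count up to the next prime under the 2048 limit. This is just an illustrative sketch, not part of Firebird; the function names are invented for the example.

```python
# Helper sketch: pick a prime number of hash slots below the 2048 limit
# mentioned above. Not part of Firebird; names are invented.

def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def next_prime_slots(desired, limit=2048):
    """Smallest prime >= desired that is below the stated limit."""
    for n in range(desired, limit):
        if is_prime(n):
            return n
    raise ValueError("no prime found below limit")
```

For example, `next_prime_slots(490)` yields 491, and `next_prime_slots(500)` yields 503 – either would be a reasonable LockHashSlots value for a moderately loaded Classic server.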
The change will not take effect until all connections to all databases on the server machine shut down.
If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading “#” and increase the value:
LockMemSize = 262144
to this
LockMemSize = 1048576
The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.
News & Events

PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP.

One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage.

PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in.

PHPServer shares all the qualities of Firebird: it is a capable, compact, easy-to-install and easy-to-manage solution.

PHPServer is a free download.

Read more at: www.fyracle.org/phpserver.html
Server internals

Inside BLOBs
How the server works with BLOBs
The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for the storage of data that cannot be placed in fields of other types – for example, pictures, audio files, video fragments, etc.
From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter “Data types” for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.
Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and, at the same time, has its own means of page organization. Let’s consider in detail how the BLOB-handling mechanism works.
This is an excerpt from the book “1000 InterBase & Firebird Tips & Tricks” by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.
Initially, the basic record data on the data page includes a reference to a “BLOB record” for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.
Author: Dmitri Kouzmenko, kdv@ib-aid.com
Author: Alexey Kovyazin, ak@ib-aid.com
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.
BLOB Page
The blob page consists of the following parts.

The special header contains the following information:

• The number of the first blob page in this blob. It is used to check that pages belong to one blob.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on the page. As a page may or may not be filled to its full extent, the length of actual data is indicated in the header.
Maximum BLOB size
As the internal structure for storing BLOB data can have only three levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. The internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.
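As an illustration, here is a rough back-of-the-envelope version of that calculation for a 4096-byte page. The per-page header size and the 4-byte page numbers are simplifying assumptions made for this example, not exact on-disk layout figures:

```python
# Rough, illustrative estimate of the three-level BLOB limit.
# Header size and 4-byte page numbers are simplifying assumptions,
# not exact on-disk layout figures.

PAGE_SIZE = 4096
HEADER = 32                                 # assumed per-page overhead
DATA_PER_PAGE = PAGE_SIZE - HEADER          # BLOB data bytes per data page
PTRS_PER_PAGE = (PAGE_SIZE - HEADER) // 4   # assumed 4-byte page numbers

# Three levels: quasi-record -> pointer pages -> data pages.
# Assume the quasi-record can hold about one page worth of pointers.
max_blob = PTRS_PER_PAGE * PTRS_PER_PAGE * DATA_PER_PAGE
print(max_blob)        # 4195088384, i.e. roughly 3.9 GB with these assumptions

# The practical caps described in the text:
ULONG_LIMIT = 2 ** 32  # 4 GB length field
UDF_LIMIT = 2 ** 31    # 2 GB when BLOBs pass through UDFs
print(min(max_blob, ULONG_LIMIT, UDF_LIMIT))  # 2147483648
```

With these assumed numbers the structural limit lands in the same few-gigabyte range as the ULONG length field, which is why the 4 GB (or, with UDFs, 2 GB) cap is the one that matters in practice.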
The segment size mystery
Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
TestBed
In issue 1 we published Dmitri Yemanov’s article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it a bit to discover what it is all about.
In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.
Database for testing
The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators.
Testing the NO SAVEPOINT feature in InterBase 7.5.1
Authors: Alexey Kovyazin, ak@ib-aid.com, and Vlad Horsun, hvlad@users.sourceforge.net

For the test we will use only one table, with the following structure:
CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
)

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:
SELECT * FROM INSERTRECS(1000);
The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.
Setting up the test environment
All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.
Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well described in the InterBase documentation.
You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section “Preparing to test” below.
If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database and you will be in the same position as if you had downloaded the “ready for use” backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer: a Pentium 4 2 GHz with 512 MB RAM and an 80 GB Samsung HDD.
Preparing to test
A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts
The first script tests a regular one-pass update without the NO SAVEPOINT option.

For convenience, the important commands are clarified with comments.
connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey"; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on;  /* enable time statistics for performed statements */
set stat on;  /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:
connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.
To test sequential UPDATEs, we added several UPDATE statements; we recommend using five. The script for testing without NO SAVEPOINT would be:
connect "E:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com
News & Events

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics using metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test
The easiest way to perform the test is to use isql’s INPUT command.

Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;
Test results

The single bulk UPDATE

First, let’s perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3
News & Events

FastReport 3.19

The new version of the famous FastReport is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0..360 degrees; a memo object supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.
News & Events

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table. Full details can be found here: www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here: www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled:
News & Events

CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger, see www.microtec.fr/copycat/ct

Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.
Sequential bulk UPDATE statements
With the multi-pass script (sequential UPDATEs) the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs
News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition.

SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com
For convenience the results are tabulated above
The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time – 47 seconds – compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.
The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth of the writes parameter.
Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage, we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?
A few words about versions
You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed a new version of that record is produced. The old version does not disappear immediately but is retained as a backversion.
In fact, the first time a backversion is written to disk it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
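The delta idea can be sketched as follows. This is a toy model (invented names, naive byte-level diffs, fixed-length records) meant only to illustrate the principle of rebuilding an old version from the newest version plus a chain of deltas, not InterBase's actual on-disk format:

```python
# Toy model of delta backversions: store only the byte positions that
# changed, and rebuild an old record version by applying deltas in
# reverse order, starting from the newest full version.

def make_delta(old: bytes, new: bytes) -> dict:
    """Differences that turn `new` back into `old` (equal lengths assumed)."""
    return {i: o for i, (o, n) in enumerate(zip(old, new)) if o != n}

def apply_delta(version: bytes, delta: dict) -> bytes:
    buf = bytearray(version)
    for pos, old_byte in delta.items():
        buf[pos] = old_byte
    return bytes(buf)

v1 = b"ID=001 NAME=AAAA"
v2 = b"ID=002 NAME=AAAA"   # after the first update
v3 = b"ID=003 NAME=AAAB"   # after the second update

# Chain of deltas, newest first: v3 -> v2 -> v1.
deltas = [make_delta(v2, v3), make_delta(v1, v2)]

rebuilt = v3
for d in deltas:
    rebuilt = apply_delta(rebuilt, d)
assert rebuilt == v1       # the full old version is recovered
```

Note how small each delta is compared with the full record; that is exactly why the first backversion is cheap to write and hold.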
The UNDO log concept
You may recall from Dmitri’s article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.
A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.
So the undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let’s kill the savepoints. No savepoints, no problem. :-)
Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.
First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see the “Release Notes” for InterBase 7.5 SP1, “New in InterBase 7.5.1”, page 2-2).
Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the “interesting” category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.
Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation
Consider the implementation details of the undo log. Figure 3 shows the initial situation.
Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.
Figure 3: The initial situation before any UPDATE – only one record version exists, and the undo log is empty.
The first UPDATE
The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.
It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction: the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
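That first-savepoint behavior can be modelled roughly like this. The class and function names are invented for illustration, and a Python set stands in for the bitmap of record numbers:

```python
# Rough model of the first-change undo log: the savepoint keeps only a
# "bitmap" of affected record numbers; rollback walks those numbers and
# restores each record from its on-disk backversion.

class Savepoint:
    def __init__(self):
        self.changed = set()        # stands in for the bitmap of record numbers

    def note_change(self, rec_no):
        self.changed.add(rec_no)

def rollback(savepoint, table, backversions):
    """Restore every record touched under `savepoint` from its backversion."""
    for rec_no in savepoint.changed:
        table[rec_no] = backversions[rec_no]

# Records 1 and 2 were updated to "v2"; record 3 was never touched.
table = {1: "v2", 2: "v2", 3: "v1"}
backversions = {1: "v1", 2: "v1"}   # deltas kept on disk for changed records
sp = Savepoint()
sp.note_change(1)
sp.note_change(2)

rollback(sp, table, backversions)
assert table == {1: "v1", 2: "v1", 3: "v1"}
```

The point of the model is that the savepoint itself holds almost nothing – just record numbers – which is why the first UPDATE is cheap with or without NO SAVEPOINT.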
Figure 4: The first UPDATE creates a small delta version on disk and puts the record number into the undo log.
This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don’t see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE
When the second UPDATE statement is performed on the same set of records, we have a different situation.
Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the undo log when an explicit (or enclosed BEGIN … END) savepoint is used.
To preserve on-disk integrity (remember the “careful write” principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.
The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.
Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from the first UPDATE into the undo log.
This all makes hard work for the memory manager and the CPU, as you can see from the growth of the “max mem”/“delta mem” values in the test that uses the default transaction parameters.
When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
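For reference, this is how the option is requested in a session (a hedged sketch: the statement follows the SET TRANSACTION extension documented for InterBase 7.5, and the table name is an assumption, since the test table is defined elsewhere in the article):

```sql
SET TRANSACTION NO SAVEPOINT;  /* no per-transaction Undo log is maintained */
UPDATE TEST SET NAME = 'xxx';  /* sequential bulk updates stay cheap */
COMMIT;
```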
The third UPDATE
The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.
TestBed
The InterBase and Firebird Developer Magazine, 2005, ISSUE 2
www.ibdeveloper.com ©2005. All rights reserved.
Testing the NO SAVEPOINT
Figure 6
The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.
Figure 7
If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.
The original design of InterBase implemented the second UPDATE another way, but some time after IB6 Borland changed the original behavior, and we see what we see now. But that theme is for another article. ;)
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.
A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know one.
The following figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only operations we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN … END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.
For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in figure 7.
In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
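As a sketch, the explicit-savepoint scenario could be reproduced like this (the table and column names are assumptions, since the test table is defined elsewhere in the article):

```sql
UPDATE TEST SET NAME = 'v1';       /* UPDATE1: only the global savepoint exists */
SAVEPOINT Savepoint1;
UPDATE TEST SET NAME = 'v2';       /* UPDATE2: a backversion goes into
                                      Savepoint1's own Undo log */
ROLLBACK TO SAVEPOINT Savepoint1;  /* restores 'v1' from that Undo log */
```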
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I, on my own, had reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience and promise to add any necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out unimportant elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.
Mixing of object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.
I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true of OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.
And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?
Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:
1. Everyone got used to them and felt comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development – if you develop your applications just like 20-30 years ago.
As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know, all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…

Author: Vladimir Kotlyarevsky, vlad@contek.ru
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?

The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relational processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers at run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] – UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.)
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hum! But let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
And what is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that they "conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
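In InterBase/Firebird, the natural implementation of such a generating function is a single database-wide generator (a sketch; the generator name is an assumption, not prescribed by the article):

```sql
/* one sequence shared by all classes: values are unique
   across the whole database, not just within one table */
create generator GEN_OID;

/* drawing a new OID, e.g. in a trigger or stored procedure:
   new.OID = gen_id(GEN_OID, 1); */
```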
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information – from just a simple type name to the detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
    OID TOID primary key,
    ClassId TOID,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date timestamp default CURRENT_TIMESTAMP,
    Owner TOID
)

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact about all objects enables us to find out easily what that object is – using a standard method.

Quite often, only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case, we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types – simple lookup dictionaries, for example – which do not have any attributes other than those which already are in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus, you may consider that you have implemented half of these dictionaries – that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId
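As an illustrative sketch (the OID values and names here are invented, not taken from the article), registering a dictionary type and one of its elements could look like this:

```sql
/* the type description itself is an object with ClassId = -100 */
insert into OBJECTS (OID, ClassId, Name, Description)
values (1, -100, 'Partners', 'Partners dictionary');

/* an element of that dictionary: its ClassId is the OID of its type */
insert into OBJECTS (OID, ClassId, Name, Description)
values (2, 1, 'ACME', 'ACME Inc.');
```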
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing of more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:
create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total NUMERIC(15,2)
)
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:
select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
If an order has a "lines" section – and a real-world order should definitely have such a section – we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:
create table order_lines (
    id integer not null primary key,
    object_id TOID,  /* reference to the order object – 1:M relation */
    item_id TOID,    /* reference to the ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    sum numeric(15,4) computed by (amount*cost)
)
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: the implementation of the object-relational mapping system for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:
create table attributes (
    OID TOID,  /* link to the master-object of this attribute */
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
)
connected 1:M with OBJECTS by OID. There is also a table
create table class_attributes (
    OID TOID,  /* a link to the description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
)
which describes the type metadata – an attribute set (names and types) for each object type.
All the attributes of a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid
or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id
In the context of this method, you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all the attributes in a single record by joining or by using subqueries:
select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure – a significant benefit, since with this approach the number of tables would not increase no matter how many different types were stored in a database.
The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database – in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three of the described methods are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or is likely to have it added in the future over certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and the data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.
To be continued…
Replicating and synchronizing InterBase/FireBird databases using CopyCat
Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a one-off synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and FireBird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
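CopyCat's actual schema is not shown in the article; as a hedged sketch, the log table and one such trigger might look roughly like this (all names are invented for illustration):

```sql
create generator GEN_RPL_LOG;

create table RPL_LOG (
    ID         integer not null primary key,
    TABLE_NAME varchar(31),  /* table in which the change occurred */
    PK_VALUE   varchar(64),  /* primary key value(s) of the record */
    SUB_NODE   varchar(31)   /* sub-node this log line is destined for */
);

set term ^;
create trigger ORDERS_AU for ORDERS active after update as
begin
    /* one log line per sub-node listed in the node's own database */
    insert into RPL_LOG (ID, TABLE_NAME, PK_VALUE, SUB_NODE)
    select gen_id(GEN_RPL_LOG, 1), 'ORDERS', new.ID, N.NODE_NAME
    from RPL_NODES N
    where N.NODE_NAME <> user;  /* skip the originating node; see
                                   "Two-way replication" below */
end^
set term ;^
```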
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/FireBird.)
When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent; and any change made to the parent node will be logged for the other sub-nodes, but not for the originating node itself.
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which uniqueness is to be maintained), a synchronization method. In most cases, this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
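For instance (a hypothetical setting – not CopyCat's literal configuration syntax), the synchronization method for an integer key could be a generator call evaluated on the server:

```sql
/* executed on the parent server to obtain a unique key value
   before the record is replicated upwards */
select gen_id(GEN_ORDER_ID, 1) from RDB$DATABASE
```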
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
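A sketch of such a stored procedure for the STOCK example (the procedure and column names are assumptions, not taken from CopyCat):

```sql
set term ^;
create procedure ADD_STOCK (PRODUCT_ID integer, DELTA numeric(15,4))
as
begin
    /* the call (the delta) is what gets replicated,
       not the resulting stock value */
    update STOCK set AMOUNT = AMOUNT + :DELTA
    where PRODUCT_ID = :PRODUCT_ID;
end^
set term ;^
```

In the scenario above, node A would replicate the call ADD_STOCK(product, 1) and node B the call ADD_STOCK(product, 2); replaying both calls on every node merges the changes, so each database ends up with the correct value of 48.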
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:
1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each of these products.

1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide for getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.
2) Open and run the "Configuration" example project (requires the IBX provider).
3) On the "General" tab, fill in the connection parameters and press "Connect to Database".
4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator"
columns to (optionally) set the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.
7) Apply all the generated SQL to all databases that should be replicated.
8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool
For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy-to-use installer

• Independent server / administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple and intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:
1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.
2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.
CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.
Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code: ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you will know that the statistical output looks like just a bunch of numbers and nothing else. Okay, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour to hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available while browsing the statistics; it also includes hint comments and recommendation reports.

Figure 1: Summary of database statistics

Footnotes:
1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - The Oldest transaction is the same Oldest interesting transaction mentioned everywhere. Gstat output does not show this transaction as "interesting".
3 - Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

The summary shown in Figure 1 provides general information about
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.
• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.
• The Oldest active³ – the oldest currently active transaction.
• The Next transaction – the transaction number that will be assigned to the next new transaction.
• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.
Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)
Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not creating sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, which tables are fragmented (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently the versions are caused by bulk deletes. Together, both indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or the result of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If the statistics were taken via the Services API with metadata info, you can also see which columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes, or updates of the primary key, in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
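As a sketch of that trigger-based workaround (the table, column, trigger and exception names are hypothetical), a lookup table can simply refuse deletes and primary-key changes:

```sql
CREATE EXCEPTION E_LOOKUP_READONLY
  'Rows of this lookup table may not be deleted, nor their keys changed';

SET TERM ^ ;

/* Refuse any delete from the lookup table */
CREATE TRIGGER COUNTRY_NO_DELETE FOR COUNTRY
ACTIVE BEFORE DELETE AS
BEGIN
  EXCEPTION E_LOOKUP_READONLY;
END ^

/* Refuse changes to the primary key column */
CREATE TRIGGER COUNTRY_NO_PK_UPDATE FOR COUNTRY
ACTIVE BEFORE UPDATE AS
BEGIN
  IF (OLD.COUNTRY_ID <> NEW.COUNTRY_ID) THEN
    EXCEPTION E_LOOKUP_READONLY;
END ^

SET TERM ; ^
```

The idea, as we understand it, is that once the lookup keys can never disappear or change, the declarative foreign key – and the low-selectivity index automatically created for it – is no longer strictly necessary and can be dropped; whether that trade-off is acceptable depends on the application.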
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. A large number of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with a version/record ratio greater than 3:"

Table        Records  Versions  RecVers size
CLIENTS_PR      3388     10944            92
DICT_PRICE        30      1992            45
DOCS               9      2225            64
N_PART         13835     72594            83
REGISTR_NC       241      4085            56
SKL_NC          1640      7736           170
STAT_QUICK     17649     85062           110
UO_LOCK          283      8490           144
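The select count(*) technique that the report suggests can be applied table by table; a sketch, using table names from the list above:

```sql
/* Reading every record forces the server to walk each version chain and
   collect any garbage versions no longer needed by active transactions.
   This may run for a long time on heavily versioned tables. */
SELECT COUNT(*) FROM STAT_QUICK;
SELECT COUNT(*) FROM N_PART;
```

Run it in a quiet period; as the report warns, it will achieve nothing while an older transaction is still interested in those versions.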
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments on the "Temporary tables" article
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its author I'd like to publish it here.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data whose structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
Miscellaneous
This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000. Based on the actual Firebird Project, this book provides all you need to know about Firebird database development: installation, multi-platform configuration, SQL language, interfaces and maintenance.
Cover story

Locking, Firebird and the Lock Table
by Ann W. Harrison
address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.
A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and another name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
Adjusting the lock table to improve performance
The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally, all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.
The tool for checking the lock table is fb_lock_print, which is a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:
LOCK_HEADER BLOCK
Version:114, Active owner:0, Length:262144, Used:85740
Semmask:0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/ 15/ 30
...
The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/ 15/ 30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.
If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.
Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading '#'. Choose a value that is a prime number less than 2048:

LockHashSlots = 499
The change will not take effect until all connections to all databases on the server machine shut down.
If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version:114, Active owner:0, Length:262144, Used:85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading '#' and increase the value:
#LockMemSize = 262144

to this:

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.
News & Events: PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP.

One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in.

PHPServer shares all the qualities of Firebird: it is a capable, compact, easy-to-install and easy-to-manage solution. PHPServer is a free download.

Read more at: www.fyracle.org/phpserver.html
Inside BLOBs
How the serverworks with BLOBs
The BLOB data type is intended forstoring data of variable size Fieldsof BLOB type allow for storage ofdata that cannot be placed in fieldsof other types - for example pic-tures audio files video fragmentsetc
From the point view of the databaseapplication developer using BLOBfields is as transparent as it is forother field types (see chapter ldquoDatatypesrdquo for details) However thereis a significant difference betweenthe internal implementation mecha-nism for BLOBs and that for otherdata
Unlike the mechanism used for han-dling other types of fields the data-base engine uses a special mecha-nism to work with BLOB fields Thismechanism is transparently inte-grated with other record handlingat the application level and at thesame time has its own means ofpage organization Lets consider indetail how the BLOB-handlingmechanism works
Inside BLOBsThis is an excerpt from the book ldquo1000 InterBase amp Firebird Tips amp Tricksrdquoby Alexey Kovyazin and Dmitri Kouzmenko which will be published in 2006
Initially the basic record data onthe data page includes a referenceto a ldquoBLOB recordrdquo for each non-null BLOB field ie to record-likestructure or quasi-record that actu-
ally contains the BLOB dataDepending on the size of the BLOBthis BLOB-record will be one ofthree types
The first type is the simplest If thesize of BLOB-field data is less thanthe free space on the data page it isplaced on the data page as a sepa-rate record of BLOB type
Author: Dmitri Kouzmenko, kdv@ib-aid.com

Author: Alexey Kovyazin, ak@ib-aid.com
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus, a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used: a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type - the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.
BLOB Page

The BLOB page consists of the following parts.

The special header contains the following information:

• The number of the first blob page in this blob. It is used to check that pages belong to one blob.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on a page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.
Maximum BLOB size

As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximal size will be equal to 4 gigabytes.
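The theoretical limit is indeed easy to estimate. Here is a back-of-the-envelope sketch (our own illustration, not engine code; it assumes 4-byte page pointers and ignores page header overhead):

```python
def max_blob_size(page_size: int, pointer_size: int = 4) -> int:
    """Rough upper bound for a three-level BLOB structure:
    quasi-record -> BLOB pointer pages -> data pages."""
    pointers_per_page = page_size // pointer_size
    # the quasi-record can reference up to pointers_per_page pointer pages,
    # each of which references up to pointers_per_page full data pages
    return pointers_per_page * pointers_per_page * page_size

print(max_blob_size(4096))
```

Interestingly, for a 4096-byte page this rough estimate works out to 1024 × 1024 × 4096 bytes, i.e. the same 4-gigabyte order of magnitude as the ULONG length limit described above.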
Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.
The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
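As an illustration of the point above, here is a hedged sketch (the table and column names are our own, not from the book) showing that a BLOB can be declared with or without an explicit segment size; both are stored identically on disk:

```sql
/* illustrative only: both BLOB definitions behave the same on disk */
CREATE TABLE PICTURES (
  ID    INTEGER NOT NULL PRIMARY KEY,
  DATA1 BLOB SUB_TYPE 0 SEGMENT SIZE 80, /* explicit segment size */
  DATA2 BLOB SUB_TYPE 0                  /* default, also 80 bytes */
);
```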
TestBed
Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Vlad Horsun, hvlad@users.sourceforge.net, and Alexey Kovyazin, ak@ib-aid.com

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding is NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:
CREATE TABLE TEST (
    ID NUMERIC(18,2),
    NAME VARCHAR(120),
    DESCRIPTION VARCHAR(250),
    CNT INTEGER,
    QRT DOUBLE PRECISION,
    TS_CHANGE TIMESTAMP,
    TS_CREATE TIMESTAMP,
    NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.
Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.
After that, perform a backup of this database and you will be in the same position as if you had downloaded the "ready for use" backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer: a Pentium-4 2 GHz with 512 MB RAM and an 80 GB Samsung HDD.
Preparing to test
A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts
The first script tests a regular one-pass update, without the NO SAVEPOINT option.

For convenience, the important commands are clarified with comments:

connect 'C:\TEST\testsp1.ib' USER 'SYSDBA' Password 'masterkey'; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option.
connect 'C:\TEST\testsp2.ib' USER 'SYSDBA' Password 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of the single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements; we recommend using five. The script for testing without NO SAVEPOINT would be:

connect 'C:\TEST\testsp3.ib' USER 'SYSDBA' Password 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.

Read more at:

www.janus-software.com
News & Events
IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics with the use of metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database.

It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in C:\TEST\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;
Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes 942
Fetches = 3
Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain:

Visual report designer with rulers, guides and zooming; wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0..360 degrees; memo object with support for simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.

Trial / Buy Now
News & Events
TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a Stored Procedure Generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found here:

http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet.

Full details can be found here:

http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled
CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger see:

www.microtec.fr/copycat/ct

Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not
Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs
News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com
For convenience, the results are tabulated in Table 1.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time, 47 seconds, compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth of the writes parameter.
Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage, we observe that time and memory values, and the growth of writes, are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?
A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed a new version of that record is produced. The old version does not disappear immediately but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
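The delta idea can be illustrated with a toy sketch (purely illustrative Python, not engine code): the newest version is kept in full, and the backversion stores only the fields whose values changed, which is enough to rebuild the old version on demand.

```python
def make_delta(old: dict, new: dict) -> dict:
    """Store only the old values of the fields that differ."""
    return {k: old[k] for k in old if old[k] != new[k]}

def rebuild_old(new: dict, delta: dict) -> dict:
    """Reconstruct the full old version from the new version plus the delta."""
    rebuilt = dict(new)
    rebuilt.update(delta)
    return rebuilt

v1 = {"ID": 1, "NAME": "Ann", "CNT": 10}
v2 = {"ID": 1, "NAME": "Ann", "CNT": 11}   # one field changed
delta = make_delta(v1, v2)                 # much smaller than the full v1
assert rebuild_old(v2, delta) == v1
```

The real on-disk delta format is of course different, but the space saving comes from the same observation: most updates touch only a small part of a record.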
The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints, no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
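At the API level the call might look like the following sketch. This is our own illustration, not from the release notes: it assumes the classic InterBase C API and an `ibase.h` that defines the `isc_tpb_no_savepoint` constant (as in InterBase 7.5.1); attachment and error handling are omitted.

```c
#include "ibase.h"   /* InterBase client API header */

/* a TPB asking for a read-write concurrency transaction with no undo log */
static const char tpb[] = {
    isc_tpb_version3,
    isc_tpb_write,
    isc_tpb_concurrency,
    isc_tpb_no_savepoint   /* new in InterBase 7.5.1 */
};

ISC_STATUS   status[20];
isc_db_handle db    = 0;   /* assumed already attached via isc_attach_database() */
isc_tr_handle trans = 0;

/* start one transaction on one database with the TPB above */
isc_start_transaction(status, &trans, 1, &db, (unsigned short) sizeof(tpb), tpb);
```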
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.
First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see the Release Notes for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).
Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.
Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE - only the one record version exists. The Undo log is empty.
The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction: the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle!) the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.
Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.
Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

The original design of InterBase implemented the second UPDATE another way, but some time after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :-)
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces the record data on disk with the newest version and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The preceding figure (Figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.
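In isql, such an explicit savepoint frame might look like this sketch (illustrative; the UPDATE reuses the TEST table from the test scripts):

```sql
SAVEPOINT Savepoint1;
UPDATE TEST SET CNT = CNT + 1;
/* if something goes wrong, the changes are undone
   from the Undo log of Savepoint1 */
ROLLBACK TO SAVEPOINT Savepoint1;
```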
In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
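For instance, a manual sweep can be run with the gfix utility between bulk-update jobs (the path and credentials here are illustrative, matching the test setup above):

```shell
gfix -sweep -user SYSDBA -password masterkey C:\TEST\testsp2.ib
```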
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I have missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true of OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:
1. Everyone got used to them and feels comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easily comprehensible and time-proven data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

Author: Vladimir Kotlyarevsky, vlad@contek.ru
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs), in [2] UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex).

Nobody can change their date of birth - at least, not their de facto date of birth. But birth dates are not unique anyway. :)
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things :)). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
And what is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the use of surrogates. The most brilliant argument against surrogates I've ever heard is that they "conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
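In InterBase/Firebird terms, such an implementation might be sketched as follows. This is only an illustration; the generator and procedure names (GEN_OID, GET_OID) are invented for the example and are not prescribed by the article:

```sql
/* one generator shared by ALL tables makes every key
   unique across the whole database, not just one table */
create generator GEN_OID;

set term !! ;
create procedure GET_OID returns (new_oid integer)
as
begin
  new_oid = gen_id(GEN_OID, 1);
end !!
set term ; !!
```

A new database-wide unique identifier is then obtained with `execute procedure GET_OID`.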
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
Development area
The InterBase and Firebird Developer Magazine, 2005, Issue 2
www.ibdeveloper.com © 2005 www.ibdeveloper.com. All rights reserved.
OOD and RDBMS, Part 1
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to the detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, or a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object: the physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is itself an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of this known OID.) Then the list of all persistent object types known to the system is retrieved by the query: select OID, Name, Description from OBJECTS where ClassId = -100.

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily, using a standard method, what that object is.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those already in OBJECTS. Usually these are a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? Likely no less than half of all tables. Thus you may consider that you have already implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
A simple plain dictionary can be retrieved from the OBJECTS table by the following query:
select OID, Name, Description from OBJECTS where ClassId = LookupId
if you know the ClassId for this dictionary. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2));
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and familiar. You could also create a view "orders_view" and make everything look as it always did.
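Such a view might be sketched like this (the view name and column aliases are illustrative, not taken from the article):

```sql
create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from OBJECTS o join Orders ord on o.OID = ord.OID;
```

Applications can then select from orders_view exactly as they would from a conventional Orders table.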
If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID,   /* reference to the order object: 1:M relation */
  item_id TOID,     /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost));
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: the implementation of an object-relational mapping system for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID,                /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id));
connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID,                /* link to the description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id));

which describes the type metadata: the attribute set (names and types) for each object type.
All attributes of a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with attribute names:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
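To make Method 2 concrete, here is how the "Order" object from Method 1 might be stored in this schema. All identifiers and values below (type OID 200, object OID 1001, attribute ids 1 and 2) are invented for illustration:

```sql
/* metadata: the "Order" type (OID 200) has two attributes */
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (200, 1, 'customer', 0);
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (200, 2, 'sum_total', 1);

/* data: one order object (OID 1001, whose ClassId in OBJECTS is 200) */
insert into attributes (OID, attribute_id, value) values (1001, 1, '57');
insert into attributes (OID, attribute_id, value) values (1001, 2, '149.90');
```

Note that all values are stored as strings in the "value" column; the attribute_type column tells the application how to interpret them.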
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer, a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease of implementing object inheritance; and a very simple database structure. A significant benefit is that with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or for example dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know, everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, on certain object attributes, or is likely to in the future, it would be better to use method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.
To be continued…
FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix reports support; supports most popular DB-engines. Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve, jonathan@microtec.fr (www.microtec.fr)
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with punctual synchronization upon reconnection to the main database; it can be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
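As an illustration only (CopyCat generates its own structures; the log table and trigger names below are hypothetical), such a change-logging trigger for a replicated ORDERS table might look like this in Firebird 1.5:

```sql
set term !! ;
create trigger ORDERS_RPL_LOG for ORDERS
after insert or update or delete
as
begin
  /* record which table and which row changed, and how */
  insert into RPL$LOG (table_name, pk_value, operation)
  values ('ORDERS',
          case when deleting then old.OID else new.OID end,
          case when inserting then 'I'
               when updating  then 'U'
               else 'D' end);
end !!
set term ; !!
```

Because the trigger fires for every insert, update and delete, the log is a complete journal of changes that the replicator can later replay against other nodes.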
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to keep changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
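The "log for every sub-node except the originator" rule could be expressed inside such a logging trigger roughly as follows (a fragment with hypothetical table and column names, not CopyCat's actual code):

```sql
/* one log line per sub-node, skipping the connection
   (node) that performed the change */
insert into RPL$LOG (table_name, pk_value, sub_node)
select 'ORDERS', new.OID, u.node_name
from RPL$USERS u
where u.node_name <> CURRENT_USER;
```

Since each node connects under its own name, comparing the sub-node list against CURRENT_USER is enough to exclude the originator.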
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
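A typical generator-based synchronization clause for a key field might simply be (the generator name is invented for the example):

```sql
select gen_id(GEN_ORDERS_OID, 1) from RDB$DATABASE
```

Executed on the server, this yields the next server-side key value, which CopyCat can then substitute for the locally assigned key before replicating the record.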
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) might simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than requiring the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication", that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like the one in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
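A delta-based update procedure of the kind described might be sketched like this (the names are hypothetical; this is not CopyCat's actual API):

```sql
set term !! ;
create procedure ADD_TO_STOCK (product_id integer, delta integer)
as
begin
  /* apply a relative change instead of an absolute value, so that
     concurrent changes from different nodes merge correctly */
  update STOCK
  set stock_value = stock_value + :delta
  where product_id = :product_id;
end !!
set term ; !!
```

Replicating the calls ADD_TO_STOCK(p, 1) from node A and ADD_TO_STOCK(p, 2) from node B yields 48 on every node, whatever the order in which the calls are applied.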
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.
1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare the databases:

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy to use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master/slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least in identifying the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available while browsing the statistics; it also includes hint comments and recommendation reports.

Figure 1. Summary of database statistics

Footnotes:
1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - "Oldest transaction" is the same "oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
3 - Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves "Oldest active" forward. In production systems with regular transactions it can be considered the currently oldest active transaction.

The summary shown in Figure 1 provides general information about
your database. The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write cache method: when ON, changed data is written immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on theIBAnalyst report The answer is sim-ple ndash using asynchronous writes cancause database corruption in casesof power OS or server failure
Tip It is interesting that modernHDD interfaces (ATA SATA SCSI)do not show any major difference inperformance with Forced Write setOn or Off1
Next on the report is the mysterioussweep interval If positive it setsthe size of the gap between the old-est2 and oldest snapshot transactionat which the engine is alerted to theneed to start an automatic garbagecollection On some systems hittingthis threshold will cause a suddenperformance loss effect and as aresult it is sometimes recommendedthat the sweep interval be set to 0(disabling automatic sweepingentirely) Here the sweep interval ismarked yellow because the valueof the sweep gap is negativewhich it can be in InterBase 60Firebird and Yaffil statistics but not inInterBase 7x When the value of thesweep gap is greater than thesweep interval (if the sweep intervalis not 0) the report entry for thesweep interval will be marked redwith an appropriate hint
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database:

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell if there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
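As a sketch of that last calculation (the figures below are made up for illustration, not taken from any real gstat output):

```python
from datetime import date

# Hypothetical gstat-style figures
next_transaction = 1_482_000
created = date(2004, 11, 1)    # database creation date
collected = date(2005, 5, 1)   # date the statistics were gathered

days = (collected - created).days
transactions_per_day = next_transaction / days
print(days, round(transactions_per_day))
```

A restore resets transaction numbering, which is why the estimate only makes sense for databases that have run since creation or since a periodic restore.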
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications put less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2. Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that this is really bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
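You can approximate those per-index figures yourself with plain SQL (the table and column names below are hypothetical stand-ins for the indexed data):

```sql
/* Total keys and unique values for a hypothetical indexed column */
SELECT COUNT(*) AS total_keys,
       COUNT(DISTINCT ADDRESS_ID) AS uniques
FROM ADDR;

/* MaxDup: the longest chain of keys sharing one value */
SELECT ADDRESS_ID, COUNT(*) AS dup_chain
FROM ADDR
GROUP BY ADDRESS_ID
ORDER BY 2 DESC;
```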
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
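One hedged sketch of that trigger-based workaround, for a hypothetical lookup table COLOR with primary key ID (the table, trigger and exception names are invented for illustration):

```sql
CREATE EXCEPTION E_LOOKUP_READONLY 'Primary keys of lookup tables must not change';

SET TERM ^ ;
CREATE TRIGGER COLOR_BU FOR COLOR
ACTIVE BEFORE UPDATE POSITION 0
AS
BEGIN
  /* Refusing PK changes keeps the FK index free of garbage versions */
  IF (NEW.ID <> OLD.ID) THEN
    EXCEPTION E_LOOKUP_READONLY;
END ^
SET TERM ; ^
```

A similar BEFORE DELETE trigger would cover the delete case.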
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, than garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records  Versions  RecVers size
CLIENTS_PR      3388     10944            92
DICT_PRICE        30      1992            45
DOCS               9      2225            64
N_PART         13835     72594            83
REGISTR_NC       241      4085            56
SKL_NC          1640      7736           170
STAT_QUICK     17649     85062           110
UO_LOCK          283      8490           144
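Applying the report's suggestion to one of the listed tables would look like this (run it in a fresh transaction, and expect it to take a while on heavily versioned tables):

```sql
/* A full scan visits every record version, letting the engine
   garbage-collect the outdated ones as it goes */
SELECT COUNT(*) FROM STAT_QUICK;
```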
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments on the "Temporary tables" article
One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.
Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
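For readers who have not seen the syntax: a minimal sketch of such a session-private table in the SQL-standard style that InterBase 7.5 introduced (the table definition itself is hypothetical):

```sql
/* Each connection sees only its own rows: no user-id columns,
   no explicit cleanup, no extra role management */
CREATE GLOBAL TEMPORARY TABLE SESSION_CART (
  ITEM_ID INTEGER NOT NULL,
  QTY     INTEGER NOT NULL
) ON COMMIT PRESERVE ROWS;
```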
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while :-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
Miscellaneous
Invitations, Subscriptions, Donations, Sponsorships
This is the first official book on Firebird — the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now! To receive future issues notifications, send email to subscribe@ibdeveloper.com
Magazine CD
Donations
Server internals
Inside BLOBs
How the server works with BLOBs
The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types - for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level, and at the same time has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.
This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.
Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB-record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.
Author: Dmitri Kouzmenko, kdv@ib-aid.com
Author: Alexey Kovyazin, ak@ib-aid.com
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.
BLOB Page
The BLOB page consists of the following parts.

The special header contains the following information:

• The number of the first blob page in this blob. It is used to check that pages belong to one blob.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on the page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.
Maximum BLOB size
As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximal size will be equal to 4 gigabytes.
Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.
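A back-of-the-envelope version of that calculation (the 32-byte per-page overhead is an assumption for illustration, not the engine's exact header size):

```python
# Rough ceiling for a three-level BLOB in a 4K-page database:
# quasi-record -> BLOB pointer pages -> BLOB data pages.
PAGE_SIZE = 4096
OVERHEAD = 32                                  # assumed per-page bookkeeping
data_per_page = PAGE_SIZE - OVERHEAD           # BLOB bytes per data page
ptrs_per_page = (PAGE_SIZE - OVERHEAD) // 4    # 4-byte page numbers per pointer page

theoretical_max = ptrs_per_page * ptrs_per_page * data_per_page
print(theoretical_max)          # ~3.9 GB under these assumptions
print(theoretical_max < 2**32)  # still below the 4 GB ULONG cap
```

Under these assumed overheads the structural limit already sits close to the 4 GB ULONG cap, and a larger page size would push it well past it.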
The segment size mystery
Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
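For completeness, this is where the parameter appears in DDL (the table is hypothetical; as the text explains, the value only sizes the buffer GPRE declares):

```sql
CREATE TABLE DOCS_ARCHIVE (
  ID   INTEGER NOT NULL,
  BODY BLOB SUB_TYPE TEXT SEGMENT SIZE 80  /* no effect on storage or speed */
);
```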
TestBed
In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look, and test it some, to discover what it is all about.
Testing NO SAVEPOINT
In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.
Database for testing
The test database file was created in InterBase 7.5.1, page size = 4096, character encoding is NONE. It contains two tables, one stored procedure and three generators.
Testing the NO SAVEPOINT feature in InterBase 7.5.1
Author: Alexey Kovyazin, ak@ib-aid.com
Author: Vlad Horsun, hvlad@users.sourceforge.net

For the test we will use only one table, with the following structure:
CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:
SELECT * FROM INSERTRECS(1000);
The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.
Setting up the test environment
All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.
You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).
If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.
After that, perform a backup of this database, and you will be in the same position as if you had downloaded the "ready for use" backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, 512 MB RAM and an 80GB Samsung HDD.
Preparing to test
A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts
The first script tests a regular, one-pass update without the NO SAVEPOINT option.

For convenience, the important commands are clarified with comments.
connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey"; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enables writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE, with the NO SAVEPOINT option.
connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:
connect "C:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
News & Events

Fyracle 0.8.9
Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble. Read more at www.janus-software.com
IBAnalyst 1.9
IBSurgeon has issued a new version of IBAnalyst. It can now better analyze InterBase or Firebird database statistics by using metadata information (in this case a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance. www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test
The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in c:\test\scripts:
>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;
Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed. This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes 942
Fetches = 3
News & Events

Fast Report 3.19
The new version of the famous Fast Report is now out. FastReport 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently, providing all the tools you need to develop reports. All variants of FastReport 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf output; dot-matrix report support; support for most popular DB engines; full WYSIWYG; text rotation 0-360 degrees; a memo object supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors. Trial / Buy Now
TECT Software presents
New versions of nice utilities from TECT Software, www.tectsoft.net.
SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found at http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx
FBMail (Firebird EMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found at http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled:
News & Events

CopyTiger 1.00
CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger see www.microtec.fr/copycat/ct. Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes 938
Fetches = 3
Performance appears to be almost the same, whether the NO SAVEPOINT option is enabled or not.
Sequential bulk UPDATE statements
With the multi-pass script (sequential UPDATEs), the raw test results are rather large.
Table 1. Test results for 5 sequential UPDATEs
News & Events

SQLHammer 1.5 Enterprise Edition
The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com
For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time (47 seconds) compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.
The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.
Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage, we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?
A few words about versions
You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The
old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
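The delta idea can be pictured in a few lines of code. The sketch below is an illustration only (plain Python dictionaries, not InterBase's on-disk format): a delta keeps just the fields that differ between two versions, and an older version is rebuilt by applying deltas on top of the newest one.

```python
# Illustration only: record versions as dicts, a delta as the differing fields.

def make_delta(old, new):
    """Keep only the fields of `old` that differ from `new`."""
    return {k: v for k, v in old.items() if new.get(k) != v}

def rebuild(newest, delta_chain):
    """Recover an older version from the newest one plus a chain of deltas
    (newest-first, the direction in which the engine walks backversions)."""
    version = dict(newest)
    for delta in delta_chain:
        version.update(delta)
    return version

v1 = {"ID": 1, "QRT": 10, "NAME": "A"}   # original record version
v2 = {"ID": 2, "QRT": 11, "NAME": "A"}   # after an UPDATE
delta = make_delta(v1, v2)               # only the changed fields are kept
assert delta == {"ID": 1, "QRT": 10}
assert rebuild(v2, [delta]) == v1        # the full old version is recoverable
```

The point of the delta is visible in the first assert: NAME did not change, so it is not stored, which is why delta backversions are small and cheap to write.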
The UNDO log concept
You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo Log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
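As a rough mental model of the above (an illustration, not the engine's actual data structures), each savepoint can be seen as a frame holding the pre-change image of every record it touches; rolling back to a savepoint replays those images frame by frame:

```python
# Sketch of a savepoint stack with one undo log per savepoint.

class Transaction:
    def __init__(self, data):
        self.data = data                           # record_id -> record body
        self.frames = [("<transaction>", {})]      # the implicit global frame

    def savepoint(self, name):
        self.frames.append((name, {}))

    def update(self, rec_id, new_body):
        _, undo = self.frames[-1]
        undo.setdefault(rec_id, self.data[rec_id]) # log the backversion once
        self.data[rec_id] = new_body

    def rollback_to(self, name):
        while self.frames:
            frame_name, undo = self.frames.pop()
            self.data.update(undo)                 # restore logged backversions
            if frame_name == name:
                break

t = Transaction({1: "v1"})
t.savepoint("SP1")
t.update(1, "v2")
t.update(1, "v3")    # backversion already logged for SP1, not logged again
t.rollback_to("SP1")
assert t.data[1] == "v1"
```

Even this toy version hints at the cost: every touched record adds an entry to the current frame's log, and nested frames multiply the bookkeeping.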
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints, no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.
Figure 3: Initial situation before any UPDATE: only one record version exists, and the Undo log is empty
The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and the new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction: the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
Figure 4: The first UPDATE creates a small delta version on disk and puts the record number into the Undo log
This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the most simple situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.
Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE 1 into the Undo log
This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.
Figure 6: The third UPDATE overwrites the UPDATE 1 version in the Undo log with the UPDATE 2 version, and its own version is written to disk as the latest one
Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint
The original design of InterBase implemented the second UPDATE another way, but sometime after InterBase 6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :)
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.
A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.
The preceding figure (Figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE 2 we would have the situation illustrated in Figure 7.
In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I on my own had reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience, and promise to add any necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.
Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.
I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.
And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:
1. Everyone got used to them and felt comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easily comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development - if you develop your applications just like 20-30 years ago.
As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS. They are unusual and slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

Author: Vladimir Kotlyarevsky, vlad@contek.ru
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?

The thing we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage to each other. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex).

Nobody can change their date of birth, at least not their de facto date of birth. But birth dates are not unique anyway.
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
And what is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that they "conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known-types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID           TOID primary key,
  ClassId       TOID,
  Name          varchar(32),
  Description   varchar(128),
  Deleted       smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date   timestamp default CURRENT_TIMESTAMP,
  Owner         TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query: select OID, Name, Description from OBJECTS where ClassId = -100.

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact about all objects enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that no less than a half. Thus you may consider that you have already implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
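The OBJECTS design is easy to try outside InterBase. The sketch below uses Python's built-in sqlite3 module purely for illustration (the TOID domain becomes a plain INTEGER, since SQLite has no domains, and the 'Currency' type with its two elements is invented sample data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE OBJECTS (
  OID         INTEGER PRIMARY KEY,
  ClassId     INTEGER NOT NULL,
  Name        VARCHAR(32),
  Description VARCHAR(128),
  Deleted     SMALLINT DEFAULT 0,
  Owner       INTEGER
);
-- a type-description object (well-known ClassId = -100) ...
INSERT INTO OBJECTS (OID, ClassId, Name, Description)
  VALUES (1, -100, 'Currency', 'Currency dictionary');
-- ... and two elements of that lookup dictionary
INSERT INTO OBJECTS (OID, ClassId, Name, Description)
  VALUES (2, 1, 'USD', 'US Dollar'), (3, 1, 'EUR', 'Euro');
""")

# retrieve the dictionary knowing only the type name
rows = con.execute("""
  SELECT OID, Name, Description FROM OBJECTS
  WHERE ClassId = (SELECT OID FROM OBJECTS
                   WHERE ClassId = -100 AND Name = 'Currency')
""").fetchall()
assert rows == [(2, 'USD', 'US Dollar'), (3, 'EUR', 'Euro')]
```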
Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID       TOID primary key,
  customer  TOID,
  sum_total NUMERIC(15,2)
);
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment, ord.customer, ord.sum_total
from OBJECTS o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.

If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:
create table order_lines (
  id        integer not null primary key,
  object_id TOID,  /* reference to the order object: 1:M relation */
  item_id   TOID,  /* reference to the ordered item */
  amount    numeric(15,4),
  cost      numeric(15,4),
  sum       numeric(15,4) computed by (amount * cost)
);
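Here is a self-contained sketch of Method 1, again using sqlite3 for illustration (SQLite has no COMPUTED BY, so the `sum` column becomes an expression in the query; all OIDs and sample values are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE OBJECTS (OID INTEGER PRIMARY KEY, ClassId INTEGER,
                      Name VARCHAR(32), Description VARCHAR(128));
CREATE TABLE Orders  (OID INTEGER PRIMARY KEY, customer INTEGER,
                      sum_total NUMERIC);
CREATE TABLE order_lines (id INTEGER PRIMARY KEY, object_id INTEGER,
                      item_id INTEGER, amount NUMERIC, cost NUMERIC);

INSERT INTO OBJECTS VALUES (20, 5,  'Smith Ltd', 'a customer object');
INSERT INTO OBJECTS VALUES (30, 10, '2005-17',   'urgent order');
INSERT INTO Orders  VALUES (30, 20, 149.9);
INSERT INTO order_lines VALUES (1, 30, 7, 2, 74.95);
""")

# one query reassembles the whole "Order" object, as in the article
row = con.execute("""
  SELECT o.OID, o.Name AS Number, o.Description AS Comment,
         ord.customer, ord.sum_total
  FROM OBJECTS o JOIN Orders ord ON o.OID = ord.OID
  WHERE ord.OID = 30
""").fetchone()
assert row == (30, '2005-17', 'urgent order', 20, 149.9)

# the COMPUTED BY column is emulated with an expression
total, = con.execute(
    "SELECT SUM(amount * cost) FROM order_lines WHERE object_id = 30").fetchone()
assert abs(total - 149.9) < 1e-9
```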
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID          TOID,  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value        varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);
connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID            TOID,  /* link to a description-object of the type */
  attribute_id   integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);
which describes the type metadata: an attribute set (their names and types) for each object type.
All attributes for a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
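Method 2 can likewise be tried with sqlite3 (an illustrative sketch: the 'Order' type with attributes customer and sum_total follows the article's example, while the concrete IDs and values are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE objects (OID INTEGER PRIMARY KEY, ClassId INTEGER);
CREATE TABLE class_attributes (OID INTEGER, attribute_id INTEGER,
    attribute_name VARCHAR(32), PRIMARY KEY (OID, attribute_id));
CREATE TABLE attributes (OID INTEGER, attribute_id INTEGER,
    value VARCHAR(256), PRIMARY KEY (OID, attribute_id));

INSERT INTO objects VALUES (100, -100);        -- the 'Order' type itself
INSERT INTO class_attributes VALUES (100, 1, 'customer');
INSERT INTO class_attributes VALUES (100, 2, 'sum_total');

INSERT INTO objects VALUES (200, 100);         -- one order object
INSERT INTO attributes VALUES (200, 1, '20');
INSERT INTO attributes VALUES (200, 2, '149.90');
""")

# attributes of object 200, joined with their names from the type metadata
rows = con.execute("""
  SELECT a.attribute_id, ca.attribute_name, a.value
  FROM attributes a, class_attributes ca, objects o
  WHERE a.OID = 200 AND a.OID = o.OID
    AND o.ClassId = ca.OID AND a.attribute_id = ca.attribute_id
  ORDER BY a.attribute_id
""").fetchall()
assert rows == [(1, 'customer', '20'), (2, 'sum_total', '149.90')]
```

Note how even the numeric sum_total comes back as a string: with this method, typing the values is the application's job.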
In the context of this method, you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
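Method 3 reduces to a few lines of code. In this sketch, JSON stands in for dfm or XML as the serialization format, and the table layout and sample values are invented for illustration:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE OBJECTS (OID INTEGER PRIMARY KEY, ClassId INTEGER, data BLOB)")

order = {"customer": 20, "sum_total": 149.9,
         "lines": [{"item_id": 7, "amount": 2, "cost": 74.95}]}
con.execute("INSERT INTO OBJECTS VALUES (?, ?, ?)",
            (30, 10, json.dumps(order).encode("utf-8")))

# loading is trivial and fast ...
blob, = con.execute("SELECT data FROM OBJECTS WHERE OID = 30").fetchone()
restored = json.loads(blob)
assert restored == order

# ... but, as the editor notes, SQL cannot look inside `data`:
# finding all orders of customer 20 means unpacking every BLOB in the table.
```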
It was not difficult to come to the following conclusions (well, I know - everyone already knows them). All three methods described are undoubtedly useful, and each has a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to acquire it in the future, using certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.
To be continued...
Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table, and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent; and any change made to the parent node will be logged to the other sub-nodes, but not to the originating node itself.
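The fan-out rule described above can be sketched in a few lines of Python. This is a minimal illustrative model, not CopyCat's actual schema or API: the function name, field names and node names are all hypothetical.

```python
# Hypothetical model of CopyCat-style log fan-out: one log row per
# sub-node, skipping the node the change came from, so changes never
# bounce back to their originator.

def log_change(table, pk, sub_nodes, originator=None):
    """Return the log rows generated for a single record change."""
    return [
        {"node": node, "table": table, "pk": pk}
        for node in sub_nodes
        if node != originator
    ]

# A change made locally is logged for every sub-node:
rows = log_change("CUSTOMER", 17, ["NODE_A", "NODE_B", "NODE_C"])
print([r["node"] for r in rows])    # ['NODE_A', 'NODE_B', 'NODE_C']

# The same change arriving *from* NODE_B is logged for everyone else:
rows = log_change("CUSTOMER", 17, ["NODE_A", "NODE_B", "NODE_C"],
                  originator="NODE_B")
print([r["node"] for r in rows])    # ['NODE_A', 'NODE_C']
```

In the real system this filtering happens inside the replication triggers, keyed on the user name the replicator logs in with.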
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
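The generator-based approach can be sketched as follows. This is an illustrative model only: `ServerGenerator` stands in for a Firebird generator/sequence on the parent node, and the function and field names are hypothetical, not part of CopyCat's API.

```python
# Sketch of generator-style key synchronization: before a new record
# replicates upward, a server-side counter hands out a globally unique
# key and the local row is re-keyed first.

class ServerGenerator:
    """Stand-in for a Firebird generator/sequence on the parent node."""
    def __init__(self, start=1000):
        self.value = start

    def next(self):
        self.value += 1
        return self.value

def sync_key(local_record, generator):
    new_key = generator.next()      # unique across all nodes
    local_record["id"] = new_key    # applied to the local database first...
    return local_record             # ...then the record replicates

gen = ServerGenerator()
rec = sync_key({"id": -5, "name": "offline row"}, gen)
print(rec["id"])    # 1001
```

Locally assigned provisional keys (here the `-5`) are thus replaced by server-issued values before they ever reach the parent, which is why uniqueness holds across disconnected nodes.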
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) might simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
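The hold-then-resolve behaviour can be sketched like this. Field names (`record`, `chosen_user`) mirror the description above but are illustrative; the functions are hypothetical, not CopyCat's API.

```python
# Sketch of the conflicts-table logic: a record involved in an
# unresolved conflict is held back from replicating in either
# direction until CHOSEN_USER names the winning node.

def can_replicate(record_key, conflicts):
    for c in conflicts:
        if c["record"] == record_key and c["chosen_user"] is None:
            return False    # unresolved conflict: hold the record back
    return True

def resolve(conflicts, record_key, winner):
    for c in conflicts:
        if c["record"] == record_key:
            c["chosen_user"] = winner   # winner's version will replicate

conflicts = [{"record": ("STOCK", 5), "node_a": "HQ",
              "node_b": "SHOP1", "chosen_user": None}]

print(can_replicate(("STOCK", 5), conflicts))   # False
resolve(conflicts, ("STOCK", 5), "SHOP1")
print(can_replicate(("STOCK", 5), conflicts))   # True
```

Once the winner is chosen, the next replication pass picks the record up again and the conflict row is considered closed.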
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product, and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record holding the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than requiring the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication": that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like the one in the example above), one solution is to write a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
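The arithmetic of the STOCK example above shows why replicating operations merges correctly while replicating values does not. A minimal sketch (illustrative variable names, numbers taken from the example):

```python
# Why replicating values loses updates while replicating operations
# (stored-procedure calls) merges them. Numbers match the STOCK example.

stock = 45
node_a_delta = 1    # node A adds 1 item
node_b_delta = 2    # node B adds 2 items

# Value replication: each node replicates its own snapshot of the
# field, so whichever node replicates last overwrites the other.
value_a = stock + node_a_delta      # A's snapshot: 46
value_b = stock + node_b_delta      # B's snapshot: 47
last_writer_wins = value_b          # 47: A's item is silently lost

# Operation replication: both logged calls are replayed on every node.
replayed = stock + node_a_delta + node_b_delta

print(last_writer_wins)   # 47 (wrong)
print(replayed)           # 48 (correct)
```

Because the logged calls commute here (both are additions), every node converges on 48 regardless of the order in which the calls arrive.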
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger: the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy-to-use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master/slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community obtains a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...) as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.
Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Using IBAnalyst

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read about it in opguide.pdf, you will know that the statistical output looks like just a bunch of numbers and nothing else. Okay, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available while browsing the statistics; it also includes hints, comments and recommendation reports.

The summary shown in Figure 1 provides general information about

Figure 1. Summary of database statistics
1. InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2. The Oldest transaction is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
3. Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest active forward. In production systems with regular transactions it can be considered the currently oldest active transaction.
your database. The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: all figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: it is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. [1]
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
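The sweep-gap check described above amounts to simple arithmetic over the gstat counters. A minimal sketch, with simplified threshold semantics and illustrative numbers (not IBAnalyst's exact rules):

```python
# The "sweep gap" is the distance between the oldest (interesting)
# transaction and the oldest snapshot transaction. When it exceeds a
# non-zero sweep interval, the engine starts an automatic sweep.

def sweep_alert(oldest_transaction, oldest_snapshot, sweep_interval):
    gap = oldest_snapshot - oldest_transaction
    if sweep_interval <= 0:
        return False                 # automatic sweeping disabled
    return gap > sweep_interval      # threshold hit: sweep would start

print(sweep_alert(100, 250, 20000))     # False: gap of 150 is harmless
print(sweep_alert(100, 25000, 20000))   # True: gap exceeds the interval
```

The yellow/red markings in the report correspond to this same comparison, plus the special case of a negative gap seen in some server versions.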
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] is the oldest currently active transaction.

• The Next transaction is the transaction number that will be assigned to a new transaction.

• Active transactions: IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day: this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point when the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
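The last metric in the list above is a straightforward division. A sketch with illustrative numbers (the function name and figures are hypothetical, not from the report):

```python
# Transactions per day, as derived from statistics: the Next
# transaction counter divided by the database's age in days.

def transactions_per_day(next_transaction, days_since_creation):
    if days_since_creation <= 0:
        raise ValueError("database age must be positive")
    return next_transaction / days_since_creation

print(round(transactions_per_day(1_500_000, 300)))   # 5000
```

As the article notes, the figure is only meaningful when transaction numbering has not been reset mid-life, i.e. for production databases or ones periodically restored from backup.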
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications with only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2. Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently the cause is bulk deletes. Together, these indications tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or is the result of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination, you can see that it has 34,999 keys, TotalDup is 34,995 and MaxDup is 25,056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25,056; i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
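The relationship between the Uniques, TotalDup and MaxDup columns can be reproduced from raw key values. A simplified model of what the statistics report (the function is illustrative, not IBAnalyst code):

```python
# How Uniques / TotalDup / MaxDup relate to the raw index keys:
# every key beyond the first occurrence of its value is a duplicate,
# and MaxDup is the length of the longest duplicate chain.

from collections import Counter

def index_stats(keys):
    counts = Counter(keys)
    uniques = len(counts)
    total_dup = len(keys) - uniques         # duplicates overall
    max_dup = max(counts.values()) - 1      # longest duplicate chain
    return uniques, total_dup, max_dup

# Four distinct values spread over ten keys:
keys = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
print(index_stats(keys))    # (4, 6, 5)
```

Applied to ADDR_ADDRESS_IDX6, the same identity holds: 34,999 keys with 4 unique values gives TotalDup = 34,999 - 4 = 34,995, matching the report.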
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports | View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big: 94 kilobytes or 23 pages. A read_committed transaction uses the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run a sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote from the table/indices part of the report: "Versioned tables count: 8. A large number of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944      92
DICT_PRICE        30       1992      45
DOCS               9       2225      64
N_PART         13835      72594      83
REGISTR_NC       241       4085      56
SKL_NC          1640       7736     170
STAT_QUICK     17649      85062     110
UO_LOCK          283       8490     144
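The "ratio greater than 3" filter quoted above can be reproduced directly from the Records and Versions columns of the table. A short sketch using those figures:

```python
# Reproducing the version/record ratio filter over the figures from
# the report above: (records, versions) per table.

stats = {
    "CLIENTS_PR": (3388, 10944),  "DICT_PRICE": (30, 1992),
    "DOCS": (9, 2225),            "N_PART": (13835, 72594),
    "REGISTR_NC": (241, 4085),    "SKL_NC": (1640, 7736),
    "STAT_QUICK": (17649, 85062), "UO_LOCK": (283, 8490),
}

flagged = sorted(
    table for table, (recs, vers) in stats.items() if vers / recs > 3
)
print(flagged)    # all eight tables exceed the threshold
```

Running the check confirms that every table in the report's list does indeed carry more than three record versions per record, CLIENTS_PR being the closest to the threshold at roughly 3.2.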
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments on the "Temporary tables" article
One thing I'd like to ask you to change concerns temp tables; I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."
This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature; where it is not present, a whole lot of unnecessary and complicated workarounds are required. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, whether regarding user/session data isolation, or regarding data whose structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (only available through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well to be fair Firebird developersare working on both topics
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
The InterBase and Firebird Developer Magazine, 2005, Issue 2
www.ibdeveloper.com, © 2005. All rights reserved.
Server internals
Inside BLOBs
The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to the pages containing the actual BLOB data are stored in a quasi-record. Thus a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used: a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type: the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.
BLOB Page
The BLOB page consists of the following parts.

The special header contains the following information:

• The number of the first blob page in this blob. It is used to check that pages belong to one blob.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on the page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.
Maximum BLOB size
As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. The internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.
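The theoretical limit mentioned above can be sketched with a quick back-of-envelope calculation. The figures assumed below (16 bytes of per-page overhead, 4-byte page numbers) are illustrative guesses, not the exact InterBase on-disk header sizes:

```python
# Rough estimate of the maximum BLOB size reachable through a three-level
# structure: quasi-record -> BLOB pointer pages -> data pages.
# ASSUMPTIONS: 16 bytes of per-page overhead and 4-byte page numbers are
# illustrative values, not the real InterBase on-disk figures.

def max_blob_bytes(page_size, page_overhead=16, ptr_size=4):
    usable = page_size - page_overhead          # payload bytes per page
    ptrs_per_page = usable // ptr_size          # entries a pointer page can hold
    data_pages = ptrs_per_page * ptrs_per_page  # pointer pages * data pages each
    return data_pages * usable                  # total addressable BLOB bytes

if __name__ == "__main__":
    for ps in (1024, 2048, 4096, 8192):
        print(ps, max_blob_bytes(ps))
```

With these assumptions, a 4096-byte page yields roughly 4.2 billion addressable bytes, the same order of magnitude as the 4 GB ULONG ceiling, which is why the ULONG limit (and the 2 GB UDF limit) are the ones that matter in practice.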
The segment size mystery
Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
TestBed
Testing the NO SAVEPOINT
In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.
Testing NO SAVEPOINT
In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.
Database for testing
The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators.
Testing the NO SAVEPOINT feature in InterBase 7.5.1
Authors: Alexey Kovyazin, ak@ib-aid.com, and Vlad Horsun, hvlad@users.sourceforge.net

For the test we will use only one table, with the following structure:
CREATE TABLE TEST (
    ID NUMERIC(18,2),
    NAME VARCHAR(120),
    DESCRIPTION VARCHAR(250),
    CNT INTEGER,
    QRT DOUBLE PRECISION,
    TS_CHANGE TIMESTAMP,
    TS_CREATE TIMESTAMP,
    NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.
Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.
After that, perform a backup of this database, and you will be in the same position as if you had downloaded the ready-for-use backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, 512 MB RAM and an 80 GB Samsung HDD.
Preparing to test
A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts
The first script tests a regular one-pass update without the NO SAVEPOINT option. For convenience, the important commands are clarified with comments:

connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;  /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test;  /* Walk down all records in TEST to place them into cache */
commit;
set time on;  /* enable time statistics for performed statements */
set stat on;  /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option.
connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT;  /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs we added several UPDATE statements; we recommend using five. The script for testing without NO SAVEPOINT would be:
connect "C:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer 2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com
IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics, using metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test

The easiest way to perform the test is to use isql's INPUT command. Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;
Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3
News & Events

Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain: visual report designer with rulers, guides and zooming; wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot matrix report support; support for most popular DB engines; full WYSIWYG; text rotation 0..360 degrees; memo object with support for simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.

Trial / Buy Now
TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table. Full details can be found here: http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here: http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled
CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger see: www.microtec.fr/copycat/ct

Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not
Sequential bulk UPDATE statements
With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs
News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment; registered custom components based on the Borland Package Library mechanism; and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com
For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time, 47 seconds, compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth of the writes parameter.
Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions
You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.
In fact, the first time a backversion is written to disk, it is written as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
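The delta idea can be illustrated with a small sketch: keep only the fields that differ between the new version and the old one, and rebuild the old version on demand. The dict-based "records" below are purely illustrative; the real engine diffs raw record images:

```python
# Sketch of the delta-backversion concept: a delta holds only what changed,
# and old_version == new_version + delta. Dict "records" stand in for the
# engine's on-disk record images.

def make_delta(old, new):
    """Fields of `old` that differ from `new` (the data needed to undo)."""
    return {k: v for k, v in old.items() if new.get(k) != v}

def rebuild_old(new, delta):
    """Reconstruct the old version from the new version plus the delta."""
    rebuilt = dict(new)
    rebuilt.update(delta)
    return rebuilt

record_v1 = {"ID": 1, "QRT": 10.0, "NAME": "a", "CNT": 5}
record_v2 = {"ID": 2, "QRT": 11.0, "NAME": "a1", "CNT": 5}

delta = make_delta(record_v1, record_v2)   # CNT is unchanged, so it is omitted
assert "CNT" not in delta
assert rebuild_old(record_v2, delta) == record_v1
```

The point of the design choice is visible even in this toy: the delta is smaller than a full record copy whenever some fields are untouched.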
The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints, no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).
Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists of any record.

Figure 3: Initial situation before any UPDATE. Only one record version exists; the Undo log is empty.
The first UPDATE

The first UPDATE statement creates delta backversions on disk (see figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE or DELETE statement inside the transaction: the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the versions of each and restore it from the updated version and the backversion stored on disk.
Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
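The bookkeeping just described can be sketched as follows. The class below is a toy model, not InterBase internals: the first change to a record registers its number (standing in for the bitmap) and keeps the version needed for undo; rollback walks the registered numbers and restores them:

```python
# Toy model of a transaction-level savepoint: a set of affected record
# numbers plus the versions needed to undo them. Illustrative only.

class SavepointSketch:
    def __init__(self, table):
        self.table = table       # record number -> current version
        self.affected = set()    # stands in for the bitmap of record numbers
        self.undo = {}           # record number -> version to restore

    def update(self, rec_no, new_version):
        if rec_no not in self.affected:   # first touch: remember undo data
            self.affected.add(rec_no)
            self.undo[rec_no] = self.table[rec_no]
        self.table[rec_no] = new_version

    def rollback(self):
        for rec_no in self.affected:
            self.table[rec_no] = self.undo[rec_no]
        self.affected.clear()
        self.undo.clear()

table = {1: "v1", 2: "v1"}
sp = SavepointSketch(table)
sp.update(1, "v2")
sp.update(1, "v3")   # second update of the same record: undo data already kept
sp.rollback()
assert table == {1: "v1", 2: "v1"}
```

Note how cheap the first update is in this model: per affected record, only its number and one saved version are kept, which mirrors why the first bulk UPDATE shows no real difference with or without NO SAVEPOINT.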
The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.
Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.
Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. (But this theme is for another article.)
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The preceding figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I have missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any inconvenience, and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development, if you develop your applications just like 20-30 years ago.
Author: Vladimir Kotlyarevsky, vlad@contek.ru

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there have been several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

What we need is simple: to develop a set of standard methods that will help us to tailor the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relation processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers at run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex).

Nobody can change their date of birth, at least not their de facto date of birth. But birth dates are not unique anyway.
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for a surrogate key, there is no need ever to change it, at least not in terms of dependency on the changing world.
What is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys are in some sense much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
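In InterBase/Firebird, the natural implementation of such a value-generating function is a generator. A minimal sketch (the generator name g_oid is mine, not part of the article's schema):

```sql
/* A single, database-wide source of surrogate OIDs. */
create generator g_oid;

/* Fetch the next unique OID value, e.g. just before inserting
   a new object of any class: */
select gen_id(g_oid, 1) from rdb$database;
```

Because every class draws from the same generator, the resulting keys are unique across the whole database, which is exactly the property argued for above.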
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute which refers to the known types list – basically a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to the detailed type metadata necessary for the application domain.

[The InterBase and Firebird Developer Magazine, 2005, Issue 2 – ©2005 www.ibdeveloper.com. All rights reserved.]
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object: the physical deletion of records in a database full of direct and indirect links is often, to put it mildly, a very complicated task.
Thus, each object has to have the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table itself; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) The list of all persistent object types known to the system is then retrieved by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily, using a standard method, what that object is.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types – simple lookup dictionaries, for example – which do not have any attributes other than those already present in OBJECTS. Usually that is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries there are in your accounting system? Likely not less than half. Thus you may consider that you have already implemented half of these dictionaries – that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId
if you know the ClassId for this dictionary. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId =
  (select OID from OBJECTS
   where ClassId = -100 and Name = :LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total numeric(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all the attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and familiar. You could also create a view "orders_view" and make everything look as it always did.
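The article leaves "orders_view" to the reader; assuming the tables above, a minimal sketch of it could reuse the same join:

```sql
/* Presents each Order as one flat row, as in a classic schema. */
create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from OBJECTS o
join Orders ord on o.OID = ord.OID;
```

Applications can then select from orders_view exactly as they would from a conventional Orders table.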
If an order has a "lines" section – and a real-world order should definitely have such a section – we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table with a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object - 1:M relation */
  item_id TOID,   /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount * cost)
);
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: implementation of the object-relational mapping system for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]; those articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);
connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata – the attribute set (names and types) for each object type.
All the attributes of a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all the attributes in a single record by joining, or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease of implementing object inheritance; and a very simple database structure – a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.
The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database – in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of the single field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field – and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live; moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to acquire it in the future, using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except for work with the attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.
To be continued...
FastReport 3 – the new generation of reporting tools

Visual report designer with rulers, guides and zooming; wizards for basic report types; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix reports support; supports the most popular DB engines. Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com
Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr
PART 1. BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing with a punctual synchronization upon reconnection to the main database; it can be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are addressed in Microtec CopyCat, a set of Delphi/C++Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to the other sub-nodes, but not to the originating node itself.
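The article does not show CopyCat's generated triggers, but the filtering idea can be sketched roughly as follows. The table and column names (RPL$LOG, node_name, a replicated table "CUSTOMERS") are illustrative guesses, not CopyCat's actual schema; only RPL$USERS is named later in the article:

```sql
set term ^ ;
create trigger rpl_customers_ai for CUSTOMERS
after insert as
begin
  /* Log one line per sub-node, skipping the connection's own user
     name, so a change is never sent back to the node it came from. */
  insert into RPL$LOG (table_name, pk_value, node_name)
  select 'CUSTOMERS', new.ID, u.node_name
  from RPL$USERS u
  where u.node_name <> current_user;
end ^
set term ; ^
```

Because the replicator connects as the parent's node name, the `current_user` comparison excludes exactly one sub-node: the one the change originated from.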
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is executed on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
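For instance, a generator-based synchronization method for an order table's primary key could be a clause such as the following (the generator name is invented for illustration):

```sql
/* Executed on the server, so the value is unique across all nodes. */
select gen_id(g_orders_oid, 1) from rdb$database
```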
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply puts in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
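Resolving a conflict might then look like the following; the table name RPL$CONFLICTS and its key column are illustrative assumptions, not CopyCat's documented schema:

```sql
/* Declare the node "MAIN" the winner of one logged conflict;
   the next replication pass re-sends the chosen version. */
update RPL$CONFLICTS
set CHOSEN_USER = 'MAIN'
where CONFLICT_ID = 42;
```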
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible for two nodes (having the same parent) to attempt simultaneously to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, the current stock value of a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication": a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like the one in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
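A procedure of this kind is not given in the article; a minimal sketch (table and procedure names invented) whose calls, rather than the resulting STOCK rows, would be what gets replicated:

```sql
set term ^ ;
create procedure SP_ADD_STOCK (PRODUCT_ID integer, DELTA numeric(15,4))
as
begin
  /* Each node records only its own delta. Replaying both A's +1
     and B's +2 on every node converges STOCK to the same value. */
  update STOCK
  set AMOUNT = AMOUNT + :DELTA
  where PRODUCT_ID = :PRODUCT_ID;
end ^
set term ; ^
```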
PART 2. GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi/C++Builder components.
2. As a standalone Win32 replication tool.

We will now present each of these products.
1. The CopyCat Delphi/C++Builder component set
CopyCat can be obtained as a set of Delphi/C++Builder components, enabling you to use replication features painlessly in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide to getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy-to-use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple and intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:
1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.
CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux/Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Thankfully prior to working withInterBase I was interested in differ-ent data structures how they arestored and what algorithms theyuse This helped me to interpret theoutput of gstat At that time I decid-ed to write a tool that could analyzegstat output to help in tuning thedatabase or at least to identify thecause of performance problems
Long story but the outcome wasthat IBAnalyst was created In spiteof my experience it still allows me tofind very interesting things or per-formance issues in different data-bases
Real systems have runtime perform-ance that fluctuates like a wave Theamplitude of such lsquowavesrsquo can below or high so you can see howperformance differs from day today (or hour by hour) Actual per-formance depends on many factorsincluding the design of the applica-tion server configuration transac-tion concurrency version garbagein the database and so on To findout what is happening in a data-base (both positive and negativeaspects of performance) you shouldat the very least take a peek at the
Development area
31
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
Understanding YourDatabase with IBAnalystI have been working with InterBasesince 1994 Back then most data-bases were small and did notrequire any tuning Of course therewere occasions when I had tochange ibconfig on a server andreconfigured hardware or the OSbut that was almost all I could do totune performance
Four years ago our companybegan to provide technical supportand training of InterBase usersWorking with many productiondatabases also taught me a lot ofdifferent things However most ofwhat I learned concerned applica-tions ndash transaction parametersusage optimizing queries andresult sets
Of course I had known for quitesome time about gstat ndash the tool thatgives database statistics informa-tion If you have ever looked ingstat output or read opguidepdfabout it you would know that thestatistical output looks like just abunch of numbers and nothing elseOk you can discover fragmentationinformation for a particular table orindex but what other useful infor-mation can be obtained
database statistics from time to time
Lets take a look at the capabilitiesof IBAnalyst IBAnalyst can takestatistics from gstat or the ServicesAPI and compile them into a reportgiving you complete informationabout the database its tables and
indices It has in-place warningswhich are available during thebrowsing of statistics it alsoincludes hint comments and recom-mendation reports
The summary shown in Figure 1provides general information about
Figure 1 Summary of database statistics
Author Dmitri Kouzmenkokdvib-aidcom
1- InterBase 75 and Firebird 15 have special features that can periodi-cally flush unsaved pages if Forced Writes is Off2 - Oldest transaction is the same Oldest interesting transaction mentionedeverywhere Gstat output does not show this transaction as interesting3 - Really it is the oldest transaction that was active when the oldest trans-action currently active started This is because only new transaction startmoves Oldest active forward In the production systems with regulartransactions it can be considered as currently oldest active transaction
Development area
32
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
The warnings and comments shown are based on knowledge carefully gathered from a large number of real-world production databases.
Note: all figures in this article contain gstat statistics taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, no warning is displayed: this page size is okay.
Next we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when it is ON, changed data is written to disk immediately; OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in the case of a power, OS or server failure.
Tip: it is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.[1]
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest[2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0, disabling automatic sweeping entirely. Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers belong to committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction belong to transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.
• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.
• The Oldest active[3] is the oldest currently active transaction.
• The Next transaction is the transaction number that will be assigned to the next new transaction.
• Active transactions: IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day: this is calculated as Next transaction divided by the number of days that have passed from the creation of the database to the point when the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
As you have already learned, if there are any warnings they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
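The warning heuristics described in the bullets above can be sketched in a few lines. This is an illustrative model only, not IBAnalyst's actual code: the function name, the interpretation of the 30% threshold, and the sample numbers are all assumptions made for demonstration.

```python
# Illustrative sketch of IBAnalyst-style transaction warnings derived
# from raw gstat counters. Thresholds and names are assumptions.

def transaction_warnings(oldest, oldest_active, next_tx, days_since_creation):
    warnings = []
    tx_per_day = next_tx / days_since_creation  # meaningful for production DBs
    # A big gap between the oldest (interesting) and next transaction
    # means garbage is accumulating and the OIT is not advancing.
    if next_tx - oldest > tx_per_day:
        warnings.append("oldest transaction lags by more than a day of work")
    # A long-running active transaction blocks garbage collection.
    if next_tx - oldest_active > 0.3 * tx_per_day:
        warnings.append("oldest active transaction looks stuck")
    return tx_per_day, warnings

rate, warns = transaction_warnings(
    oldest=100_000, oldest_active=118_000,
    next_tx=130_000, days_since_creation=100)
print(rate)   # 1300.0 transactions per day
print(warns)  # both warnings fire for these sample numbers
```

Feeding real gstat numbers into such a check at regular intervals is one way to catch a stuck transaction before users notice the slowdown.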
It should be noted that database statistics are not always useful. Statistics that are gathered during certain work and housekeeping operations can be meaningless.
Do not gather statistics if you:

• just restored your database;
• performed a backup (gbak -b db.gdb) without the -g switch;
• recently performed a manual sweep (gfix -sweep).
Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications are placing less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they always work with transactions and data correctly, never creating sweep gaps, accumulating a lot of active transactions, keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications with only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, which tables are fragmented (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, which points to bulk deletes. Together, both indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or the result of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination, you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the longest duplicate chain (keys pointing to records with the same column value) is 25056; i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query: searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.
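The relationship between the duplicate figures quoted above can be checked with a small calculation. This is an illustrative sketch using the ADDR_ADDRESS_IDX6 numbers from the text; the function and variable names are ours, not IBAnalyst's.

```python
# How gstat's duplicate counters for an index relate to each other:
# uniques = total keys - total duplicates, and MaxDup shows how much
# of the index is taken up by the single worst duplicate chain.

def index_duplicate_stats(total_keys, total_dup, max_dup):
    uniques = total_keys - total_dup          # distinct key values
    worst_chain_share = max_dup / total_keys  # share of keys in longest chain
    return uniques, worst_chain_share

uniques, share = index_duplicate_stats(total_keys=34999,
                                       total_dup=34995,
                                       max_dup=25056)
print(uniques)          # 4 unique key values
print(round(share, 2))  # about 0.72: nearly 3/4 of all keys share one value
```

With only four distinct values and one chain covering some 72% of the keys, it is easy to see why the engine gains little from this index and why garbage collection through it is expensive.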
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big: 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
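A rough sanity check of why 23 pages of TIP is considered "big" can be made from the classic InterBase on-disk layout, in which each transaction's state occupies two bits in the transaction inventory. The calculation below ignores page headers and other per-page overhead, so treat it as an approximation.

```python
# Approximate number of transactions tracked by a 23-page TIP,
# assuming 2 bits per transaction state and ignoring page headers.

PAGE_SIZE = 4096     # bytes, as in the sample database
STATES_PER_BYTE = 4  # 2 bits per transaction state

tip_pages = 23
tracked_transactions = tip_pages * PAGE_SIZE * STATES_PER_BYTE
print(tracked_transactions)  # 376832 transaction slots
```

Every snapshot transaction copies this whole structure into memory, which is why the report suggests a manual sweep to shrink it.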
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. A large number of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection is not working or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with a version/record ratio greater than 3:"

Table        Records  Versions  RecVers size
CLIENTS_PR      3388     10944            92
DICT_PRICE        30      1992            45
DOCS               9      2225            64
N_PART         13835     72594            83
REGISTR_NC       241      4085            56
SKL_NC          1640      7736           170
STAT_QUICK     17649     85062           110
UO_LOCK          283      8490           144
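The report's filter can be recomputed from the quoted table. This is a sketch only: IBAnalyst derives these figures from gstat data, not from a hard-coded dictionary as here.

```python
# Recomputing the "version/record ratio greater than 3" filter
# from the table statistics quoted above.

stats = {  # table name: (records, versions)
    "CLIENTS_PR": (3388, 10944),
    "DICT_PRICE": (30, 1992),
    "DOCS": (9, 2225),
    "N_PART": (13835, 72594),
    "REGISTR_NC": (241, 4085),
    "SKL_NC": (1640, 7736),
    "STAT_QUICK": (17649, 85062),
    "UO_LOCK": (283, 8490),
}

versioned = [t for t, (recs, vers) in stats.items() if vers / recs > 3]
print(len(versioned))  # all 8 listed tables exceed the 3:1 threshold
```

Even the mildest case here (CLIENTS_PR, at roughly 3.2 versions per record) passes the threshold, which matches the report's "versioned tables count: 8".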
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to "Temporary tables"
One thing I'd like to ask you to change regards temp tables; I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."
This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on an RDBMS are heavily dependent on this feature; where it is not present, a whole lot of unnecessary and complicated workarounds are required. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.
Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly, those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it: no user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, because they slow down the whole system and introduce dependencies; of course they do). To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago, and you can already see that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, whether regarding user/session data isolation or regarding data whose structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (available only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
Miscellaneous
Invitations, Subscriptions, Donations, Sponsorships
This is the first official book on Firebird, the free, independent, open source relational database server that emerged in 2000. Based on the actual Firebird Project, this book provides all you need to know about Firebird database development: installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now!

To receive future issue notifications, send email to subscribe@ibdeveloper.com
TestBed
Testing the NO SAVEPOINT
In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it a little, to discover what it is all about.
Testing NO SAVEPOINT
In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.
Database for testing

The test database file was created in InterBase 7.5.1, with page size = 4096 and character encoding NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one
Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Alexey Kovyazin, ak@ib-aid.com, and Vlad Horsun, hvlad@users.sourceforge.net
table with the following structure:

CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:
SELECT * FROM INSERTRECS(1000);
The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connecting. We are just using it as a cheap way to increase database size: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.
Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well described in the InterBase documentation.
You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 MB) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 KB).
If you download the database backup, the test tables are already populated with records, and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.
After that, perform a backup of this database, and you will be in the same position as if you had downloaded the "ready for use" backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium 4 2 GHz, 512 MB RAM and an 80 GB Samsung HDD.
Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts

The first script tests a regular one-pass update without the NO SAVEPOINT option. For convenience, the important commands are clarified with comments:
connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey"; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on;  /* enable time statistics for performed statements */
set stat on;  /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:
connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs we added several UPDATE statements; we recommend using five. The script for testing without NO SAVEPOINT would be:
connect "E:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
News & Events

Fyracle 0.8.9
Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer 2000 applications, and demo CDs of applications without license trouble. Read more at www.janus-software.com

IBAnalyst 1.9
IBSurgeon has issued a new version of IBAnalyst. It can now better analyze InterBase or Firebird database statistics by using metadata information (in this case a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance. www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test

The easiest way to perform the test is to use isql's INPUT command. Suppose you have the scripts located in C:\TEST\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input C:\TEST\scripts\script01.sql;
Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed. This is an excerpt from the one-pass script with default transaction settings:
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3
News & Events

FastReport 3.19
The new version of the famous FastReport is now out. FastReport 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently, providing all the tools you need to develop reports. All variants of FastReport 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf output; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0-360 degrees; a memo object supporting simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors. Trial / Buy Now

TECT Software presents
New versions of nice utilities from TECT Software, www.tectsoft.net. SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found at http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx. And FBMail, Firebird EMail, is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but it can easily be used on any computer which is connected to the internet. Full details can be found at http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled:
News & Events

CopyTiger 1.00
CopyTiger, a new product from Microtec, is an advanced database replication and synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger, see www.microtec.fr/copycat/ct, where an evaluation version can be downloaded.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large:
Table 1: Test results for 5 sequential UPDATEs
News & Events

SQLHammer 1.5 Enterprise Edition
The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment; registered custom components based on the Borland Package Library mechanism; and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com
For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time (47 seconds), compared with only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.
The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.
Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are shown in the figures.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT performs? Is there any cost attached to these gains?
A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.
In fact, the first time a backversion is written to disk it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and the chain of delta versions. Only if the same record is updated more than once within the same transaction is the full backversion written to disk.
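The delta idea can be illustrated with a toy model. This is conceptual only: the engine stores deltas as binary differences on data pages, not as field dictionaries as below.

```python
# Conceptual sketch of delta backversions: store only changed fields,
# and rebuild the full old version by applying deltas on top of the
# newest version.

def make_delta(old, new):
    """Keep only the fields that differ, with their OLD values."""
    return {k: old[k] for k in old if old[k] != new[k]}

def rebuild_old(newest, deltas):
    """Walk the delta chain (ordered newest-first) back to the old version."""
    record = dict(newest)
    for delta in deltas:
        record.update(delta)
    return record

v1 = {"ID": 1, "QRT": 1.0, "NAME": "n"}
v2 = {"ID": 2, "QRT": 2.0, "NAME": "n"}  # record after UPDATE
delta = make_delta(v1, v2)               # only ID and QRT: small
print(rebuild_old(v2, [delta]) == v1)    # True
```

The unchanged NAME field never enters the delta, which is exactly why deltas stay small for updates that touch few columns.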
The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.
A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.
So the undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints, no problem. :-)
Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors, and for the new TPB option, can be found in the 7.5.1 release notes.
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.
First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).
Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.
Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation
Consider the implementation details of the undo log. Figure 3 shows the initial situation.
Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.
Figure 3
Initial situation before any UPDATE – only one record version exists. The Undo log is empty.
The first UPDATE
The first UPDATE statement creates delta backversions on disk (see figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.
It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
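The first UPDATE therefore only needs a bitmap of affected record numbers; rollback walks that bitmap and restores each record from its on-disk backversion. A minimal sketch (a Python set stands in for the bitmap, and the dictionaries are an assumed, simplified picture of the on-disk versions):

```python
# Simplified on-disk state: each record has its current version and a backversion.
disk = {10: {"ver": "new10", "back": "old10"},
        11: {"ver": "new11", "back": "old11"},
        12: {"ver": "old12", "back": None}}   # record 12 was not touched

undo_bitmap = {10, 11}   # record numbers touched by the first UPDATE

def rollback(disk, bitmap):
    """Walk the affected record numbers and restore each from its backversion."""
    for rec_no in bitmap:
        disk[rec_no]["ver"] = disk[rec_no]["back"]
        disk[rec_no]["back"] = None

rollback(disk, undo_bitmap)
assert disk[10]["ver"] == "old10"
assert disk[12]["ver"] == "old12"   # untouched record is left alone
```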
TestBed
The InterBase and Firebird Developer Magazine, 2005, Issue 2
www.ibdeveloper.com    © 2005 www.ibdeveloper.com. All rights reserved.
Testing the NO SAVEPOINT
Figure 4
The first UPDATE creates a small delta version on disk and puts the record number into the UNDO log.
This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE
When the second UPDATE statement is performed on the same set of records, we have a different situation.
Here is a good place to note that the example we are considering is the simplest situation, in which only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN … END) savepoint is used.
To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (created by transaction 1) and the new version (created by transaction 2, UPDATE 2), store it somewhere on disk, fetch the current full version (transaction 2, UPDATE 1), put it into the in-memory undo log, replace it with the new full version (with back-pointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.
The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.
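The bookkeeping of the second UPDATE can be traced step by step in a small Python model. All names here are illustrative assumptions standing in for engine structures, but the sequence follows the description above: compute the new delta against the committed version, move the superseded full version into the in-memory undo log, and replace both the on-disk delta and the on-disk full version.

```python
# State after UPDATE 1 (illustrative, not real engine structures):
committed = {"qty": 10}                 # version visible to other transactions
disk_version = {"qty": 12}              # full version written by UPDATE 1
disk_delta = {"qty": 10}                # delta: rebuilds `committed` from disk_version
undo_log = []                           # in-memory undo log of our transaction

def second_update(new_version):
    global disk_version, disk_delta
    # 1. New delta between the committed version and the newest version.
    new_delta = {k: v for k, v in committed.items() if new_version.get(k) != v}
    # 2. The superseded full version moves into the in-memory undo log.
    undo_log.append(disk_version)
    # 3. The old delta is erased and replaced; the newest version goes to disk.
    disk_delta = new_delta
    disk_version = new_version

second_update({"qty": 15})
assert undo_log == [{"qty": 12}]        # UPDATE 1's version now lives in memory
assert disk_delta == {"qty": 10}        # delta still rebuilds the committed version
```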
Figure 5
The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE 1 into the Undo log.
This is all hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.
When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE
The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.
Figure 6
The third UPDATE overwrites the UPDATE 1 version in the Undo log with the UPDATE 2 version, and its own version is written to disk as the latest one.
Figure 7
If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.
(The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behaviour, and we see what we see now. But that is a theme for another article.)
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE, no new record version is created. From here on, each update just replaces the record data on disk with the newest version and shifts the superseded version into the Undo log.
A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes – to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know one.
The preceding figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN … END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.
For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE 2 we would have the situation illustrated in figure 7.
In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any inconvenience and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out the less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.
Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.
I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true of OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.
And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement
What is the problem?
Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:
1. Everyone got used to them and feels comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proven data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development – if you develop your applications just like 20-30 years ago.
As you see, almost all these characteristics sound good – except, probably, the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know, all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS. They are unusual and
Author: Vladimir Kotlyarevsky, vlad@contek.ru
slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…
As a result, everyone continues using RDBMS, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?
The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers at run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2] – UINs (Unique Identification Numbers), or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number, and even sex. Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.)
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hmm. But let's not talk about such gloomy things. :-) Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
And what is more, nobody requires run-time pointers to contain any additional information about an object beyond a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys, in some sense, are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
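In InterBase/Firebird the unique-value function would typically be a generator; for the distributed case, one common composition (an assumption on my part, not something the article prescribes) is to pack a node identifier into the high bits of an int64 and keep a per-node counter in the low bits. A Python sketch of that idea:

```python
import itertools

# Assumption: 16 high bits identify the node, 48 low bits hold the local counter.
def oid_factory(node_id: int):
    """Yield database-wide (and Universe-wide) unique int64 OIDs by packing
    a node id into the high bits -- an emulation of a per-node generator."""
    counter = itertools.count(1)
    return lambda: (node_id << 48) | next(counter)

next_oid = oid_factory(node_id=2)
a, b = next_oid(), next_oid()
assert a != b                 # unique within this node
assert (a >> 48) == 2         # and attributable to the node that issued it
```

Two nodes with different node_ids can then allocate OIDs off-line without ever colliding, which is exactly the property replication needs.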
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the list of known types, which basically is a table (let us call it "CLASSES"). This table may include any kind of information – from just a simple type name to detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, or a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:
create domain TOID as integer not null;

create table OBJECTS (
    OID TOID primary key,
    ClassId TOID,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date timestamp default CURRENT_TIMESTAMP,
    Owner TOID)

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types – simple lookup dictionaries, for example – which do not have any attributes other than those which are already in OBJECTS. Usually these are a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that no less than half of them are. Thus you may consider that you have implemented half of these dictionaries – that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total NUMERIC(15,2))
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:
select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
If an order has a "lines" section – and a real-world order should definitely have such a section – we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
    id integer not null primary key,
    object_id TOID, /* reference to the order object - 1:M relation */
    item_id TOID, /* reference to the ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    sum numeric(15,4) computed by (amount*cost))
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: the implementation of the object-relational mapping system for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
    OID TOID, /* link to the master-object of this attribute */
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id))
connected 1:M with OBJECTS by OID. There is also a table
create table class_attributes (
    OID TOID, /* link to the description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id))
which describes the type metadata – an attribute set (names and types) for each object type.
All attributes for a particular object whose OID is known are retrieved by the query:
select attribute_id, value from attributes
where OID = :oid
or, with the names of the attributes:
select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
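Since Method 2 delivers one row per attribute, client code typically pivots the rows back into a single object. A minimal sketch of that step (the row shapes and values are hypothetical):

```python
# Rows as the last query would return them: (attribute_id, attribute_name, value)
rows = [(1, "customer", "305"), (2, "sum_total", "149.90")]

def rows_to_object(oid, class_id, rows):
    """Pivot attribute rows (Method 2 storage) into a single object dict."""
    obj = {"OID": oid, "ClassId": class_id}
    for _attr_id, name, value in rows:
        obj[name] = value
    return obj

order = rows_to_object(1001, 7, rows)
assert order["customer"] == "305"
assert order["sum_total"] == "149.90"
```

The reverse operation – scattering an object's fields back into one row per attribute – is just as mechanical, which is what makes the mapping layer for this method so uniform across types.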
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:
select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease of implementing object inheritance; and a very simple database structure. A significant benefit is that with this approach the number of tables does not increase, no matter how many different types are stored in the database.
The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database – in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of the single field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: object retrieval logic is simple and fast; no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know – everyone already knows them :-). All three methods described are undoubtedly useful, and each has a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each one. If your application involves massive relational data processing, like searching or grouping, or is likely to acquire it in the future, using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except for work with the attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.
To be continued…
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi/C++Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)
When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to the other sub-nodes, but not to the originating node itself.
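The logging rule behind this echo suppression can be modelled simply: the trigger writes one log row per sub-node, skipping the sub-node whose name matches the user making the change. A hedged Python sketch (the node names and table are invented for the example; the real mechanism lives in the database triggers):

```python
def log_change(log, sub_nodes, current_user, table, pk):
    """Model of the replication trigger: one log row per sub-node,
    except the sub-node whose name is the current database user."""
    for node in sub_nodes:
        if node != current_user:
            log.append({"node": node, "table": table, "pk": pk})

log = []
sub_nodes = ["BRANCH_A", "BRANCH_B"]

# A local application user edits a record: all sub-nodes get a log line.
log_change(log, sub_nodes, "APPUSER", "ORDERS", 42)
# BRANCH_A replicates a change of its own: it must not receive an echo.
log_change(log, sub_nodes, "BRANCH_A", "ORDERS", 43)

assert [entry["node"] for entry in log] == ["BRANCH_A", "BRANCH_B", "BRANCH_B"]
```

The same filter, applied with the parent's node name as the login, is what keeps changes pulled down from the parent from bouncing straight back up.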
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
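For instance, a generator-based synchronization method might look like this. The names are invented for illustration; in CopyCat the clause is entered in the configuration tool and executed on the parent server during replication:

```sql
/* Hypothetical synchronization method for an integer primary key:
   a generator on the parent server hands out values that are
   guaranteed unique there. */
CREATE GENERATOR GEN_CUSTOMER_ID;

/* The SQL clause configured as the synchronization method: */
SELECT GEN_ID(GEN_CUSTOMER_ID, 1) FROM RDB$DATABASE;
```

The value returned by the parent replaces the locally assigned key, and only then is the record itself replicated.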
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
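Assuming the conflicts table is exposed under a name such as RPL$CONFLICTS (the CHOSEN_USER field is described above; the table and key names here are illustrative), resolving a conflict could be as simple as:

```sql
/* Declare that node 'NODE_A' holds the correct version of the
   disputed record; on the next replication pass the record is
   replicated from that node and the conflict is cleared. */
UPDATE RPL$CONFLICTS
SET CHOSEN_USER = 'NODE_A'
WHERE CONFLICT_ID = 42;  /* hypothetical key of the conflict entry */
```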
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider for example a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B have the correct value for the field, since neither take into consideration the changes from the other node.
Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
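Continuing the STOCK example, the idea can be sketched as follows (procedure, table and column names are invented for illustration): instead of replicating the new absolute value, each node calls a stored procedure with the relative change, and the calls themselves are logged and replayed on the other nodes.

```sql
/* Hypothetical procedure whose *calls* are replicated, not the
   resulting STOCK values. */
SET TERM ^ ;
CREATE PROCEDURE ADD_TO_STOCK (PRODUCT_ID INTEGER, DELTA INTEGER)
AS
BEGIN
  UPDATE STOCK
  SET CURRENT_VALUE = CURRENT_VALUE + :DELTA
  WHERE PRODUCT_ID = :PRODUCT_ID;
END ^
SET TERM ; ^
```

Replaying node A's call (DELTA = 1) and node B's call (DELTA = 2) on every node brings the stock value to 48 everywhere, which is the correct merge of both changes.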
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:
1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.
1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main Interbase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool
For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy to use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the Interbase / Firebird community is obtaining a two-fold benefit:
1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of Interbase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.
CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Using IBAnalyst

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked in gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Figure 1: Summary of database statistics

Footnotes:
1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
3 - Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4k, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Write parameter is set to OFF, and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off [1].
Next on the report is the mysterious "sweep interval". If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transaction at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.
• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell if there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:
• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)
Statistics you get on such occasions will be practically useless. It is also correct that during normal work there can be times when the database is in a perfect state, for example when applications make less database load than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs) and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2: Table statistics

In this database example there are several activities. First of all, yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or by bulk deletes. See the rows in which the MaxVers column is marked in blue. This shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications can tell that this is really bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing (using triggers) deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you "bad" indices on Foreign Keys every time you view statistics.
Reports
There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports | View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, than garbage collection does not work or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback

Comments to "Temporary tables"
One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.
Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while :-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
Miscellaneous

Invitations, Subscription, Donations, Sponsorships
This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now
To receive future issues notifications, send email to

subscribe@ibdeveloper.com
Magazine CD
Donations
TestBed

Testing the NO SAVEPOINT
loaded "ready for use" backup.
Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, with 512 MB RAM and an 80GB Samsung HDD.
Preparing to test
A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib
SQL test scripts
The first script tests a regular one-pass update without the NO SAVEPOINT option.

For convenience, the important commands are clarified with comments:
connect 'C:\TEST\testsp1.ib' USER 'SYSDBA' Password 'masterkey'; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:
connect 'C:\TEST\testsp2.ib' USER 'SYSDBA' Password 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script both scripts arethe same simply testing the behavior of engine in case of the single bulk UPDATE
To test sequential UPDATEs we added several UPDATE statements--we recommend using five The script for testing without NO SAVEPOINT would be
connect Etestsp3ib USER SYSDBA Password masterkeydrop table TEST2DROP commitselect count() from testcommitset time onset stat oncommitupdate TEST set ID = ID+1 QRT = QRT+1 NAME=NAME||1 ts_change = CURRENT_TIMESTAMPupdate TEST set ID = ID+1 QRT = QRT+1 NAME=NAME||1 ts_change = CURRENT_TIMESTAMPupdate TEST set ID = ID+1 QRT = QRT+1 NAME=NAME||1 ts_change = CURRENT_TIMESTAMPupdate TEST set ID = ID+1 QRT = QRT+1 NAME=NAME||1 ts_change = CURRENT_TIMESTAMPupdate TEST set ID = ID+1 QRT = QRT+1 NAME=NAME||1 ts_change = CURRENT_TIMESTAMPcommitquit
News & Events

Fyracle 0.8.9
Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble. Read more at: www.janus-software.com

IBAnalyst 1.9
IBSurgeon has issued a new version of IBAnalyst. It can now better analyze InterBase or Firebird database statistics by using metadata information (in this case a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance. www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test
The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in C:\TEST\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input C:\TEST\scripts\script01.sql;
Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time= 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time= 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3
News & Events

Fast Report 3.19
The new version of the famous Fast Report is now out. FastReport 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently, and provides all the tools you need to develop reports. All variants of FastReport 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for the most popular DB-engines; full WYSIWYG; text rotation 0..360 degrees; a memo object supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.

TECT Software presents
New versions of nice utilities from TECT Software, www.tectsoft.net.
SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table. Full details can be found here:
http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx
And FBMail, Firebird EMail (FBMail), a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here:
http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled
News & Events

CopyTiger 1.00
CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger, see: www.microtec.fr/copycat/ct
Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time= 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time= 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.
Sequential bulk UPDATE statements
With the multi-pass script (sequential UPDATEs) the raw test results are rather large.
Table 1. Test results for 5 sequential UPDATEs.
News & Events

SQLHammer 1.5 Enterprise Edition
The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com
For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1. Time to perform UPDATEs with and without NO SAVEPOINT.
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and continued growth of the writes parameter.
Figure 2. Memory usage while performing UPDATEs with and without NO SAVEPOINT.
With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?
A few words about versions
You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The
old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is written as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
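To make the delta idea concrete, here is a small illustrative sketch in Python. This is a toy model, not InterBase's actual on-disk format: a delta keeps only the fields that changed, and the full old version can be rebuilt from the newest version plus a chain of deltas.

```python
# Toy model of delta backversions (NOT InterBase's real on-disk format):
# a delta stores only the fields whose values changed, so it is small,
# and the full old version is rebuilt from the new version plus deltas.

def make_delta(old, new):
    """Keep only the fields of `old` that differ in `new`."""
    return {k: v for k, v in old.items() if new.get(k) != v}

def rebuild_old(new, deltas):
    """Apply deltas (newest first) to recover an older full version."""
    version = dict(new)
    for delta in deltas:
        version.update(delta)
    return version

v1 = {"ID": 1, "QRT": 10, "NAME": "a"}   # committed version
v2 = {"ID": 2, "QRT": 10, "NAME": "a1"}  # after the bulk UPDATE
delta = make_delta(v1, v2)               # QRT unchanged, so it is omitted
assert delta == {"ID": 1, "NAME": "a"}
assert rebuild_old(v2, [delta]) == v1
```

The field names here are invented for the illustration; the point is only that the delta is smaller than a full record copy.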
The UNDO log concept
You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation
Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3. Initial situation before any UPDATE: only the one record version exists, and the Undo log is empty.
The first UPDATE
The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE or DELETE statement inside the transaction: the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the versions of each and restore it from the updated version and the backversion stored on disk.
Figure 4. UPDATE1 creates a small delta version on disk and puts the record number into the Undo log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE
When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, in which only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.
Figure 5. The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" values in the test that uses the default transaction parameters.
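The asymmetry between the first and second UPDATE can be sketched in a few lines of Python. This is purely a conceptual model of the behaviour described above, not engine code: the first pass only records affected record numbers in a bitmap-like structure, while the second pass must additionally keep a full intermediate version per record in the in-memory undo log.

```python
# Conceptual model (not engine code) of undo-log growth during
# sequential bulk UPDATEs under the implicit transaction savepoint.

class SavepointUndoLog:
    def __init__(self):
        self.touched = set()     # "bitmap" of affected record numbers
        self.full_versions = {}  # rec_no -> full intermediate version

    def on_update(self, rec_no, current_version):
        if rec_no not in self.touched:
            # First UPDATE of this record: just remember its number;
            # the small delta backversion lives on disk.
            self.touched.add(rec_no)
        elif rec_no not in self.full_versions:
            # Second UPDATE: the superseded full version must be kept
            # in memory - this is where "delta mem" jumps.
            self.full_versions[rec_no] = dict(current_version)
        # Third and later UPDATEs: overwrite on disk; nothing new to keep.

log = SavepointUndoLog()
records = {n: {"ID": n} for n in range(3)}
for _ in range(3):                        # three sequential bulk UPDATEs
    for n, rec in records.items():
        log.on_update(n, rec)
        rec["ID"] += 1

assert len(log.touched) == 3              # pass 1: record numbers only
assert len(log.full_versions) == 3        # pass 2: full versions appear
assert log.full_versions[0] == {"ID": 1}  # pass 3 added nothing new
```

The model reproduces the measured pattern: memory jumps on the second pass and stays flat afterwards.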
When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE
The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.
Figure 6. The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7. If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.
The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you can see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS. They are unusual and
Author: Vladimir Kotlyarevsky, vlad@contek.ru
slow, there are no standards for data access, and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

What we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, and so on. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex).

Nobody can change their date of birth - at least not their de facto date of birth. But birth dates are not unique anyway. :)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things :). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
And what is more, nobody requires run-time pointers to contain any additional information about an object other than a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often, to put it mildly, a very complicated task.

Thus each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
    OID TOID primary key,
    ClassId TOID,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date timestamp default CURRENT_TIMESTAMP,
    Owner TOID)

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact about all objects enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, and these can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus you may consider that you have implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:
select OID, Name, Description from OBJECTS where ClassId = LookupId
if you know the ClassId for this dictionary. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = 'LookupName')
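As an illustration only, the dictionary queries above can be tried against a toy OBJECTS table. This sketch uses Python's sqlite3 as a stand-in for InterBase, and all OID values and dictionary names are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE OBJECTS (
    OID INTEGER PRIMARY KEY,
    ClassId INTEGER NOT NULL,
    Name VARCHAR(32),
    Description VARCHAR(128))""")

# The type description is itself an object, marked by the
# well-known ClassId = -100.
con.execute("INSERT INTO OBJECTS VALUES (1, -100, 'Currency', 'Currency dictionary')")
# Elements of the 'Currency' lookup dictionary also live in OBJECTS.
con.execute("INSERT INTO OBJECTS VALUES (2, 1, 'USD', 'US Dollar')")
con.execute("INSERT INTO OBJECTS VALUES (3, 1, 'EUR', 'Euro')")

# All persistent types known to the system:
types = con.execute(
    "SELECT OID, Name FROM OBJECTS WHERE ClassId = -100").fetchall()

# A plain dictionary fetched by type name (the subquery form from the text):
currencies = con.execute("""
    SELECT OID, Name, Description FROM OBJECTS
    WHERE ClassId = (SELECT OID FROM OBJECTS
                     WHERE ClassId = -100 AND Name = 'Currency')
    ORDER BY OID""").fetchall()

print(types)       # [(1, 'Currency')]
print(currencies)  # [(2, 'USD', 'US Dollar'), (3, 'EUR', 'Euro')]
```

The same two queries serve every simple dictionary in the system; only the type's OID or name changes.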
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total NUMERIC(15,2))
which relates one-to-one to the OBJECTS table via the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
    id integer not null primary key,
    object_id TOID, /* reference to the order object - 1:M relation */
    item_id TOID, /* reference to the ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    sum numeric(15,4) computed by (amount*cost))
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: the implementation of an object-relational mapping system for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
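A minimal sketch of Method 1's one-to-one mapping, again with sqlite3 standing in for InterBase and hypothetical data (the common attributes live in OBJECTS, the type-specific ones in Orders):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE OBJECTS (
    OID INTEGER PRIMARY KEY, ClassId INTEGER,
    Name VARCHAR(32), Description VARCHAR(128));
CREATE TABLE Orders (
    OID INTEGER PRIMARY KEY REFERENCES OBJECTS(OID),
    customer INTEGER, sum_total NUMERIC(15,2));
""")

# Common attributes (number, comment) go to OBJECTS; type-specific
# attributes go to Orders, related 1:1 by OID.
con.execute("INSERT INTO OBJECTS VALUES (10, 5, 'ORD-001', 'urgent delivery')")
con.execute("INSERT INTO Orders VALUES (10, 7, 149.90)")

row = con.execute("""
    SELECT o.OID, o.Name AS Number, o.Description AS Comment,
           ord.customer, ord.sum_total
    FROM OBJECTS o JOIN Orders ord ON o.OID = ord.OID
    WHERE ord.OID = ?""", (10,)).fetchone()

print(row)  # (10, 'ORD-001', 'urgent delivery', 7, 149.9)
```

Wrapping the join in a view (the "orders_view" mentioned above) would hide the split storage from client code entirely.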
Method 2 (See [5]) All object attributes of any type are stored in the form of arecord set in a single table A simple example would look like this
create table attributes (
  OID TOID, /* link to the master object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id))
connected 1:M with OBJECTS by OID. There is also a table
create table class_attributes (
  OID TOID, /* here, a link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id))
which describes the type metadata: an attribute set (names and types) for each object type.
All attributes for a particular object, where the OID is known, are retrieved by the query:
select attribute_id, value
from attributes
where OID = :oid
or, with the names of the attributes:
select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
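To make the EAV layout of Method 2 concrete, here is a small runnable sketch. It uses SQLite from Python purely as a stand-in for InterBase/Firebird (the TOID domain becomes plain INTEGER, and the sample data, such as class id 100 and the customer "ACME Ltd", is invented for illustration); the two queries at the end mirror the article's attribute-list query and the relational emulation discussed later.

```python
# Illustrative sketch of Method 2 (EAV storage); SQLite stands in for
# InterBase/Firebird, and all sample data is made up.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE objects (OID INTEGER PRIMARY KEY, ClassId INTEGER,
                      Name TEXT, Description TEXT);
CREATE TABLE class_attributes (OID INTEGER, attribute_id INTEGER,
                               attribute_name TEXT, attribute_type INTEGER,
                               PRIMARY KEY (OID, attribute_id));
CREATE TABLE attributes (OID INTEGER, attribute_id INTEGER, value TEXT,
                         PRIMARY KEY (OID, attribute_id));
""")
# Type 100 = "Order", with two attributes stored outside OBJECTS
con.execute("INSERT INTO objects VALUES (1, 100, 'Order #42', 'rush order')")
con.executemany("INSERT INTO class_attributes VALUES (100, ?, ?, 0)",
                [(1, "customer"), (2, "sum_total")])
con.executemany("INSERT INTO attributes VALUES (1, ?, ?)",
                [(1, "ACME Ltd"), (2, "1500.00")])

# One record per attribute, joined to the type metadata for names
rows = con.execute("""
    SELECT a.attribute_id, ca.attribute_name, a.value
    FROM attributes a
    JOIN objects o ON a.OID = o.OID
    JOIN class_attributes ca ON o.ClassId = ca.OID
                            AND a.attribute_id = ca.attribute_id
    WHERE a.OID = 1
    ORDER BY a.attribute_id""").fetchall()
print(rows)  # [(1, 'customer', 'ACME Ltd'), (2, 'sum_total', '1500.00')]

# Emulating the relational form: one row per object via LEFT JOINs
row = con.execute("""
    SELECT o.OID, o.Name, a1.value AS customer, a2.value AS sum_total
    FROM objects o
    LEFT JOIN attributes a1 ON a1.OID = o.OID AND a1.attribute_id = 1
    LEFT JOIN attributes a2 ON a2.OID = o.OID AND a2.attribute_id = 2
    WHERE o.OID = 1""").fetchone()
print(row)  # (1, 'Order #42', 'ACME Ltd', '1500.00')
```

Note how the second query needs one extra join per attribute, which is exactly the performance concern raised below.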
Development area
26
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
www.ibdeveloper.com © 2005 www.ibdeveloper.com. All rights reserved.
OOD and RDBMS Part 1
In the context of this method, you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:
select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer, a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease of implementing object inheritance; and a very simple database structure. The last is a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database (in stored procedures, for example). There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
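A minimal sketch of Method 3, again with SQLite standing in for InterBase/Firebird and JSON standing in for the persistence format (the article mentions dfm/VCL streaming or XML; the choice here is just for a self-contained example):

```python
# Method 3 sketch: the whole object is serialized into a single BLOB column.
# SQLite and JSON are illustrative stand-ins; field names are invented.
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE objects (OID INTEGER PRIMARY KEY, data BLOB)")

order = {"number": "Order #42", "customer": "ACME Ltd", "sum_total": 1500.0}
con.execute("INSERT INTO objects VALUES (1, ?)",
            (json.dumps(order).encode("utf-8"),))

# Retrieval is a single read plus deserialization -- fast and simple,
# but the attributes are opaque to SQL, as the editor's note points out.
blob = con.execute("SELECT data FROM objects WHERE OID = 1").fetchone()[0]
restored = json.loads(blob.decode("utf-8"))
print(restored["customer"])
```

The trade-off is visible immediately: nothing inside `data` can be filtered, joined or indexed by the database engine.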
It was not difficult to come to the following conclusions (well, I know: everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these methods, take into account the listed advantages and disadvantages of each one. If your application involves massive relational data processing, like searching or grouping, or is likely to have it added in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object store, and all operations are performed on run-time instances of objects without using native database tools such as SQL (except for work with attributes stored in the OBJECTS table), the third method is probably the best choice, due to its speed and ease of implementation.
To be continued...
FastReport 3 - the new generation of reporting tools

Visual report designer with rulers, guides and zooming; wizards for base report types; export filters to html, tiff, bmp, jpg, xls, pdf; dot-matrix report support; support for the most popular DB engines.

Full WYSIWYG; text rotation 0..360 degrees; the memo object supports simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr
PART 1. BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with punctual synchronization upon reconnection to the main database; it can also be used over a slow connection, as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are addressed in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent; this allows these servers to run Linux or any other OS supported by InterBase/Firebird.)
When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
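The two ideas above (trigger-based logging, one log line per sub-node) can be sketched in a few lines. This is not CopyCat's actual schema: SQLite replaces InterBase/Firebird, and the table names (`rpl_users`, `rpl_log`) and columns are simplified stand-ins invented for the example.

```python
# Hedged sketch of trigger-based change logging with one log row per
# sub-node. Names are invented; SQLite stands in for InterBase/Firebird.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rpl_users (node_name TEXT PRIMARY KEY);  -- this node's sub-nodes
CREATE TABLE rpl_log (node_name TEXT, table_name TEXT, pk_value TEXT);

CREATE TRIGGER customers_log AFTER UPDATE ON customers
BEGIN
    -- one log line per sub-node, so each sub-node fetches only its own lines
    INSERT INTO rpl_log (node_name, table_name, pk_value)
    SELECT node_name, 'customers', NEW.id FROM rpl_users;
END;
""")
con.executemany("INSERT INTO rpl_users VALUES (?)", [("nodeA",), ("nodeB",)])
con.execute("INSERT INTO customers VALUES (1, 'ACME')")
con.execute("UPDATE customers SET name = 'ACME Ltd' WHERE id = 1")

log = con.execute(
    "SELECT node_name, table_name, pk_value FROM rpl_log ORDER BY node_name"
).fetchall()
print(log)  # [('nodeA', 'customers', '1'), ('nodeB', 'customers', '1')]
```

A sub-node would then select (and afterwards delete) only the `rpl_log` rows carrying its own `node_name`.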
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent; and any change made to the parent node will be logged for the other sub-nodes, but not for the originating node itself.
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited to primary key fields), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is executed on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put into this field the name of the node holding the correct version of the record; automatically, upon the next replication, the record will be replicated and the conflict resolved.
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible for two nodes (having the same parent) to simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
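The core conflict rule described above can be reduced to a small set computation: a record changed on both sides since the last replication is neither pushed nor pulled until a person chooses a winner. The function and data below are an invented illustration, not CopyCat's API.

```python
# Minimal sketch of conflict detection between a node and its parent.
# Names and data are hypothetical; real replicators track this in tables.
def detect_conflicts(local_changes, parent_changes):
    """Each argument is a set of (table, primary_key) changed since last sync."""
    conflicts = local_changes & parent_changes  # changed on both sides
    push = local_changes - conflicts            # safe to send to the parent
    pull = parent_changes - conflicts           # safe to apply locally
    return push, pull, conflicts

local = {("STOCK", 7), ("ORDERS", 1)}
parent = {("STOCK", 7), ("ORDERS", 2)}
push, pull, conflicts = detect_conflicts(local, parent)
print(conflicts)  # {('STOCK', 7)} -> replication of this record is suspended
```

Resolution then amounts to moving the chosen side's version of each conflicting key back into the push or pull set, which mirrors filling in "CHOSEN_USER".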
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, for a certain product whose current stock value is 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the change from the other node.
Most replicators would require such an architecture to be altered: instead of having one record holding the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than requiring the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication": that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an
unreplicatable table (like the one in the example above), one solution is to write a stored procedure that can be called to update the table, and to use stored procedure replication to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
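The stock example can be checked with a few lines of arithmetic: replaying the logged calls (deltas) merges correctly regardless of order, while copying either node's stored value loses an update. This is a sketch of the principle only; the function name is invented.

```python
# Why replicating procedure calls (deltas) merges where value copying fails:
# both nodes start from stock = 45, node A adds 1 item, node B adds 2.
def apply_deltas(initial, deltas):
    # Replaying logged "add to stock" calls is commutative, so the
    # order in which the nodes' changes arrive does not matter.
    value = initial
    for d in deltas:
        value += d
    return value

merged = apply_deltas(45, [1, 2])
assert merged == apply_deltas(45, [2, 1]) == 48

# Value-based replication would instead keep one node's snapshot:
last_writer_wins = 47  # node B's stored value, silently losing node A's +1
print(merged, "vs", last_writer_wins)  # 48 vs 47
```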
PART 2. GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:
1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool
We will now present each one of these products.
1. The CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide to getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.
2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables), and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that are to be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).
5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool
For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include
• Easy-to-use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master/slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification of certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version:
http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community gains a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.
CopyCat is being actively developed by Microtec, and many new features are in the works, such as support for replication between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux/Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.
Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:
ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read about it in opguide.pdf, you will know that the statistical output looks like just a bunch of numbers and nothing else. Okay, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database, or at least to identify the cause of performance problems.

Long story short, the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour to hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database, and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, available while browsing the statistics, and it also includes hint comments and recommendation reports.

Figure 1. Summary of database statistics

The summary shown in Figure 1 provides general information about
1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - The "oldest transaction" is the same Oldest Interesting Transaction that is mentioned everywhere; gstat output does not label this transaction as "interesting".
3 - Strictly, it is the oldest transaction that was active when the oldest transaction currently active started, because only a new transaction's start moves the Oldest Active forward. In production systems with regular transactions, it can be treated as the currently oldest active transaction.
your database. The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: All figures in this article contain gstat statistics taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago, developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, no warning is displayed: this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in the event of a power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. (1)
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest (2) and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a sudden loss of performance, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here, the sweep interval is marked yellow because the value of the "sweep gap" is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval is marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.
• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.
• The Oldest active (3) is the oldest currently active transaction.
• The Next transaction is the transaction number that will be assigned to the next new transaction.
• Active transactions: IBAnalyst will give a warning if the oldest active transaction number lags by 30% of the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day: this is calculated as the Next transaction value divided by the number of days since the creation of the database up to the point at which the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
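The two calculations in the list above are simple enough to sketch. Note that the 30% warning rule is one plausible reading of the description given here, and all numbers and function names below are invented for illustration:

```python
# Back-of-envelope sketch of the "transactions per day" figure and the
# stuck-oldest-active warning. The 30% threshold interpretation and all
# field names are assumptions, not IBAnalyst's actual implementation.
def transactions_per_day(next_transaction, days_since_creation):
    # Valid only if transaction numbering started at database creation
    return next_transaction / days_since_creation

def oldest_active_stuck(next_transaction, oldest_active, per_day):
    # Warn when the oldest active lags Next by at least 30% of a day's work
    return (next_transaction - oldest_active) >= 0.3 * per_day

per_day = transactions_per_day(1_500_000, 300)
print(per_day)  # 5000.0
print(oldest_active_stuck(1_500_000, 1_498_000, per_day))  # True: gap 2000 >= 1500
```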
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.
Do not gather statistics if you:

• just restored your database;

• performed a backup (gbak -b db.gdb) without the -g switch;

• recently performed a manual sweep (gfix -sweep).
Statistics gathered on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when the applications are making less database load than usual (users are at lunch, or it is a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not creating sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications with only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2. Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, which tables are fragmented, whether the fragmentation was caused by updates/deletes or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example, several things are going on. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently the cause is bulk deletes. Together, these two indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information, you can also check whether the fact that your applications are updating these tables so frequently is by design, or is the result of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If the statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination, you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056, i.e. almost all keys store one of the four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
That is why IBAnalyst draws your attention to such indices, marking them red and yellow and including them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.
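As a cross-check of what IBAnalyst shows for such an index, the Uniques and MaxDup figures can be approximated with plain SQL. The table and column below are hypothetical stand-ins for a lookup-style foreign key, not names from the statistics above:

```sql
/* Uniques and total key count for the indexed column */
SELECT COUNT(DISTINCT STATE_ID) AS UNIQUES, COUNT(*) AS TOTAL_KEYS
FROM ORDERS;

/* The longest duplicate chain (MaxDup in IBAnalyst terms) */
SELECT STATE_ID, COUNT(*) AS DUP_CHAIN
FROM ORDERS
GROUP BY STATE_ID
ORDER BY 2 DESC;
```

If UNIQUES is tiny compared with TOTAL_KEYS, the index backing that column will show the red duplicate warnings described above.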
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table         Records   Versions   RecVers size
CLIENTS_PR       3388      10944             92
DICT_PRICE         30       1992             45
DOCS                9       2225             64
N_PART          13835      72594             83
REGISTR_NC        241       4085             56
SKL_NC           1640       7736            170
STAT_QUICK      17649      85062            110
UO_LOCK           283       8490            144
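The count(*) advice from the report can be applied straight from isql; a sketch for one of the tables listed above:

```sql
/* Reading every record gives the engine a chance to collect
   garbage versions. This may run for a long time on heavily
   versioned tables with non-unique indices, and it achieves
   nothing while an older transaction is still interested
   in those versions. */
SELECT COUNT(*) FROM STAT_QUICK;
COMMIT;
```

After the commit, fresh statistics should show the Versions count for the table dropping, provided no long-living transaction was holding the versions.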
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to "Temporary tables"
One thing I'd like to ask you to change is re temp tables. I suggest you even create another "myth box" for it. It is the sentence:
"Surely most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."
This myth does not want to die. Temporary tables (TTs) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on an RDBMS are heavily dependent on this feature – or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.
Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
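For comparison, this is roughly what the SQL-standard global temporary table syntax looks like. Note this is a hedged sketch: Firebird did not support GTTs at the time of writing (the syntax below follows the SQL standard, later adopted in Firebird 2.1), and the table name is invented:

```sql
CREATE GLOBAL TEMPORARY TABLE SESSION_WORKSPACE (
  USER_ID INTEGER NOT NULL,
  PAYLOAD VARCHAR(200)
) ON COMMIT DELETE ROWS;  /* each session sees only its own rows;
                             with DELETE ROWS they vanish at commit */
```

The metadata is permanent and shared; only the data is private per session or transaction, which is exactly the "transparent workspace" behaviour described above.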
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat-file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data whose structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (only available through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.
Alexey Kovyazin, Chief Editor
TestBed
Testing the NO SAVEPOINT
connect 'C:\test\sp2.ib' USER 'SYSDBA' Password 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:
connect 'E:\test\sp3.ib' USER 'SYSDBA' Password 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
News & Events

Fyracle 0.8.9
Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble. Read more at www.janus-software.com

IBAnalyst 1.9
IBSurgeon has issued a new version of IBAnalyst. It can now better analyze InterBase or Firebird database statistics by using metadata information (in this case a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance. www.ibsurgeon.com/news.html
You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip
How to perform the test
The easiest way to perform the test is to use isql's INPUT command.
Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;
Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time= 4.63 sec
Buffers = 2048
Reads = 2111
Writes 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time= 0.03 sec
Buffers = 2048
Reads = 1
Writes 942
Fetches = 3
Fast Report 3.19
The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently, providing all the tools you need to develop reports. All variants of FastReport® 3 contain: visual report designer with rulers, guides and zooming; wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0-360 degrees; memo objects supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.

TECT Software presents
New versions of nice utilities from TECT Software, www.tectsoft.net. SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found at http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx. And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found at http://www.tectsoft.net/Products/Firebird/FBMail.aspx
This is an excerpt from the one-pass script with NO SAVEPOINT enabled:
CopyTiger 1.00
CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger, and to download an evaluation version, see www.microtec.fr/copycat/ct
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time= 4.72 sec
Buffers = 2048
Reads = 2102
Writes 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time= 0.15 sec
Buffers = 2048
Reads = 1
Writes 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.
Sequential bulk UPDATE statements
With the multi-pass script (sequential UPDATEs), the raw test results are rather large.
Table 1: Test results for 5 sequential UPDATEs
SQLHammer 1.5 Enterprise Edition
The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment; registered custom components based on the Borland Package Library mechanism; and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com
For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.
Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT performs? Is there any cost attached to these gains?
A few words about versions
You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is written as a delta version, which saves disk space and memory usage by recording only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and a chain of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
The UNDO log concept
You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo Log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option
The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation
Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.
Figure 3: Initial situation before any UPDATE – only one record version exists, and the Undo log is empty
The first UPDATE
The first UPDATE statement creates delta backversions on disk (see figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log
This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE
When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN … END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.
Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log
All this makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE
The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.
Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one
Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint
The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :-)
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The following figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN … END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in figure 7.

In this case, memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
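To make figure 7 concrete, here is roughly how an explicit savepoint delimits undoable work, using the SAVEPOINT syntax available in InterBase 7.x and Firebird 1.5 (the table is the one from the test scripts; the savepoint name is illustrative):

```sql
UPDATE TEST SET QRT = QRT + 1;      /* UPDATE1: tracked by the transaction-level savepoint */
SAVEPOINT Savepoint1;
UPDATE TEST SET QRT = QRT + 1;      /* UPDATE2: its backversions go to Savepoint1's undo log */
ROLLBACK TO SAVEPOINT Savepoint1;   /* undoes only UPDATE2 */
COMMIT;                             /* UPDATE1 is kept */
```

The rollback to Savepoint1 restores the UPDATE1 versions from that savepoint's Undo log, while the work done before the savepoint remains intact and is committed.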
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any inconvenience and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, too, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proven data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development – if you develop your applications just like 20-30 years ago.
As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…

Author: Vladimir Kotlyarevsky, vlad@contek.ru
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?

The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relational processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] – UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.)
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hum. But let's not talk about such gloomy things :-). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
And what is more, nobody requires run-time pointers to contain some additional information about an object, except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that they "conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
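In InterBase/Firebird, the natural way to implement such a generating function is a generator; a minimal sketch (the object names are illustrative, not taken from the article):

```sql
/* One generator serves OIDs for all tables/classes,
   which makes the keys unique across the whole database */
CREATE GENERATOR G_OID;

/* Obtain a new OID */
SELECT GEN_ID(G_OID, 1) FROM RDB$DATABASE;
```

Because a single generator feeds every class, any two objects in the database are guaranteed distinct OIDs, which is exactly the database-wide uniqueness argued for above.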
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information - from just a simple type name to detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
    OID TOID primary key,
    ClassId TOID,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date timestamp default CURRENT_TIMESTAMP,
    Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of this known OID.) Then the list of all persistent object types that are known to the system is retrieved by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those already in OBJECTS. Usually these are a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? Most likely not less than a half. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:
create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total NUMERIC(15,2)
)
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:
select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
If an order has a "lines" section - and a real-world order should definitely have such a section - we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:
create table order_lines (
    id integer not null primary key,
    object_id TOID,  /* reference to the order object - 1:M relation */
    item_id TOID,    /* reference to the ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    sum numeric(15,4) computed by (amount * cost)
)
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2. (See [5].) All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:
create table attributes (
    OID TOID,  /* link to the master-object of this attribute */
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
)
connected 1:M with OBJECTS by OID. There is also a table
create table class_attributes (
    OID TOID,  /* link to the description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
)
which describes type metadata - an attribute set (names and types) for each object type.
All attributes for a particular object, whose OID is known, are retrieved by the query:
select attribute_id, value from attributes
where OID = :oid
or, with names of attributes:
select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
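On the client side, the one-attribute-per-row result set of Method 2 has to be folded back into a single object. A minimal sketch of that step (the row layout mirrors the query above - attribute_id, attribute_name, value - but the function and sample data are illustrative, not part of any real API):

```python
# Sketch: pivot EAV rows (attribute_id, attribute_name, value) into one
# dictionary per object, as a client of Method 2 storage might do.
def rows_to_object(oid, rows):
    obj = {"OID": oid}
    for attribute_id, attribute_name, value in rows:
        obj[attribute_name] = value   # every value arrives as a string
    return obj

# Hypothetical result set for an "Order" object with OID 1001:
rows = [(1, "customer", "317"), (2, "sum_total", "149.90")]
order = rows_to_object(1001, rows)
```

Note that all values come back as strings (the `value` column is varchar), so type conversion according to `attribute_type` remains the client's job.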
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:
select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.
The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database - in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied - a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary - just a single BLOB field - and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
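The serialization step of Method 3 can be sketched in a few lines. This is only an illustration of the idea - the article mentions dfm (VCL streaming) or XML as the persistent format; JSON is used here merely as a stand-in serializer:

```python
# Sketch of Method 3: the database keeps only an opaque serialized image
# of the object in a BLOB column; the format here (JSON) is an arbitrary
# stand-in for dfm/XML/anything else.
import json

def to_blob(obj: dict) -> bytes:
    # What would be written into the single BLOB field.
    return json.dumps(obj, sort_keys=True).encode("utf-8")

def from_blob(blob: bytes) -> dict:
    # Rebuilding the run-time object from the stored image.
    return json.loads(blob.decode("utf-8"))

blob = to_blob({"Name": "Order 17", "sum_total": 149.9})
```

As the editor's note says, once the object is inside the BLOB the database can no longer see its attributes: any search by attribute value has to deserialize every object outside the database.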
It was not difficult to come to the following conclusions (well, I know - everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to involve it in the future using certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.
To be continued...
FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; support for most popular DB-engines.

Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com
Replicating and synchronizing
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are addressed in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)
When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
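The echo-suppression rule described above can be sketched as a single filter: a change made by connection `current_user` is logged once per sub-node, except for the sub-node whose name matches that user. (This is only my reading of the mechanism; the function and table layout are illustrative, not CopyCat's actual schema.)

```python
# Sketch of the trigger-side fan-out: one log line per sub-node,
# skipping the sub-node that originated the change.
def log_change(sub_nodes, current_user, record_key):
    return [(node, record_key)
            for node in sub_nodes
            if node != current_user]

# Node "HQ" has sub-nodes A, B and C. Sub-node "A" logs in under its
# own name and replicates a change to record ("ORDERS", 1001):
log_lines = log_change(["A", "B", "C"], "A", ("ORDERS", 1001))
```

Because the replicator itself connects under a node name, its writes pass through the same filter, which is what stops a change from bouncing back to its source.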
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
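The sequence just described - ask the server for a unique key, re-key the local record, then replicate - can be sketched as follows. All names here are hypothetical; `next_server_key` stands in for the generator or stored-procedure call executed on the server side:

```python
# Sketch of primary-key synchronization before upload: the local record
# is re-keyed to a server-generated value, then (and only then) sent.
def synchronize_key(local_record, local_table, next_server_key):
    old_key = local_record["OID"]
    new_key = next_server_key()        # SQL executed on the server side
    local_record["OID"] = new_key      # applied to the local database first
    local_table[new_key] = local_table.pop(old_key)
    return local_record

# A record created off-line under a locally generated key 17;
# the "server" hands out keys from 5000 upward:
server_keys = iter(range(5000, 6000))
table = {17: {"OID": 17, "Name": "new customer"}}
rec = synchronize_key(table[17], table, lambda: next(server_keys))
```

Downloaded records skip this step entirely, since keys coming from the parent are already considered unique.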
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible for two nodes (having the same parent) to attempt simultaneously to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, the current stock value of a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication": a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an
unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
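The STOCK example can be reduced to a few lines showing why replaying the logged calls merges correctly where copying final field values cannot. This is an illustration of the principle only, not CopyCat code:

```python
# Starting stock is 45; node A adds 1 item, node B adds 2 items.
# Replicating final field values gives 46 or 47 - one update is lost.
# Replaying the logged procedure calls (the deltas) on every node
# converges on the correct total.
def apply_calls(initial, calls):
    stock = initial
    for delta in calls:          # each logged call: "add `delta` items"
        stock += delta
    return stock

value_replication = {"A": 46, "B": 47}   # neither value is correct
logged_calls = [1, 2]                    # the calls from nodes A and B
merged = apply_calls(45, logged_calls)
```

Because addition is order-independent here, every node that replays both calls arrives at the same value, regardless of the order in which the calls are received.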
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.
1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly, in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide for getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.
2) Open and run the "Configuration" example project (requires the IBX provider).
3) On the "General" tab, fill in the connection parameters and press "Connect to Database".
4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator"
columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.
7) Apply all the generated SQL to all databases that should be replicated.
8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger - the CopyCat Win32 standalone replication tool
For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include
• Easy to use installer

• Independent server / Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community obtains a two-fold benefit:
1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.
2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community - that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.
CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.
Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, or reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications - transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat - the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both the positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

Figure 1: Summary of database statistics

Footnotes:
¹ InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
² The "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".
³ Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

The summary shown in Figure 1 provides general information about
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple - using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.1
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest2 and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
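The gap check described above can be sketched in a few lines of Python. This is a conceptual sketch of the rule, not IBAnalyst's actual code; the function name and parameters are mine:

```python
def sweep_gap_check(oldest_transaction, oldest_snapshot, sweep_interval):
    """Return the sweep gap and whether an automatic sweep would start.

    The gap is the distance between the Oldest (interesting) transaction
    and the Oldest snapshot transaction; with a positive sweep interval,
    the engine is alerted once the gap exceeds that interval.  A sweep
    interval of 0 disables automatic sweeping entirely.
    """
    gap = oldest_snapshot - oldest_transaction
    starts_sweep = sweep_interval > 0 and gap > sweep_interval
    return gap, starts_sweep

# A stuck rollback keeps the oldest transaction far behind:
print(sweep_gap_check(10000, 35000, 20000))  # (25000, True)
# The same gap with automatic sweeping disabled:
print(sweep_gap_check(10000, 35000, 0))      # (25000, False)
```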
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it "freezes" when a transaction is ended with rollback and the server can not undo its changes at that moment.
• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.
• The Oldest active3 – the oldest currently active transaction.
• The Next transaction – the transaction number that will be assigned to the next new transaction.
• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day – this is calculated from Next transaction divided by the number of days passed from the creation of the database to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
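That last calculation is simple enough to sketch (illustrative names; the day count runs from database creation to the moment the statistics were taken):

```python
def transactions_per_day(next_transaction, days_since_creation):
    # Next transaction divided by the age of the database in days.
    # Only meaningful for production databases: a restore from backup
    # resets transaction numbering and invalidates the figure.
    if days_since_creation <= 0:
        raise ValueError("database age in days must be positive")
    return next_transaction / days_since_creation

print(round(transactions_per_day(1600000, 365)))  # ~4384 transactions/day
```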
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during housekeeping operations can be meaningless.
Do not gather statistics if you:
• Just restored your database
• Performed a backup (gbak –b db.gdb) without the –g switch
• Recently performed a manual sweep (gfix –sweep)
Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications produce a lighter database load than usual (users are at lunch, or it's a quiet time in the business day).
For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.
The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.
For example, look at ADDR_ADDRESS_IDX6.
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).
The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.
Table information

Let's take a look at another sample output from IBAnalyst.
The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2: Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.
Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. The index name itself tells us that it was created manually. If statistics
were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
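The Uniques/TotalDup/MaxDup figures can be reproduced from a list of index keys. The sketch below assumes one plausible reading of the columns: Uniques is the number of distinct values, TotalDup is total keys minus unique values, and MaxDup is the length of the longest duplicate chain (occurrences of the most frequent value, minus one):

```python
from collections import Counter

def index_duplicate_stats(keys):
    # Compute Uniques, TotalDup and MaxDup for a list of index keys.
    counts = Counter(keys)
    uniques = len(counts)
    total_dup = len(keys) - uniques
    max_dup = max(counts.values()) - 1 if counts else 0
    return uniques, total_dup, max_dup

# Four distinct values spread over 35,000 keys - a "bad" index much
# like ADDR_ADDRESS_IDX6 discussed in the article:
keys = [1] * 25057 + [2] * 5000 + [3] * 4000 + [4] * 943
print(index_duplicate_stats(keys))  # (4, 34996, 25056)
```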
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and includes them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944             92
DICT_PRICE        30       1992             45
DOCS               9       2225             64
N_PART         13835      72594             83
REGISTR_NC       241       4085             56
SKL_NC          1640       7736            170
STAT_QUICK     17649      85062            110
UO_LOCK          283       8490            144
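The filter behind that list can be sketched directly from the table data. This is a hypothetical reimplementation for illustration, not IBAnalyst code:

```python
stats = {
    # table name: (records, versions)
    "CLIENTS_PR": (3388, 10944),
    "DICT_PRICE": (30, 1992),
    "DOCS": (9, 2225),
    "N_PART": (13835, 72594),
    "REGISTR_NC": (241, 4085),
    "SKL_NC": (1640, 7736),
    "STAT_QUICK": (17649, 85062),
    "UO_LOCK": (283, 8490),
}

def heavily_versioned(stats, threshold=3.0):
    # Tables whose version/record ratio exceeds the threshold.
    return sorted(name for name, (records, versions) in stats.items()
                  if records and versions / records > threshold)

print(heavily_versioned(stats))  # all eight tables exceed the ratio of 3
```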
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to the "Temporary tables" article
One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:
"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."
This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.
Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.
Alexey Kovyazin, Chief Editor
Miscellaneous
Invitations, Subscriptions, Donations, Sponsorships
This is the first official book on Firebird — the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now!

To receive notifications about future issues, send email to subscribe@ibdeveloper.com
TestBed
Testing the NO SAVEPOINT
How to perform the test
The easiest way to perform the test is to use isql's INPUT command.
Suppose you have the scripts located in C:\test\scripts:
>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input C:\test\scripts\script01.sql;
Test results

The single bulk UPDATE
First, let's perform the test where the single-pass bulk UPDATE is performed.
This is an excerpt from the one-pass script with default transaction settings:
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980
SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3
News & Events

Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0..360 degrees; a memo object supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors. Trial / Buy Now

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net. SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found at http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx. And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found at http://www.tectsoft.net/Products/Firebird/FBMail.aspx.
This is an excerpt from the one-pass script with NO SAVEPOINT enabled:
News & Events

CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger see www.microtec.fr/copycat/ct. Download an evaluation version.
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154
SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.
Sequential bulk UPDATE statements
With the multi-pass script (sequential UPDATEs) the raw test results are rather large.
Table 1: Test results for 5 sequential UPDATEs
News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com
For convenience, the results are tabulated above.
The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7,210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.
The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth of the writes parameter.
Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small, and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?
A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced.
The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
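Conceptually (this is an illustration, not the engine's actual on-disk format), a delta backversion holds only the changed fields, and the full old version is rebuilt from the newest version plus the delta:

```python
def make_delta(old, new):
    # Keep only the fields whose values differ between the two versions.
    return {field: old[field] for field in old if old[field] != new[field]}

def rebuild_old(new, delta):
    # Reconstruct the full old version: take the new version and
    # overwrite the changed fields with their saved old values.
    rebuilt = dict(new)
    rebuilt.update(delta)
    return rebuilt

v1 = {"ID": 1, "QRT": 10, "NAME": "abc"}   # version by transaction 1
v2 = {"ID": 2, "QRT": 11, "NAME": "abc"}   # version after the UPDATE
delta = make_delta(v1, v2)
print(delta)                               # {'ID': 1, 'QRT': 10}
print(rebuild_old(v2, delta) == v1)        # True
```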
The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.
A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.
So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if using implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. :-)
Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.
First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).
Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.
Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.
Figure 3: Initial situation before any UPDATE - only one record version exists, the Undo log is empty
The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and the new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
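That bookkeeping can be sketched as follows, with a plain set standing in for the engine's bitmap of affected record numbers (an illustrative model only, not the engine's actual data structures):

```python
class SavepointSketch:
    def __init__(self, table):
        self.table = table          # record number -> current version
        self.backversions = {}      # record number -> delta on "disk"
        self.undo_log = set()       # bitmap of affected record numbers

    def update(self, rec_no, new_version):
        # Write a small delta backversion and log only the record number.
        old = self.table[rec_no]
        self.backversions[rec_no] = {k: v for k, v in old.items()
                                     if new_version.get(k) != v}
        self.undo_log.add(rec_no)
        self.table[rec_no] = new_version

    def rollback(self):
        # Walk the recorded numbers and restore each old version from
        # the current version plus its delta backversion.
        for rec_no in self.undo_log:
            restored = dict(self.table[rec_no])
            restored.update(self.backversions.pop(rec_no))
            self.table[rec_no] = restored
        self.undo_log.clear()

sp = SavepointSketch({1: {"QRT": 10}, 2: {"QRT": 20}})
sp.update(1, {"QRT": 11})
sp.rollback()
print(sp.table[1])   # {'QRT': 10}
```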
Figure 4: UPDATE #1 creates a small delta version on disk and puts the record number into the Undo log
This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.
Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.
To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see – there is much more work to do, both on disk and in memory.
The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.
Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE #1 into the Undo log
This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.
When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.
Figure 6: The third UPDATE overwrites the UPDATE #1 version in the Undo log with the UPDATE #2 version, and its own version is written to disk as the latest one
Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint
The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :-)
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.
A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected the third and following UPDATEs to do essentially equal numbers of reads and writes - to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.
The following figure (Figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE #2 we would have the situation illustrated in Figure 7.
In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though it has turned out many times that I had independently reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience, and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.
Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.
I should make it clear that the object approach to database design as described is not appropriate for every task. The same is still true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.
And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?
Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:
1. Everyone got used to them and feels comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development - if you develop your applications just like 20-30 years ago.
As you see almost all these charac-teristics sound good except proba-bly the last one Today you canhardly find a software product (inalmost any area) consisting of morethan few thousand of lines which iswritten without OOP technologiesOOP languages have been used fora long time for building visualforms ie in UI development It isalso quite usual to apply OOP atthe business logic level if youimplement it either on a middle-tieror on a client But things fall apartwhen the deal comes closer to thedata storage issueshellip During the lastten years there were severalattempts to develop an object-ori-ented database system and as faras I know all those attempts wererather far from being successfulThe characteristics of an OODBMSare the antithesis of those for anRDBMS They are unusual and
Author: Vladimir Kotlyarevsky, vlad@contek.ru
Development area
The InterBase and Firebird Developer Magazine, 2005, ISSUE 2

www.ibdeveloper.com ©2005 www.ibdeveloper.com. All rights reserved.

OOD and RDBMS, Part 1
They are unusual and slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

What we need is simple: to develop a set of standard methods that will help us to simplify the process of fitting the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relational processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID keys from a single set (similar to object pointers at run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which brings considerable benefits for distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change: the vehicle engine number, network card number, name, passport number, social security card number and even sex. (Nobody can change their date of birth, at least not their de facto date of birth, but birth dates are not unique anyway.)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". A surrogate key, on the other hand, never needs to change, at least not in response to the changing world.

What is more, nobody requires run-time pointers to contain any additional information about an object besides a memory address. Nevertheless, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I have ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys are in some sense much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest implementation of an OID in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
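In InterBase/Firebird, the natural choice for such a generating function is a generator. A minimal sketch of this idea (the generator name is my own illustration, not taken from the article) might be:

```sql
/* One database-wide sequence is enough, because OIDs are
   unique across all tables, not merely per table. */
create generator GEN_OID;

/* The InterBase/Firebird idiom for fetching the next OID value. */
select gen_id(GEN_OID, 1) from RDB$DATABASE;
```

Each call to gen_id increments the generator atomically and outside transaction control, so concurrent transactions can never obtain the same OID.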
ClassId
All objects stored in a database should have a persistent analogue of RTTI, immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. This goal may be accomplished by a ClassId object attribute, which refers to the list of known types; that list is basically a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to the detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, or a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object: the physical deletion of records in a database full of direct and indirect links is often, to put it mildly, a very complicated task.

Thus, each object has the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It would not be a duplication of information, but it would still be a duplication of entities. Besides, a single table allows us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:
create domain TOID as integer not null;

create table OBJECTS (
  OID           TOID primary key,
  ClassId       TOID,
  Name          varchar(32),
  Description   varchar(128),
  Deleted       smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date   timestamp default CURRENT_TIMESTAMP,
  Owner         TOID
)

The ClassId attribute (the object type identifier) refers to the OBJECTS table itself; that is, a type description is also an object of a certain type, which has, for example, the well-known ClassId = -100 (you need to add a description of this known OID). The list of all persistent object types known to the system is then sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types (simple lookup dictionaries, for example) which do not have any attributes other than those already in OBJECTS. Usually these consist of a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries there are in your accounting system? Most likely not less than half. Thus you may consider that you have already implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId
if you know the ClassId for this dictionary. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. Objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with attributes such as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID       TOID primary key,
  customer  TOID,
  sum_total numeric(15,2)
)

which relates one-to-one to the OBJECTS table by the OID field. The order number and comments attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
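The view mentioned above is not spelled out in the article; a possible sketch (the column list is my own illustration) might be:

```sql
create view orders_view (OID, Number, Comment, customer, sum_total) as
  select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
  from OBJECTS o join Orders ord on o.OID = ord.OID
```

Client code can then select from orders_view exactly as it would from a conventional Orders table, hiding the OBJECTS/Orders split entirely.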
If an order has a "lines" section, and a real-world order definitely should have one, we can create a separate table for it, named e.g. "order_lines", and relate it to the "Orders" table as 1:M:

create table order_lines (
  id        integer not null primary key,
  object_id TOID,  /* reference to the order object: 1:M relation */
  item_id   TOID,  /* reference to the ordered item */
  amount    numeric(15,4),
  cost      numeric(15,4),
  sum       computed by (amount * cost)
)

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementing the object-relational mapping for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]; these articles also describe how to implement type inheritance in a database.
Method 2 (see [5]). The attributes of objects of any type are stored as a record set in a single table. A simple example would look like this:

create table attributes (
  OID          TOID,  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value        varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
)

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID            TOID,  /* link to the description-object of the type */
  attribute_id   integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
)

which describes the type metadata: the attribute set (names and types) for each object type.

All attributes of a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document: the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease of implementing object inheritance; and a very simple database structure. A significant benefit is that with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are spent on every attribute, instead of the single field described in Method 1.

Method 3. Everything is stored in a BLOB, using one of the persistent formats: a custom format, dfm (VCL streaming) from the Borland VCL, XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: object retrieval logic is simple and fast; no extra database structures are necessary, just a single BLOB field; and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know, everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing on certain object attributes, like searching or grouping, or is likely to in the future, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex, data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except for work with the attributes stored in the OBJECTS table), the third method is probably the best choice, due to its speed and ease of implementation.

To be continued…
Replicating and synchronizingInterBaseFirebirddatabases using CopyCat
AuthorJonathan Neve
jonathanmicrotecfr
wwwmicrotecfr
PART 1 BASICS OF DATABASEREPLICATIONA replicator is a tool for keepingseveral databases synchronized(either wholly or in part) on a con-tinuous basis Such a tool can havemany applications it can allow foroff-line local data editing with apunctual synchronization uponreconnection to main database itcan also be used over a slow con-nection as an alternative to a directconnection to the central databaseanother use would be to make anautomatic off-site incrementalbackup by using simple one-wayreplication
Creating a replicator can be quitetricky Lets examine some of thekey design issues involved in data-base replication and explain howthese issues are implemented inMicrotec CopyCat a set of Delphi C++Builder components for per-forming replication between Inter-base and FireBird databases
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
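The article does not show CopyCat's actual log schema; a simplified sketch of the general idea (all names here are my own illustration, not CopyCat's real ones) might look like:

```sql
create generator GEN_LOG;

create table RPL$LOG (
  id         integer not null primary key,
  table_name varchar(32),   /* which replicated table changed     */
  pk_value   varchar(64),   /* primary key of the changed record  */
  login      varchar(32)    /* sub-node this line is destined for */
);
```

Each trigger fired by a change writes one such line per sub-node, so a node's replicator only has to scan the lines addressed to it.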
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or afterwards.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database; therefore, no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are logged too, and would therefore bounce back and forth between the source and target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication is logged for all sub-nodes other than the node's parent; and any change made to the parent node is logged for the other sub-nodes, but not for the originating node itself.
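The filtering trick described above can be sketched as a trigger that skips the connection's own user name. This is my own illustration, assuming a hypothetical log table RPL$LOG(id, table_name, pk_value, login) and sub-node list RPL$USERS(login), not CopyCat's real schema:

```sql
set term ^ ;
create trigger ORDERS_AU_LOG for ORDERS
active after update position 0
as
begin
  /* log the change for every sub-node except the connection that
     made it, so a replicated change never bounces back */
  insert into RPL$LOG (id, table_name, pk_value, login)
  select gen_id(GEN_LOG, 1), 'ORDERS', new.OID, u.login
  from RPL$USERS u
  where u.login <> CURRENT_USER;
end ^
set term ; ^
```

Because the replicator connects under its peer's node name, the `where u.login <> CURRENT_USER` clause excludes exactly the node the change came from.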
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you are implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". To resolve the conflict, the user simply puts in this field the name of the node holding the correct version of the record, and upon the next replication the record is automatically replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether some other node has a conflicting update for a certain record. Furthermore, it is entirely possible for two nodes (having the same parent) to attempt to replicate the same record to their parent simultaneously. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, the current stock value of a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than requiring the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication": that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
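Such an update procedure might be sketched as follows (an illustration of the idea, not CopyCat code; the STOCK column names are assumed):

```sql
set term ^ ;
create procedure ADD_TO_STOCK (product_id integer, delta integer)
as
begin
  /* replicating the call (product_id, delta) rather than the
     resulting row value lets concurrent changes merge correctly */
  update STOCK set amount = amount + :delta
  where product = :product_id;
end ^
set term ; ^
```

With this in place, node A replicates the call ADD_TO_STOCK(p, 1) and node B the call ADD_TO_STOCK(p, 2), so the central value ends up at 48 regardless of the order in which the two calls arrive.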
PART 2GETTING STARTED WITH COPYCATCopycat is available in two distinct forms
1 As a set of Delphi C++ Builder components
2 As a standalone Win32 replication tool
We will now present each one of these products
1 CopyCat Delphi C++ Builder Component Set
CopyCat can be obtained as a set of Delphi C++ Builder componentsenabling you to use replication features painlessly and in many different sit-uations and sparing you the tedious task of writing and testing custom data-base synchronization solutions
Using these components replication or synchronization facilities can beseamlessly built into existing solutions As an example we have used theCopyCat components in an application used by our development team ontheir laptops to keep track of programming tasks When the developersneed to go on-site to visit a customer the application runs in local mode onthe laptop When they return and connect up to the company network anychanges they have made on-site are automatically synchronized with themain Interbase database running on a Linux server
Many more applications are possible since the CopyCat components arevery flexible and allow for synchronization of even a single table
Below is a concise guide for getting started with the CopyCat component suite
Download and install the evaluation components fromhttpwwwmicrotecfrcopycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server administration tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing automatic resumption of lost database connections

• Simple and intuitive control panel

• Automatic email notification of certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community gains a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec responds to another great need in the community: that of having a powerful and easy-to-use replication tool which can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replication between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...) as well as a Linux / Kylix version of the components and of the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering IBDeveloper readers a 20% discount on the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:
ibd-852-150
Development area
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
©2005 www.ibdeveloper.com. All rights reserved.
Using IBAnalyst
Understanding Your Database with IBAnalyst
Author: Dmitri Kouzmenko, kdv@ib-aid.com
I have been working with InterBase since 1994. Back then most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigured hardware or the OS, but that was almost all I could do to tune performance.
Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.
Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked in gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?
Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.
Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.
Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.
Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hint comments and recommendation reports.
Figure 1: Summary of database statistics
Notes:
1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - "Oldest transaction" is the same Oldest interesting transaction mentioned everywhere. Gstat output does not show this transaction as "interesting".
3 - Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.
The summary shown in Figure 1 provides general information about
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.
• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.
• The Oldest active³ – the oldest currently active transaction.
• The Next transaction – the transaction number that will be assigned to a new transaction.
• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell if there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
As you have already learned, if there are any warnings they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
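The counter arithmetic behind these rows can be sketched in a few lines. This is a rough, hypothetical recreation of IBAnalyst-style checks for this article, not IBAnalyst's actual code; the function name and the reading of the 30% warning rule are assumptions:

```python
def transaction_report(oldest, oldest_active, oldest_snapshot, next_tx, days_alive):
    """Derive IBAnalyst-style transaction metrics from gstat header counters."""
    # Gap that triggers automatic sweep once it exceeds the sweep interval
    sweep_gap = oldest_snapshot - oldest
    # Meaningful only if transaction numbering was never reset mid-life
    tx_per_day = next_tx / days_alive
    # One plausible reading of the warning rule: the oldest active transaction
    # lags more than 30% of a day's worth of transactions behind the newest one.
    stuck_warning = (next_tx - oldest_active) > 0.30 * tx_per_day
    return sweep_gap, tx_per_day, stuck_warning
```

For example, counters oldest=100, oldest active=150, oldest snapshot=120 and next=10000 on a database created 10 days ago give a sweep gap of 20, 1000 transactions per day, and a "stuck" warning.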
It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.
Do not gather statistics if you:
• Just restored your database
• Performed backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)
Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications put less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database
Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).
The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.
Table information
Let's take a look at another sample output from IBAnalyst.
Figure 2: Table statistics
The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables, with fragmentation caused by update/delete or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.
In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or by bulk deletes. See the rows in which the MaxVers column is marked in blue. This shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that this is really bulk deletes, and the number in the Versions column is close to the number of deleted records.
Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.
The Index view
Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.
For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records  Versions  RecVers size
CLIENTS_PR      3388     10944            92
DICT_PRICE        30      1992            45
DOCS               9      2225            64
N_PART         13835     72594            83
REGISTR_NC       241      4085            56
SKL_NC          1640      7736           170
STAT_QUICK     17649     85062           110
UO_LOCK          283      8490           144

Summary
IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
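As a rough illustration of the ratio rule in the quoted report, the table of versioned tables can be filtered with a few lines of Python. This is a sketch written for this article, not part of IBAnalyst; only the Records and Versions columns are used:

```python
# (table, records, versions) taken from the report's table list above
stats = [
    ("CLIENTS_PR", 3388, 10944), ("DICT_PRICE", 30, 1992),
    ("DOCS", 9, 2225), ("N_PART", 13835, 72594),
    ("REGISTR_NC", 241, 4085), ("SKL_NC", 1640, 7736),
    ("STAT_QUICK", 17649, 85062), ("UO_LOCK", 283, 8490),
]

def versioned_tables(rows, threshold=3.0):
    """Return the tables whose version/record ratio exceeds the threshold."""
    return [name for name, records, versions in rows
            if records and versions / records > threshold]
```

Running the check over the sample data returns all eight tables, which matches the report's claim that every listed table has a ratio greater than 3.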
Readers feedback
Comments to "Temporary tables"
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me and, with the permission of its respective owner, I'd like to publish it.
Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:
"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."
This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.
Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volker.rehn@bigpond.com
Miscellaneous
Invitations, Subscriptions, Donations, Sponsorships
This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.
Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now!
To receive future issues notifications, send email to subscribe@ibdeveloper.com
Magazine CD
Donations
CopyTiger 1.00
CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).
For more information about CopyTiger see www.microtec.fr/copycat/ct
Download an evaluation version
TestBed
Testing the NO SAVEPOINT
SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes 938
Fetches = 3
Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.
Sequential bulk UPDATE statements
With the multi-pass script (sequential UPDATEs) the raw test results are rather large.
Table 1: Test results for 5 sequential UPDATEs
News & Events
SQLHammer 1.5 Enterprise Edition
The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment; registered custom components based on the Borland Package Library mechanism; and an Open Tools API written in Delphi/Object Pascal.
www.metadataforge.com
For convenience, the results are tabulated above.
The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.
The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.
Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT
With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log
So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?
A few words about versions
You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The
old version does not disappear immediately, but is retained as a backversion.
In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
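As a loose illustration of the delta idea, a delta can be thought of as just the set of changed fields, from which the full old version is rebuilt on demand. This is purely conceptual: the engine works with on-disk record images, not Python dicts:

```python
def make_delta(old, new):
    """Record only the fields that differ between two versions (conceptual sketch)."""
    return {field: old[field] for field in old if old[field] != new[field]}

def rebuild_old(new, delta):
    """Reconstruct the full old version from the new version plus the delta."""
    old = dict(new)        # start from the newest full version
    old.update(delta)      # overlay the saved differences
    return old
```

For a record updated from {"ID": 1, "QRT": 10} to {"ID": 2, "QRT": 10}, the delta is just {"ID": 1}, which is why deltas are small and cheap to write.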
The UNDO log concept
You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.
A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.
So the Undo Log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option
The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. :-)
Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.
First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).
Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.
Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
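At the API level, using the option amounts to appending one extra byte to the TPB passed to isc_start_transaction(). The sketch below assembles a plausible TPB in Python. The first four constant values are the usual ones from ibase.h, but the value of isc_tpb_no_savepoint here is a placeholder, not the real constant: look it up in the InterBase 7.5.1 ibase.h before using this:

```python
# Standard TPB constants, values as published in ibase.h
isc_tpb_version3 = 3
isc_tpb_write = 9
isc_tpb_concurrency = 2
isc_tpb_wait = 6
# PLACEHOLDER value: the real constant is defined in InterBase 7.5.1's ibase.h
isc_tpb_no_savepoint = 0

def build_tpb(no_savepoint=False):
    """Assemble a transaction parameter block as a byte string."""
    items = [isc_tpb_version3, isc_tpb_write, isc_tpb_concurrency, isc_tpb_wait]
    if no_savepoint:
        items.append(isc_tpb_no_savepoint)
    return bytes(items)
```

The resulting buffer (and its length) would then be passed as the TPB argument to isc_start_transaction(), just as with any other transaction option.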
Initial situation
Consider the implementation details of the undo log. Figure 3 shows the initial situation.
Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.
Figure 3: Initial situation before any UPDATE - only the one record version exists. Undo log is empty.
The first UPDATE
The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.
It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.
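That first-pass bookkeeping can be sketched as follows. This is a hypothetical model for illustration only: the real engine uses a compact bitmap of record numbers, for which a Python set stands in here:

```python
class SavepointUndoLog:
    """Conceptual model: remember which records a savepoint touched."""

    def __init__(self):
        self.touched = set()   # stands in for the engine's bitmap of record numbers

    def record_change(self, rec_no):
        # Adding the same record twice is a no-op, just as setting a bit twice is
        self.touched.add(rec_no)

    def rollback(self, restore_backversion):
        # Walk the remembered record numbers and restore each from its backversion
        for rec_no in sorted(self.touched):
            restore_backversion(rec_no)
        self.touched.clear()
```

This also shows why the first pass is cheap: the per-record cost is one set/bitmap insertion, with the actual old data already sitting on disk as a delta.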
Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log.
This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE
When the second UPDATE statement is performed on the same set of records, we have a different situation.
Here is a good place to note that the example we are considering is the most simple situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.
To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see – there is much more work to do, both on disk and in memory.
The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.
Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.
This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.
When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE
The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.
Figure 6
The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2version and its own version is written to disk as the latest one
Figure 7
If we have an explicit SAVEPOINT each new record version associated with it will have acorresponding backversion in the Undo log of that savepoint
Original design of IB implement second UPDATE another way but sometime after IB6 Bor-land changed original behavior and we see what we see nowBut this theme is for anotherarticle)
Why is the delta of memory usage zero The reason is that beyond the second UPDATEno record version is created From here on the update just replaces record data on diskwith the newest one and shifts the superseded version into the Undo log
A more interesting question is why we see an increase in disk reads and writes during thetest We would have expected that the third and following UPDATEs would do essentiallyequal numbers of read and writes to write the newest versions and move the previous onesto the undo log However we are actually seeing a growing count of writes We have noanswer for it but we would be pleased to know
The following figure (figure 6) helps to illustrate the situation in the Undo log during thesequential updates When NO SAVEPOINT is enabled the only pieces we need to performare replacing the version on disk and updating the original backversion It is fast as the firstUPDATE
Explicit SAVEPOINTWhen an UPDATE statement is going to be performed within its own explicit or implicit
BEGIN END savepoint framework the engine has to store a backversion for each associ-ated record version in the Undo log
For example if we used an explicit savepoint eg SAVEPOINT Savepoint1 upon perform-ing UPDATE2 we would have the situation illustrated in figure 7
In this case, memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
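For contrast, here is a sketch of an explicit savepoint scope (the table TEST is an assumed example table): each UPDATE inside the scope adds a backversion to Savepoint1's Undo log, which is exactly what makes ROLLBACK TO SAVEPOINT possible.

```sql
SET TRANSACTION;
SAVEPOINT Savepoint1;

UPDATE TEST SET NAME = 'abcdefg';  /* backversion goes to Savepoint1's Undo log */
UPDATE TEST SET NAME = 'hijklmn';  /* and another one accumulates */

ROLLBACK TO SAVEPOINT Savepoint1;  /* record restored from the Undo log */
COMMIT;
```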
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
OOD and RDBMS Part 1
Object-Oriented Development in RDBMS. Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I, on my own, had reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The database structures described here have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out unimportant elements. I have tested the viability of all the methods taken from the articles and, of course, of all my own methods.
Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.
I should make it clear that the object approach to database design as described here is not appropriate for every task. The same is true of OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.
And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?
Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:
1. Everyone got used to them and feels comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development, if you develop your applications just like 20-30 years ago.
As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines that is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know, all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS. They are unusual and
Author: Vladimir Kotlyarevsky, vlad@contek.ru
slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where those objects are stored.
What do we need?
The thing we need is simple: to develop a set of standard methods that will help us to tailor the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relational processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers at run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] they are called UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP: not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex). Nobody can change their date of birth, at least not their de facto date of birth. But birth dates are not unique anyway.
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for a surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
What is more, nobody requires run-time pointers to contain any additional information about an object beyond a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
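In InterBase/Firebird the usual "function for generating unique values" is a generator. A minimal sketch, in which the names G_OID and GET_OID are illustrative assumptions:

```sql
create generator G_OID;

set term ^ ;
create procedure GET_OID returns (new_oid integer)
as
begin
  /* GEN_ID increments G_OID atomically and outside transaction control,
     so the returned value is unique across the whole database */
  new_oid = gen_id(G_OID, 1);
end ^
set term ; ^
```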
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the list of known types, which basically is a table (let us call it "CLASSES"). This table may include any kind of information: from just a simple type name to the detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus, each object has to have the mandatory attributes ("OID" and "ClassId") and the desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table: that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of this known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types (simple lookup dictionaries, for example) which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? Likely not less than a half. Thus you may consider that you have implemented half of these dictionaries, and that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. Objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
)
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all the attributes of the "Order" type with the following query:
select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
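Such a view might be sketched as follows (the view name follows the text; the column aliases are assumptions):

```sql
/* A sketch of the view hiding the OBJECTS/Orders split */
create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from OBJECTS o, Orders ord
where o.OID = ord.OID;
```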
If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID,  /* reference to the order object: 1:M relation */
  item_id TOID,    /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount * cost)
)
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: the implementation of the object-relational mapping system for this method is not simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID,  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
)
connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID,  /* a link to the description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
)
which describes the type metadata: an attribute set (names and types) for each object type.
All the attributes of a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid
or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all the attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure. A significant benefit is that with this approach the number of tables does not increase, no matter how many different types are stored in the database.
The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: the object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know, everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to use certain object attributes that way in the future, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and the data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools such as SQL (except for work with the attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.
To be continued…
Replicating and synchronizing InterBase/FireBird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing with a punctual synchronization upon reconnection to the main database; it can be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and FireBird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
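As an illustration only (CopyCat generates its own structures; the name RPL$LOG and the columns below are assumptions), such a log table might look like this:

```sql
/* Hypothetical change-log table: one row per change per sub-node */
create table RPL$LOG (
  table_name varchar(32),  /* table in which the change occurred */
  pk_value   varchar(64),  /* primary key value(s) of the changed record */
  login      varchar(32)   /* sub-node this log line is destined for */
);
```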
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/FireBird.)
When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to keep changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to the other sub-nodes, but not to the originating node itself.
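The trigger-side logic described above can be sketched like this (the RPL$USERS table is mentioned later in the article; RPL$LOG, the CUSTOMERS table and the column names are assumptions, not CopyCat's actual generated code):

```sql
set term ^ ;
create trigger CUSTOMERS_LOG for CUSTOMERS
after update as
begin
  /* One log line per sub-node, skipping the connected user: a change
     written by the parent (logged in under its own node name) is never
     logged back to it, so changes cannot bounce back and forth. */
  insert into RPL$LOG (table_name, pk_value, login)
  select 'CUSTOMERS', new.OID, u.login
  from RPL$USERS u
  where u.login <> USER;
end ^
set term ; ^
```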
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
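For an InterBase/Firebird database, the synchronization method for an integer key can be as simple as a generator call; a sketch (the generator name G_OID is an assumption):

```sql
/* Run on the parent server; the result replaces the local key value
   before the record itself is replicated upward */
select gen_id(G_OID, 1) from RDB$DATABASE
```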
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible for two nodes (having the same parent) to simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than requiring the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication": that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an
unreplicatable table (like the one in the example above), one solution is to write a stored procedure which can be called to update the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
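Continuing the STOCK example, the update procedure whose calls get replicated might be sketched like this (the procedure and column names are assumptions; the logging of the calls themselves is done by CopyCat):

```sql
set term ^ ;
create procedure ADD_STOCK (product_id integer, delta integer)
as
begin
  /* Apply a relative change rather than an absolute value: replicating
     ADD_STOCK(..., 1) from node A and ADD_STOCK(..., 2) from node B,
     in any order, yields 48 on every node, not 46 or 47. */
  update STOCK
  set stock_value = stock_value + :delta
  where product_id = :product_id;
end ^
set term ; ^
```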
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each of these products.
1. The CopyCat Delphi / C++ Builder component set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare the databases:

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator"
The InterBase and Firebird Developer Magazine, 2005, Issue 2
www.ibdeveloper.com © 2005. All rights reserved.
columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to "Y" so as to generate the metadata.

6) On the "Procedures" tab, after having set a priority, set "Created" to "Y" for each procedure that you want to replicate.

7) Apply all the generated SQL to all the databases that are to be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate:

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters of the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• An easy-to-use installer

• An independent server administrator tool (CTAdmin)

• A configuration wizard for setting up links to master/slave databases

• A robust replication engine based on Microtec CopyCat

• A fault-tolerant connection mechanism, allowing for the automatic resumption of lost database connections

• A simple and intuitive control panel

• Automatic email notification of certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replication between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...) as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or contact us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigure the hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you will know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both the positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available while browsing the statistics, and it also includes hint comments and recommendation reports.

Figure 1. Summary of database statistics

Footnotes:
1. InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2. "Oldest transaction" is the same "oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
3. Really, it is the oldest transaction that was active when the oldest transaction currently active started, because only the start of a new transaction moves the Oldest Active forward. In production systems with regular transactions it can be considered to be the currently oldest active transaction.

The summary shown in Figure 1 provides general information about
your database. The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: all figures in this article contain gstat statistics taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the details of a problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed: this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write cache method: when ON, changed data is written immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off [1].

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
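The classification just described can be sketched in a few lines. This is a conceptual illustration in Python, not IBAnalyst's actual code; the exact rules IBAnalyst applies are an assumption based on the description above:

```python
def sweep_interval_flag(oldest_tx, oldest_snapshot, sweep_interval):
    """Classify the sweep gap the way the report text above describes.

    The sweep gap is the distance between the oldest (interesting)
    transaction and the oldest snapshot transaction.
    """
    gap = oldest_snapshot - oldest_tx
    if sweep_interval > 0 and gap > sweep_interval:
        return "red"      # threshold exceeded: automatic sweep is due
    if gap < 0:
        return "yellow"   # negative gap, possible in IB 6.0/Firebird/Yaffil
    return "ok"
```

For instance, with a sweep interval of 20000, a gap of 29000 would be flagged red, while a negative gap is flagged yellow regardless of the interval.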
We will examine the next eight rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers belong to committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction belong to transactions that can be in any state. This is also called the oldest interesting transaction, because it "freezes" when a transaction is ended with a rollback and the server can not undo its changes at that moment.

• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] is the oldest currently active transaction.

• The Next transaction is the transaction number that will be assigned to the next new transaction.

• Active transactions: IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell us whether there are any other active transactions between the oldest active and the next transaction, but there can be. Usually, if the oldest active transaction gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both prevent garbage collection and consume server resources.

• Transactions per day: this is calculated from Next transaction divided by the number of days passed since the creation of the database to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
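The last metric is simple arithmetic; a sketch follows (the function and argument names are illustrative, not IBAnalyst's):

```python
def transactions_per_day(next_transaction, days_since_creation):
    """Average daily transaction count, derived as described above.

    Meaningful only if transaction numbering has not been reset since
    the database was created (a restore resets the counters).
    """
    if days_since_creation <= 0:
        raise ValueError("database age in days must be positive")
    return next_transaction / days_since_creation
```

A database showing Next transaction = 70000 one week after creation would average 10000 transactions per day.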
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during certain work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be periods when the database is in a perfect state, for example when applications are placing less load on the database than usual (users are at lunch, or it is a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not creating sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications with only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2. Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, and fragmented tables (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently the cause is bulk deletes. Together, the two indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions to be garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or is the result of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If the statistics were taken via the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the longest duplicate chain (keys pointing to records with the same column value) is 25056, i.e. almost all keys store one of the four unique values. As a result, this index could:
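The relationship between the Uniques, TotalDup and MaxDup figures quoted above can be reproduced from a list of key values. This is a conceptual sketch; gstat derives these numbers from index pages, not like this:

```python
from collections import Counter

def index_dup_stats(keys):
    """Compute Uniques, TotalDup and MaxDup for a list of index key values.

    TotalDup counts every key beyond the first for each distinct value,
    so TotalDup == len(keys) - Uniques; MaxDup is the length of the
    longest duplicate chain minus its initial key.
    """
    counts = Counter(keys)
    uniques = len(counts)
    total_dup = len(keys) - uniques
    max_dup = max(counts.values()) - 1 if counts else 0
    return uniques, total_dup, max_dup
```

This matches the figures in the example: 34999 keys with 4 unique values gives TotalDup = 34999 - 4 = 34995, and a chain of 25057 equal keys gives MaxDup = 25056.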
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and includes them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view the statistics.
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, the sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are created dynamically, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. A large number of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with a version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR     3388      10944     92
DICT_PRICE       30       1992     45
DOCS              9       2225     64
N_PART        13835      72594     83
REGISTR_NC      241       4085     56
SKL_NC         1640       7736    170
STAT_QUICK    17649      85062    110
UO_LOCK         283       8490    144
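The selection rule behind that list can be sketched as a simple filter. The threshold of 3 comes from the report text above; the function name and input layout are illustrative:

```python
def tables_with_high_version_ratio(stats, threshold=3.0):
    """Return table names whose versions/records ratio exceeds threshold.

    `stats` maps table name -> (record count, version count), as in the
    report excerpt above.
    """
    return sorted(
        name
        for name, (records, versions) in stats.items()
        if records > 0 and versions / records > threshold
    )
```

Applied to the first row of the excerpt, CLIENTS_PR has 10944 / 3388, a ratio of about 3.2, so it makes the list.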
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and in identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback

Comments to "Temporary tables"

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature; if it is not present, a whole lot of unnecessary and complicated workarounds are required. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly, those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data whose structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (only available through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
Testing the NO SAVEPOINT
For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.
Figure 1. Time to perform UPDATEs with and without NO SAVEPOINT

Table 1. Test results for five sequential UPDATEs
In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time (47 seconds), compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth of the writes parameters.

Figure 2. Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT we observe that the time and memory values and the growth in writes are all small and virtually equal for each pass. The corresponding graphs are below.
Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT performs? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The
old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is written as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
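The delta idea can be illustrated with a field-level diff. This is a deliberately simplified sketch; the engine's real delta is a byte-level on-disk structure, not a Python dictionary:

```python
def make_delta(old_version, new_version):
    """Keep only the fields whose values differ between two versions."""
    return {
        field: value
        for field, value in old_version.items()
        if new_version.get(field) != value
    }

def rebuild_old(new_version, delta):
    """Reconstruct the full old version from the newest version + delta."""
    old = dict(new_version)
    old.update(delta)
    return old
```

When only one field out of many has changed, the delta holds a single field, which is why deltas are small and cheap to write.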
The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if implicit savepoint management causes problems, then let's kill the savepoints. No savepoints, no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3. The initial situation before any UPDATE: only one record version exists, and the undo log is empty.
The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE or DELETE statement inside the transaction: the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the versions of each and restore them from the updated version and the backversion stored on disk.
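The transaction-level undo log at this stage amounts to a set of affected record numbers plus the on-disk backversions. Here is a conceptual sketch (a Python set stands in for the engine's bitmap; all names are illustrative):

```python
class TransactionUndoLog:
    """Sketch of the transaction-level undo log described above.

    For the first UPDATE the engine only needs to remember which record
    numbers were touched; the backversions stored on disk hold the data
    needed to restore those records on rollback.
    """
    def __init__(self):
        self.affected = set()   # record numbers (the 'bitmap')

    def note_change(self, rec_no):
        self.affected.add(rec_no)

    def rollback(self, table, backversions):
        """Restore every affected record from its stored backversion."""
        for rec_no in self.affected:
            table[rec_no] = backversions[rec_no]
        self.affected.clear()
```

This is why the first pass is so cheap with or without NO SAVEPOINT: the log itself is just a set of numbers.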
Figure 4. The first UPDATE creates a small delta version on disk and puts the record number into the undo log.

This approach is very fast and economical on memory usage. The engine does not waste many resources handling this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

This is a good place to note that the example we are considering is the simplest situation, in which only the one global (transaction-level) savepoint exists. We will also look at the difference in the undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with back-pointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.
Figure 5. The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE 1 into the undo log.

All this makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem" and "delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.
Figure 6
The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2version and its own version is written to disk as the latest one
Figure 7
If we have an explicit SAVEPOINT each new record version associated with it will have acorresponding backversion in the Undo log of that savepoint
The original design of InterBase implemented the second UPDATE in another way, but sometime after InterBase 6 Borland changed the original behavior to what we see now. But that is a theme for another article.
Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces the record data on disk with the newest version and shifts the superseded version into the Undo log.
A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected the third and following UPDATEs to do essentially equal numbers of reads and writes: write the newest versions and move the previous ones to the Undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to learn of one.
The following figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only work we need to perform is replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.
For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in figure 7.
In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
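For clarity, the scenario just described can be sketched with an explicit savepoint. The table and column names below are invented for illustration; they are not taken from the article's test scripts:

```sql
/* Hypothetical table T with column F */
UPDATE T SET F = F + 1;            /* UPDATE1 */
SAVEPOINT Savepoint1;
UPDATE T SET F = F + 1;            /* UPDATE2: its backversion is kept
                                      in Savepoint1's own Undo log */
ROLLBACK TO SAVEPOINT Savepoint1;  /* undoes UPDATE2 only */
```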
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I, on my own, had reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any inconvenience, and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.
Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.
I should make it clear that the object approach to database design as described here is not appropriate for every task. The same is true of OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.
And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?
Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development, if you develop your applications just like 20-30 years ago.
As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know, all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS. They are unusual and
Author: Vladimir Kotlyarevsky, vlad@contek.ru
slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?
The thing we need is simple: to develop a set of standard methods that will help us to tailor the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relational processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID keys from a single set (similar to object pointers at run time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex). Nobody can change their date of birth, at least not their de facto date of birth. But birth dates are not unique anyway.
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
What is more, nobody requires run-time pointers to contain any additional information about an object except a memory address. However, there are some people who vigorously reject the use of surrogates. The most brilliant argument against surrogates I've ever heard is that they conflict with the relational theory. This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
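In InterBase/Firebird the natural source of such values is a generator. A minimal sketch (the generator name is an assumption, not prescribed by the article):

```sql
/* a database-wide source of surrogate OIDs */
CREATE GENERATOR GEN_OID;

/* fetch the next unique OID value */
SELECT GEN_ID(GEN_OID, 1) FROM RDB$DATABASE;
```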
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus, each object has to have the mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
    OID TOID primary key,
    ClassId TOID,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date timestamp default CURRENT_TIMESTAMP,
    Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types known to the system is retrieved by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often, only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that no less than a half. Thus, you may consider that you have implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
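As a concrete illustration, a simple "Currency" dictionary could be seeded like this. The ClassId = -100 convention follows the article; the OID values and names are made up:

```sql
/* the type-description object for the dictionary */
INSERT INTO OBJECTS (OID, ClassId, Name, Description)
VALUES (1, -100, 'Currency', 'List of currencies');

/* two elements of that dictionary (ClassId = the dictionary's OID) */
INSERT INTO OBJECTS (OID, ClassId, Name, Description)
VALUES (2, 1, 'USD', 'US Dollar');
INSERT INTO OBJECTS (OID, ClassId, Name, Description)
VALUES (3, 1, 'EUR', 'Euro');
```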
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method of storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:
create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total NUMERIC(15,2)
)
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
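Such a view might look like the following sketch; "orders_view" is the name suggested above, while the column aliases are assumptions:

```sql
CREATE VIEW orders_view (OID, Number, Comment, customer, sum_total) AS
SELECT o.OID, o.Name, o.Description, ord.customer, ord.sum_total
FROM Objects o JOIN Orders ord ON o.OID = ord.OID;
```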
If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table with a 1:M relation:

create table order_lines (
    id integer not null primary key,
    object_id TOID,  /* reference to the order object: 1:M relation */
    item_id TOID,    /* reference to the ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    sum numeric(15,4) computed by (amount*cost)
)
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: implementing the system of object-relational mapping for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2. (See [5].) All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
    OID TOID,  /* link to the master object of this attribute */
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
)
connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
    OID TOID,  /* link to a description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
)

which describes the type metadata: an attribute set (names and types) for each object type.
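To make this concrete, here is how the "Order" type from Method 1 might be described under this scheme. The OID values and the attribute_type codes are invented for illustration:

```sql
/* metadata: the "Order" type (OID 50) has two extra attributes */
INSERT INTO class_attributes (OID, attribute_id, attribute_name, attribute_type)
VALUES (50, 1, 'customer', 0);
INSERT INTO class_attributes (OID, attribute_id, attribute_name, attribute_type)
VALUES (50, 2, 'sum_total', 1);

/* data: one order object (OID 200) of that type */
INSERT INTO attributes (OID, attribute_id, value) VALUES (200, 1, '37');
INSERT INTO attributes (OID, attribute_id, value) VALUES (200, 2, '1250.00');
```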
All attributes of a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all the attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure. The last is a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is not much to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know, everyone already knows them): all three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these methods, take into account the listed advantages and disadvantages of each one. If your application involves massive relational data processing, like searching or grouping, or is likely to involve it in the future, using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and the data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except for work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of its implementation.
To be continued...
Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are addressed in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.
Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
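CopyCat's actual log schema is not documented here, but the idea can be sketched roughly as follows. All names and columns below are invented for illustration; they are not CopyCat's real structures:

```sql
/* a simplified change-log table */
CREATE GENERATOR GEN_LOG_ID;

CREATE TABLE RPL_LOG (
  LOG_ID     INTEGER NOT NULL PRIMARY KEY,
  TABLE_NAME VARCHAR(31),
  PK_VALUE   VARCHAR(64),
  SUB_NODE   VARCHAR(31)  /* which sub-node this line is destined for */
);

/* a logging trigger for a replicated ORDERS table; in reality one
   log line per sub-node would be written, a single line is shown */
CREATE TRIGGER ORDERS_RPL FOR ORDERS
AFTER UPDATE AS
BEGIN
  INSERT INTO RPL_LOG (LOG_ID, TABLE_NAME, PK_VALUE, SUB_NODE)
  VALUES (GEN_ID(GEN_LOG_ID, 1), 'ORDERS', OLD.OID, 'NODE_A');
END
```

(In isql, the trigger body would need SET TERM delimiters.)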
Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication

One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged for the other sub-nodes, but not for the originating node itself.
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields for which uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other; since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, the current stock value of a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication", that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like the one in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
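Continuing the STOCK example, the procedure whose calls would be replicated might look like this sketch; the table, column and procedure names are assumptions:

```sql
CREATE PROCEDURE ADD_STOCK (PRODUCT_ID INTEGER, DELTA NUMERIC(15,4))
AS
BEGIN
  /* apply the change as a delta, so that concurrent changes
     from different nodes merge instead of overwriting each other */
  UPDATE STOCK SET QUANTITY = QUANTITY + :DELTA
  WHERE PRODUCT = :PRODUCT_ID;
END
```

(In isql, the procedure body would need SET TERM delimiters.)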
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components.

2. As a standalone Win32 replication tool.

We will now present each of these products.
1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.
Download and install the evaluation components fromhttpwwwmicrotecfrcopycat
Prepare the databases:

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) set the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that are to be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate:

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs, but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux/Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:
ibd-852-150
Using IBAnalyst

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, or reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read about it in opguide.pdf, you would know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Figure 1: Summary of database statistics

Footnotes:
[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".
[3] Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest active forward. In production systems with regular transactions, it can be considered as the currently oldest active transaction.
Note: all figures in this article contain gstat statistics taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when it is ON, changed data is written immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting" transaction, because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] is the oldest currently active transaction.

• The Next transaction is the transaction number that will be assigned to the next new transaction.
• Active transactions: IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day: this is calculated as the Next transaction value divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• just restored your database;
• performed a backup (gbak -b db.gdb) without the -g switch;
• recently performed a manual sweep (gfix -sweep).

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications make less database load than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, which consequently points to bulk deletes. Together, these indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If the statistics were taken via the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination, you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the longest duplicate chain (keys pointing to records with the same column value) is 25056, i.e. almost all keys store one of the four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.

• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query: searching by an index that has a larger value for MaxDup will be slower, and searching by a value that has fewer duplicates will be faster; but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you "bad" indices on Foreign Keys every time you view statistics.
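As a sketch of the trigger-based workaround just mentioned (all object names here are invented; adapt them to your schema), a lookup table can be made effectively read-only, so that the low-selectivity foreign-key index never has to participate in garbage collection:

```sql
/* Hypothetical sketch: forbid deletes and primary-key changes on a
   lookup table that is referenced by many rows through a
   low-selectivity foreign-key index. */
CREATE EXCEPTION E_LOOKUP_READONLY
  'Deletes and primary key changes are not allowed on lookup tables';

CREATE TRIGGER COLORS_BD FOR COLORS
  ACTIVE BEFORE DELETE POSITION 0
AS
BEGIN
  EXCEPTION E_LOOKUP_READONLY;
END

CREATE TRIGGER COLORS_BU FOR COLORS
  ACTIVE BEFORE UPDATE POSITION 0
AS
BEGIN
  IF (OLD.COLOR_ID <> NEW.COLOR_ID) THEN
    EXCEPTION E_LOOKUP_READONLY;
END
```

(In ISQL, the trigger bodies would need to be wrapped in SET TERM statements.)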
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports | View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, than garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments on the "Temporary tables" article
One thing I'd like to ask you to change concerns temp tables. I suggest you even create another "myth box" for it. It is this sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature; if it is not present, a whole lot of unnecessary and complicated workarounds are required. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly, those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as "temp", and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
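For illustration, this is roughly what such a declaration looks like in SQL-standard syntax (a sketch only: the table is invented, the exact DDL supported by InterBase 7.5 is described in the issue 1 article, and Firebird did not support GTTs at the time of writing):

```sql
/* Each session sees only its own rows in a global temporary table;
   no user-id columns and no explicit cleanup are needed. */
CREATE GLOBAL TEMPORARY TABLE SESSION_CART (
  PRODUCT_ID INTEGER NOT NULL,
  QUANTITY   INTEGER NOT NULL
) ON COMMIT PRESERVE ROWS;
```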
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary, because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat-file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation, or regarding data whose structure is unknown in advance but which needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (possible only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them particularly impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
TestBed
Testing the NO SAVEPOINT
...the old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is written as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
The UNDO log concept
You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints, no problem. ;-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.
First and most obvious is that, with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.
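As a rough sketch of how the option is used from ISQL (the table name is invented, and the exact statement-level syntax should be checked against the 7.5.1 release notes; only the TPB constant isc_tpb_no_savepoint is confirmed by the text above):

```sql
/* Hedged sketch: a bulk, multi-pass update run without an undo log.
   Any error aborts the whole transaction - no partial rollback is
   possible - and a rolled-back transaction will need a sweep later. */
SET TRANSACTION NO SAVEPOINT;

UPDATE TEST SET NAME = NAME;  /* first pass */
UPDATE TEST SET NAME = NAME;  /* second pass: stays as fast as the first */
COMMIT;
```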
Initial situation
Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.
Figure 3: Initial situation before any UPDATE. Only one record version exists, and the Undo log is empty.
The first UPDATE
The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE or DELETE statement inside the transaction: the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the versions of each and restore them from the updated version and the backversion stored on disk.
Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the Undo log.
This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE
When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, in which only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (from transaction 1) and the new version (from transaction 2, update 2), store it somewhere on disk, fetch the current full version (from transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.
Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.
All this makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem" / "delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.
The third UPDATE
The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.
Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.
Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.
Original design of IB implement second UPDATE another way but sometime after IB6 Bor-land changed original behavior and we see what we see nowBut this theme is for anotherarticle)
Why is the delta of memory usage zero The reason is that beyond the second UPDATEno record version is created From here on the update just replaces record data on diskwith the newest one and shifts the superseded version into the Undo log
A more interesting question is why we see an increase in disk reads and writes during thetest We would have expected that the third and following UPDATEs would do essentiallyequal numbers of read and writes to write the newest versions and move the previous onesto the undo log However we are actually seeing a growing count of writes We have noanswer for it but we would be pleased to know
The following figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only operations we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.
For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in figure 7.
In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should, of course, be used with caution. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.
Development area
OOD and RDBMS Part 1
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had on my own reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.
Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.
I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true of OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.
And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?
Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:
1. Everyone got used to them and felt comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development – if you develop your applications just like 20-30 years ago.
As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know, all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS. They are unusual and
Author: Vladimir Kotlyarevsky, vlad@contek.ru
slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?
The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relation processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique and they must be easily identifiable. That is why all the objects stored in the database should have unique ID keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] – UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class), but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have
any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex.
Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.)
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hmm. But let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
And what is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
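To make the idea concrete, here is a minimal sketch of a database-wide OID generator. It uses SQLite via Python purely for illustration (an InterBase/Firebird database would use a generator and gen_id() instead), and the table and function names are my own, not from the article.

```python
import sqlite3

# Sketch: emulating a database-wide OID generator (like a Firebird
# generator/sequence) with a single-row counter table in SQLite.
db = sqlite3.connect(":memory:")
db.execute("create table oid_seq (last integer not null)")
db.execute("insert into oid_seq values (0)")

def next_oid(con):
    """Return the next database-wide unique OID."""
    con.execute("update oid_seq set last = last + 1")
    return con.execute("select last from oid_seq").fetchone()[0]

# Two unrelated tables share the same OID space, so any given OID
# identifies at most one object in the whole database.
db.execute("create table partners (oid integer primary key, name text)")
db.execute("create table orders   (oid integer primary key, number text)")

p_oid = next_oid(db)
db.execute("insert into partners values (?, ?)", (p_oid, "ACME Ltd."))
o_oid = next_oid(db)
db.execute("insert into orders values (?, ?)", (o_oid, "2005-001"))
```

Because every table draws its keys from the same sequence, an OID found in any link field can belong to at most one object in the whole database.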
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information – from just a simple type name to the detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus, each object has to have the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
    OID TOID primary key,
    ClassId TOID,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date timestamp default CURRENT_TIMESTAMP,
    Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table: that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that no less than a half. Thus, you may consider that you have implemented half of these dictionaries – that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:
select OID, Name, Description from OBJECTS where ClassId = :LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
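The OBJECTS table and the two lookup queries above can be exercised end-to-end. The sketch below uses SQLite through Python as a stand-in for InterBase/Firebird (so the TOID domain becomes a plain integer, and the optional columns are trimmed); the "Currency" dictionary is an invented example, not from the article.

```python
import sqlite3

# A cut-down OBJECTS table in SQLite, standing in for the article's
# InterBase/Firebird DDL.
db = sqlite3.connect(":memory:")
db.execute("""
    create table OBJECTS (
        OID integer primary key,
        ClassId integer,
        Name varchar(32),
        Description varchar(128),
        Deleted smallint default 0)""")

# ClassId = -100 marks type-description objects ("classes").
db.execute("insert into OBJECTS values (1, -100, 'Currency', 'Currency dictionary', 0)")
# Two elements of the Currency dictionary: their ClassId is the class OID.
db.execute("insert into OBJECTS values (2, 1, 'USD', 'US dollar', 0)")
db.execute("insert into OBJECTS values (3, 1, 'EUR', 'Euro', 0)")

# All known persistent types:
types = db.execute(
    "select OID, Name from OBJECTS where ClassId = -100").fetchall()

# Dictionary elements when only the type name is known:
currencies = db.execute("""
    select OID, Name, Description from OBJECTS
    where ClassId = (select OID from OBJECTS
                     where ClassId = -100 and Name = ?)
    order by OID""", ("Currency",)).fetchall()
```

Note that both queries touch only OBJECTS: the dictionary needed no table of its own.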
Storing of more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:
create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total NUMERIC(15,2)
);
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table with a 1:M relation:
create table order_lines (
    id integer not null primary key,
    object_id TOID, /* reference to the order object – 1:M relation */
    item_id TOID,   /* reference to the ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    sum numeric(15,4) computed by (amount*cost)
);
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
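Method 1 can be sketched in a few lines. The illustration below again uses SQLite via Python in place of InterBase/Firebird, with invented sample data; the retrieval query is the one shown in the text.

```python
import sqlite3

# Method 1: an "Order" object split between the shared OBJECTS table
# (standard attributes) and a type-specific Orders table (1:1 by OID).
db = sqlite3.connect(":memory:")
db.executescript("""
    create table OBJECTS (
        OID integer primary key, ClassId integer,
        Name varchar(32), Description varchar(128));
    create table Orders (
        OID integer primary key,   -- 1:1 with OBJECTS.OID
        customer integer,          -- link to a partner object
        sum_total numeric);
""")

# A customer object and an order object referring to it.
db.execute("insert into OBJECTS values (10, 2, 'ACME Ltd.', 'Partner')")
db.execute("insert into OBJECTS values (11, 3, '2005-001', 'Urgent order')")
db.execute("insert into Orders values (11, 10, 149.90)")

# The article's retrieval query for one "Order" object:
row = db.execute("""
    select o.OID, o.Name as Number, o.Description as Comment,
           ord.customer, ord.sum_total
    from OBJECTS o, Orders ord
    where o.OID = ord.OID and ord.OID = ?""", (11,)).fetchone()
```

The join costs nothing unusual, and any relational tool (grouping, searching, views) works on Orders directly.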
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:
create table attributes (
    OID TOID, /* link to the master-object of this attribute */
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
);
connected 1:M with OBJECTS by OID. There is also a table
create table class_attributes (
    OID TOID, /* link to a description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
);
which describes the type metadata – an attribute set (names and types) for each object type.
All attributes for a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes where OID = :oid
or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:
select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
    left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
    left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure – a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in the database.
The main disadvantage is that this method is very different from the standard relational model, so many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are required for every attribute, instead of just the one field described in Method 1.
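Both retrieval styles of Method 2 — one row per attribute, and the left-join pivot back to one row per object — can be demonstrated together. As before, SQLite via Python stands in for InterBase/Firebird, and the sample "Order" data is invented.

```python
import sqlite3

# Method 2 (entity-attribute-value): every attribute of every type
# lives in one "attributes" table, described by "class_attributes".
db = sqlite3.connect(":memory:")
db.executescript("""
    create table OBJECTS (OID integer primary key, ClassId integer,
                          Name varchar(32), Description varchar(128));
    create table class_attributes (
        OID integer, attribute_id integer, attribute_name varchar(32),
        primary key (OID, attribute_id));
    create table attributes (
        OID integer, attribute_id integer, value varchar(256),
        primary key (OID, attribute_id));
""")

# Type object for "Order" plus its attribute metadata.
db.execute("insert into OBJECTS values (1, -100, 'Order', 'Order document')")
db.executemany("insert into class_attributes values (?, ?, ?)",
               [(1, 1, "customer"), (1, 2, "sum_total")])

# One order: standard attributes in OBJECTS, the rest as EAV rows.
db.execute("insert into OBJECTS values (20, 1, '2005-001', 'Urgent order')")
db.executemany("insert into attributes values (?, ?, ?)",
               [(20, 1, "ACME Ltd."), (20, 2, "149.90")])

# One row per attribute, with names taken from the metadata:
named = db.execute("""
    select ca.attribute_name, a.value
    from attributes a, class_attributes ca, OBJECTS o
    where a.OID = ? and a.OID = o.OID
      and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
    order by a.attribute_id""", (20,)).fetchall()

# Pivoting back to one row per object with left joins:
flat = db.execute("""
    select o.OID, o.Name, a1.value as customer, a2.value as sum_total
    from OBJECTS o
      left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
      left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
    where o.OID = ?""", (20,)).fetchone()
```

Notice that sum_total comes back as the string "149.90": with a single varchar value column, typing is the application's problem, which is part of the price of this method.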
Method 3. Everything is stored in a BLOB, and one of the persistence formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
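A minimal sketch of Method 3, using JSON as the serialization format (the article's dfm or XML options would work the same way) and SQLite via Python as the storage. The last lines illustrate the editor's point: attribute search means deserializing every object in application code.

```python
import json
import sqlite3

# Method 3: the whole object is serialized into a single BLOB column.
db = sqlite3.connect(":memory:")
db.execute("create table OBJECTS (OID integer primary key, data blob)")

order = {"number": "2005-001", "customer": "ACME Ltd.", "sum_total": 149.90}
db.execute("insert into OBJECTS values (?, ?)",
           (30, json.dumps(order).encode("utf-8")))

# Retrieval is trivial and fast...
blob = db.execute("select data from OBJECTS where OID = 30").fetchone()[0]
restored = json.loads(blob)

# ...but the database cannot see inside the BLOB: any search by
# attribute has to unpack every stored object outside the database.
matches = [oid for (oid, data) in db.execute("select OID, data from OBJECTS")
           if json.loads(data)["customer"] == "ACME Ltd."]
```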
It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, on certain object attributes, or is likely to in the future, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and the data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except for work with the attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.
To be continued…
FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix reports support; support for most popular DB engines. Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com
Replicating and synchronizing
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)
When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
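The per-sub-node logging described above can be sketched with an ordinary SQL trigger. The illustration below uses SQLite via Python; CopyCat's real log tables (such as RPL$USERS) and its generated triggers are certainly more elaborate, and all names here are invented.

```python
import sqlite3

# Sketch of CopyCat-style change logging: an update trigger writes one
# log row per sub-node, so each sub-node later fetches only its lines.
db = sqlite3.connect(":memory:")
db.executescript("""
    create table customers (id integer primary key, name text);
    create table rpl_users (sub_node text primary key);
    create table rpl_log (sub_node text, table_name text, pk_value integer);

    create trigger customers_log after update on customers
    begin
        insert into rpl_log (sub_node, table_name, pk_value)
        select sub_node, 'customers', new.id from rpl_users;
    end;
""")

db.executemany("insert into rpl_users values (?)", [("NODE_A",), ("NODE_B",)])
db.execute("insert into customers values (1, 'ACME')")
db.execute("update customers set name = 'ACME Ltd.' where id = 1")

# Each sub-node sees exactly one pending change for record 1:
pending_a = db.execute(
    "select table_name, pk_value from rpl_log where sub_node = 'NODE_A'"
).fetchall()
```

In a real system the trigger would also record the operation type and fire on insert and delete; the fan-out principle is the same.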
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to the other sub-nodes, but not to the originating node itself.
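The bounce-prevention rule reduces to one line of logic: log for every sub-node except the user who made the change. A plain-Python simulation (node names and data are invented, and node names double as user names, as in the scheme described):

```python
# Sketch of CopyCat's bounce prevention: a change is logged for every
# sub-node EXCEPT the author of the change.
def log_change(log, sub_nodes, author, table, pk):
    """Append one log row per sub-node, skipping the author."""
    for node in sub_nodes:
        if node != author:
            log.append((node, table, pk))

log = []
sub_nodes = ["NODE_A", "NODE_B"]

# NODE_A edits a record and replicates it up; the parent's triggers,
# running as user NODE_A, queue the change for the other sub-nodes only:
log_change(log, sub_nodes, author="NODE_A", table="customers", pk=1)

# The change never re-enters NODE_A's queue, so it cannot bounce back:
assert ("NODE_A", "customers", 1) not in log
pending = [row for row in log if row[0] == "NODE_B"]
```

The same rule, applied with the parent's name as the logged-in user on the way down, keeps downward changes from being queued back up.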
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
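The re-keying sequence can be simulated in a few lines: before a local record travels up, a server-side generator hands out a globally unique key, the local row is re-keyed first, and only then is the record pushed. All names here are illustrative, not CopyCat's API.

```python
import itertools

# Stand-in for a server-side generator, e.g. gen_id(g_customer, 1).
server_gen = itertools.count(1000)

def replicate_up(local_rows, server_rows):
    """Re-key each local row from the server generator, then push it."""
    for row in list(local_rows):
        new_pk = next(server_gen)
        local_rows.remove(row)
        local_rows.append((new_pk,) + row[1:])   # local DB updated first
        server_rows.append((new_pk,) + row[1:])  # then the row is pushed

# Two off-line nodes happened to assign the same local key 1:
node_a = [(1, "Alice")]
node_b = [(1, "Bob")]
server = []

replicate_up(node_a, server)
replicate_up(node_b, server)

server_keys = [pk for pk, _ in server]
```

Both records land on the server with distinct keys, and each node's copy already carries its final key, so later replication passes see consistent values.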
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible for two nodes (having the same parent) to simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
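The detect-park-resolve cycle described above can be modelled directly. This is a simplified simulation of the described behaviour, not CopyCat's actual code; the data model and function names are invented.

```python
# A record changed on both sides in the same period is a conflict:
# it is parked in a conflicts list and frozen until CHOSEN_USER is set.
def try_replicate(record_id, node_changes, parent_changes, conflicts):
    """Return the winning value, or park a conflict and return None."""
    if record_id in node_changes and record_id in parent_changes:
        conflicts.append({"record": record_id, "user1": "NODE_A",
                          "user2": "PARENT", "chosen_user": None})
        return None  # replication of this record is frozen
    if record_id in node_changes:
        return node_changes[record_id]
    return parent_changes.get(record_id)

def resolve(conflict, node_changes, parent_changes, chosen_user):
    """Fill CHOSEN_USER; the chosen side's version wins on the next pass."""
    conflict["chosen_user"] = chosen_user
    source = node_changes if chosen_user == "NODE_A" else parent_changes
    return source[conflict["record"]]

conflicts = []
node_changes = {7: "edited on node"}
parent_changes = {7: "edited on parent"}

first_try = try_replicate(7, node_changes, parent_changes, conflicts)
winner = resolve(conflicts[0], node_changes, parent_changes, "PARENT")
```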
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product, with a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication": that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an
unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called to update the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
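The STOCK example works out numerically once calls, not values, are replicated. A plain-Python simulation of the idea (the "stored procedure" is an ordinary function here, and the data model is invented):

```python
# Replicating procedure CALLS (deltas) instead of the stock VALUE lets
# concurrent changes merge instead of overwriting each other.
def add_to_stock(node, product, amount):
    """The 'stored procedure': apply a delta and log the call."""
    node["stock"][product] = node["stock"].get(product, 0) + amount
    node["call_log"].append((product, amount))

def replicate_calls(source, target):
    """Replay the source node's logged calls on the target node."""
    for product, amount in source["call_log"]:
        target["stock"][product] = target["stock"].get(product, 0) + amount
    source["call_log"].clear()

# Both nodes start from a stock value of 45 for product 'P1'.
node_a = {"stock": {"P1": 45}, "call_log": []}
node_b = {"stock": {"P1": 45}, "call_log": []}

add_to_stock(node_a, "P1", 1)   # node A: 45 -> 46
add_to_stock(node_b, "P1", 2)   # node B: 45 -> 47

# Exchanging the calls (not the values) merges both changes:
replicate_calls(node_a, node_b)
replicate_calls(node_b, node_a)
```

After the exchange both nodes hold 48: the value-based approach would have left one node's change lost, whichever record version arrived last.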
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components
2. As a standalone Win32 replication tool

We will now present each one of these products.
1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide to getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare the databases:

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator"
columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to "Y" so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to "Y" for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate:

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server / Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should at the very least take a peek at the
database statistics from time to time.

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available while browsing the statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about

Figure 1. Summary of database statistics
Footnotes:

1 – InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.

2 – The Oldest transaction is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.

3 – Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be considered to be the currently oldest active transaction.
Development area
32
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, whereas OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off (see footnote 1).
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest (see footnote 2) and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active (see footnote 3) – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from the Next transaction divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications generate less database load than usual (users are at lunch, or it's a quiet time in the business day).
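For reference, the statistics and housekeeping operations discussed above are driven by the standard command-line tools. A sketch of typical invocations; the database path and password are placeholders, and exact switches may vary between InterBase and Firebird versions:

```bash
# Header-page statistics only (fast): transaction counters, sweep interval.
gstat -h -user SYSDBA -password masterkey /data/mydb.gdb

# Full table and index statistics, the input IBAnalyst works with.
gstat -a -user SYSDBA -password masterkey /data/mydb.gdb > mydb.stat

# Manual sweep; remember that statistics taken right after this are meaningless.
gfix -sweep -user SYSDBA -password masterkey /data/mydb.gdb

# Disable automatic sweeping by setting the sweep interval to 0.
gfix -h 0 -user SYSDBA -password masterkey /data/mydb.gdb
```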
Development area
33
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2. Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that this really is bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still 'in use'. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination, you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the longest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query: searching by an index that has a larger value of MaxDup will be slower, while searching by a value that has fewer duplicates will be faster; but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you "bad" indices on Foreign Keys every time you view statistics.
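The trigger-based workaround mentioned above can be sketched as follows. The table, trigger and exception names here are illustrative assumptions; the idea is to guarantee that lookup rows are never deleted or re-keyed, so that integrity can be enforced by triggers instead of a declared FOREIGN KEY, avoiding the automatically created low-selectivity index:

```sql
/* Hypothetical lookup table COUNTRY, protected by triggers instead of
   a declared FOREIGN KEY on the big referencing table. */
CREATE EXCEPTION E_LOOKUP_LOCKED 'Lookup rows may not be deleted or re-keyed';

SET TERM ^ ;

CREATE TRIGGER COUNTRY_BD FOR COUNTRY
ACTIVE BEFORE DELETE
AS
BEGIN
  EXCEPTION E_LOOKUP_LOCKED;
END ^

CREATE TRIGGER COUNTRY_BU FOR COUNTRY
ACTIVE BEFORE UPDATE
AS
BEGIN
  /* Forbid changing the primary key; other columns may still change. */
  IF (OLD.COUNTRY_ID <> NEW.COUNTRY_ID) THEN
    EXCEPTION E_LOOKUP_LOCKED;
END ^

SET TERM ; ^
```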
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports | View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944             92
DICT_PRICE        30       1992             45
DOCS               9       2225             64
N_PART         13835      72594             83
REGISTR_NC       241       4085             56
SKL_NC          1640       7736            170
STAT_QUICK     17649      85062            110
UO_LOCK          283       8490            144
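As the report suggests, a full scan can force garbage collection on a heavily versioned table. A sketch, using one of the tables listed above; run it in its own transaction during a quiet period, since with many versions and non-unique indices it can take a long time:

```sql
/* Reading every record forces the engine to walk each version chain
   and collect versions no longer needed by any transaction. */
SELECT COUNT(*) FROM N_PART;
```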
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics, and in identifying possible problems with a database in terms of performance, maintenance, and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback

Comments to the "Temporary tables" article
One thing I'd like to ask you to change is re temp tables; I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat-file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.
Alexey Kovyazin, Chief Editor
TestBed

Testing the NO SAVEPOINT
Figure 4
UPDATE 1 creates a small delta version on disk and puts the record number into the Undo log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.
The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, in which only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.

Figure 5

The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE 1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem" / "delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.
Figure 6

The third UPDATE overwrites the UPDATE 1 version in the Undo log with the UPDATE 2 version, and its own version is written to disk as the latest one.

Figure 7

If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

(The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article.)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes - to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE 2 we would have the situation illustrated in Figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care with database housekeeping, particularly with respect to timely sweeping.
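For reference, the option is enabled per transaction. A sketch of the syntax as introduced in InterBase 7.5.1; the clause order follows the usual SET TRANSACTION statement, and the exact form should be checked against the release notes:

```sql
/* Start a transaction with the in-memory Undo log disabled.
   Rollback of such a transaction is handled through the TIP
   mechanism instead, which is why timely sweeping matters. */
SET TRANSACTION
    READ WRITE
    WAIT
    ISOLATION LEVEL READ COMMITTED
    NO SAVEPOINT;
```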
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I have missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience, and promise to add the necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.

I should make it clear that the object approach to database design as described here is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.
The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines of code which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there have been several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS. They are unusual and
Author Vladimir Kotlyarevskyvladcontekru
Development area
23
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
OOD and RDBMS Part 1
slow there are no standards fordata access and no underlyingmathematical theory Perhaps theOOP developer feels more comfort-able with them although I am notsurehellip
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?

The thing we need is simple: to develop a set of standard methods that will help us fit the OO layer of business logic and the relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers at run-time). These identifiers are used to link to an object, to load an object into a run-time environment, and so on. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers), or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness within a Universe, which provides considerable benefits for distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything related to the real world may change, including the vehicle engine number, network card number, name, passport number, social security card number, and even sex.

Nobody can change their date of birth, at least not their de facto date of birth. But birth dates are not unique anyway.
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

What is more, nobody requires run-time pointers to contain any additional information about an object other than a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that they "conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest implementation of an OID in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases it probably makes sense to use "int64" or a combination of several integers.
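To make this concrete, here is a minimal sketch of a database-wide OID allocator. It uses Python's sqlite3 purely as a stand-in for InterBase/Firebird (where you would normally use a generator and GEN_ID); the `oid_seq` table and `new_oid` function are invented names for the illustration, not part of the article's schema.

```python
import sqlite3

# A single-row counter table simulates one database-wide generator,
# so every object of every type draws its key from the same sequence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oid_seq (last INTEGER NOT NULL)")
conn.execute("INSERT INTO oid_seq VALUES (0)")

def new_oid(con):
    """Allocate the next OID; unique across ALL object types and tables."""
    con.execute("UPDATE oid_seq SET last = last + 1")
    return con.execute("SELECT last FROM oid_seq").fetchone()[0]

# Keys for an order and a customer come from the same sequence,
# so they can never collide even though they live in different tables.
oid_order = new_oid(conn)
oid_customer = new_oid(conn)
```

Because the sequence is shared, a bare OID is enough to identify a row without knowing its table, which is exactly the property the article relies on.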
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute which refers to the list of known types, which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to the detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, or a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object: the physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus each object has to have the mandatory attributes ("OID" and "ClassId") and the desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, well-known entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OIDs.) The list of all persistent object types that are known to the system is then retrieved by the query:

select OID, Name, Description from OBJECTS where ClassId = -100;

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily, using a standard method, what that object is.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types (simple lookup dictionaries, for example) which do not have any attributes other than those already in OBJECTS. Usually these consist of a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries there are in your accounting system? Probably not less than half. Thus you may consider that you have already implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId;
Development area
25
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
OOD and RDBMS Part 1
This works if you know the ClassId of the dictionary. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName);

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
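As a quick end-to-end illustration of this lookup pattern, here is a hedged sketch using sqlite3 in place of Firebird; the sample "Currency" type and its items are invented for the example.

```python
import sqlite3

# OBJECTS holds both type descriptions (ClassId = -100) and the
# dictionary items themselves, exactly as in the article's schema.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE OBJECTS (
    OID INTEGER PRIMARY KEY, ClassId INTEGER,
    Name TEXT, Description TEXT)""")
con.executemany("INSERT INTO OBJECTS VALUES (?,?,?,?)", [
    (1, -100, "Currency", "Currency dictionary"),  # the type description
    (2,    1, "USD", "US Dollar"),                 # items of that type
    (3,    1, "EUR", "Euro"),
])

def lookup(con, type_name):
    """Fetch a plain dictionary given only its type name."""
    return con.execute(
        """SELECT OID, Name, Description FROM OBJECTS
           WHERE ClassId = (SELECT OID FROM OBJECTS
                            WHERE ClassId = -100 AND Name = ?)
           ORDER BY OID""",
        (type_name,)).fetchall()
```

Note that no dictionary-specific table had to be created: adding another simple dictionary is just a matter of inserting one type row and its item rows.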
Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method of storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total numeric(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id;
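The query above can be exercised end-to-end. The sketch below uses sqlite3 as a stand-in for Firebird (with an explicit JOIN and `?` placeholder instead of the `:id` parameter); the sample OIDs and values are invented.

```python
import sqlite3

# Method 1: shared attributes live in OBJECTS, type-specific ones in
# Orders, related one-to-one through the database-wide OID.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE OBJECTS (OID INTEGER PRIMARY KEY, ClassId INTEGER,
                      Name TEXT, Description TEXT);
CREATE TABLE Orders  (OID INTEGER PRIMARY KEY, customer INTEGER,
                      sum_total NUMERIC);
INSERT INTO OBJECTS VALUES (10, -100, 'Order',   'Order document type');
INSERT INTO OBJECTS VALUES (11,   20, 'Acme',    'Customer Acme Ltd');
INSERT INTO OBJECTS VALUES (12,   10, 'ORD-001', 'Urgent delivery');
INSERT INTO Orders  VALUES (12, 11, 149.90);
""")

row = con.execute("""
    SELECT o.OID, o.Name AS Number, o.Description AS Comment,
           ord.customer, ord.sum_total
    FROM OBJECTS o JOIN Orders ord ON o.OID = ord.OID
    WHERE ord.OID = ?""", (12,)).fetchone()
```

The `customer` field (11) is itself an OID, so the customer's name can be fetched from OBJECTS with the standard lookup, without knowing anything about the Partners dictionary.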
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.

If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table with a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object: 1:M relation */
  item_id TOID,   /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
);
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: the implementation of the object-relational mapping system for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]; these articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored as a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata: an attribute set (names and types) for each object type.
All attributes of a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid;

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id;
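Here is a hedged sketch of this named-attribute retrieval, with sqlite3 standing in for Firebird; the "Order" type, its two attributes and the `load_attrs` helper are invented for the illustration.

```python
import sqlite3

# Method 2 (EAV): one generic attributes table plus per-type metadata.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE objects (OID INTEGER PRIMARY KEY, ClassId INTEGER, Name TEXT);
CREATE TABLE class_attributes (OID INTEGER, attribute_id INTEGER,
    attribute_name TEXT, attribute_type INTEGER,
    PRIMARY KEY (OID, attribute_id));
CREATE TABLE attributes (OID INTEGER, attribute_id INTEGER, value TEXT,
    PRIMARY KEY (OID, attribute_id));
-- the type object (OID 10) and its attribute metadata
INSERT INTO objects VALUES (10, -100, 'Order');
INSERT INTO class_attributes VALUES (10, 1, 'customer', 0);
INSERT INTO class_attributes VALUES (10, 2, 'sum_total', 1);
-- one instance (OID 12) stored as one row per attribute
INSERT INTO objects VALUES (12, 10, 'ORD-001');
INSERT INTO attributes VALUES (12, 1, '11');
INSERT INTO attributes VALUES (12, 2, '149.90');
""")

def load_attrs(con, oid):
    """Return {attribute_name: value} for one object, via the metadata."""
    rows = con.execute("""
        SELECT ca.attribute_name, a.value
        FROM attributes a, class_attributes ca, objects o
        WHERE a.OID = ? AND a.OID = o.OID
          AND o.ClassId = ca.OID
          AND a.attribute_id = ca.attribute_id""", (oid,)).fetchall()
    return dict(rows)
```

The same `load_attrs` works for every type in the database, which is exactly the "standardized retrieval" advantage the article describes; note that all values come back as strings, since the `value` column is a varchar.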
Development area
26
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
OOD and RDBMS Part 1
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id;

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
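The pivot query can be run essentially as written (modulo parameter syntax). The sketch below again uses sqlite3 as a stand-in, with invented attribute ids 1 and 2; note the one-join-per-attribute shape that makes wide types slow to load.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE OBJECTS (OID INTEGER PRIMARY KEY, ClassId INTEGER,
                      Name TEXT, Description TEXT);
CREATE TABLE attributes (OID INTEGER, attribute_id INTEGER, value TEXT,
                         PRIMARY KEY (OID, attribute_id));
INSERT INTO OBJECTS VALUES (12, 10, 'ORD-001', 'Urgent delivery');
INSERT INTO attributes VALUES (12, 1, '11');      -- customer
INSERT INTO attributes VALUES (12, 2, '149.90');  -- sum_total
""")

# One LEFT JOIN per attribute pivots the EAV rows into a single record.
row = con.execute("""
    SELECT o.OID, o.Name AS Number, o.Description AS Comment,
           a1.value AS customer, a2.value AS sum_total
    FROM OBJECTS o
    LEFT JOIN attributes a1 ON a1.OID = o.OID AND a1.attribute_id = 1
    LEFT JOIN attributes a2 ON a2.OID = o.OID AND a2.attribute_id = 2
    WHERE o.OID = ?""", (12,)).fetchone()
```

The LEFT JOINs matter: an object that is missing an attribute row still comes back as one record, with NULL in that column.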
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure. A significant benefit is that with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are required for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know, everyone already knows them). All three methods described are undoubtedly useful, and each has a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to acquire it in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and the data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except for work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.
To be continued…
FastReport 3 - the new generation of reporting tools

Visual report designer with rulers, guides and zooming; wizards for basic report types; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix report support; support for the most popular DB engines. Full WYSIWYG; text rotation 0-360 degrees; the memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, anchors.

http://www.fast-report.com
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
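CopyCat generates its log table and triggers automatically for InterBase/Firebird. As a rough illustration of the underlying idea only, here is a hand-written sqlite3 equivalent; the table name `rpl_log` and its columns are invented, not CopyCat's actual metadata.

```python
import sqlite3

# Triggers on the replicated table record WHAT changed (table name,
# primary key, operation), which is all a replicator needs to fetch
# the current row state later.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rpl_log (table_name TEXT, pk_value, operation TEXT);
CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO rpl_log VALUES ('customers', NEW.id, 'I');
END;
CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT INTO rpl_log VALUES ('customers', NEW.id, 'U');
END;
""")
con.execute("INSERT INTO customers VALUES (1, 'Acme')")
con.execute("UPDATE customers SET name = 'Acme Ltd' WHERE id = 1")
log = con.execute("SELECT * FROM rpl_log").fetchall()
```

Note that only keys and operations are logged, not the data itself; the replicator reads the live row at replication time, so several updates to one record can collapse into one transfer.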
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged for the other sub-nodes, but not for the originating node itself.
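The originator-exclusion rule can be shown with a small pure-Python simulation. The node names and the `on_change` helper are invented for the illustration; in CopyCat itself this logic lives inside the generated triggers, keyed on the connected user name.

```python
# Simulated trigger logic: a change is logged for every replication
# target EXCEPT the user that made it, so a replicated change is never
# sent back to the node it came from.

SUB_NODES = ["laptop1", "laptop2"]   # sub-nodes of this node
PARENT = "hq"                        # parent node's login name

log = []  # (target_node, record_id) pairs awaiting replication

def on_change(record_id, current_user):
    """Log the change for everyone but its originator."""
    for node in SUB_NODES + [PARENT]:
        if node != current_user:
            log.append((node, record_id))

# A local application user edits record 7: logged for all targets.
on_change(7, "app_user")
# The parent's replicator (connected as "hq") writes record 8:
# logged for the sub-nodes, but NOT echoed back to "hq".
on_change(8, "hq")
```

The same mechanism works in the other direction when a sub-node pushes to its parent, since the sub-node's own name is then the connected user on the parent side.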
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
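Here is a rough simulation of that key-synchronization sequence. The counter and helper names are invented; in Firebird the server-side statement would typically be a generator (GEN_ID) or a stored procedure call, as the article says.

```python
# Before a locally created record is pushed to the parent, the server
# issues a globally unique key, the local row is re-keyed, and only
# then is the record replicated.

server_seq = {"last": 1000}          # stands in for a server generator
server_rows = {}                     # parent database: key -> data
local_rows = {5: "locally created"}  # off-line key; may clash elsewhere

def server_gen():
    """The server-side SQL statement that yields the next unique key."""
    server_seq["last"] += 1
    return server_seq["last"]

def replicate_insert(local_key):
    new_key = server_gen()                           # 1. ask the server
    local_rows[new_key] = local_rows.pop(local_key)  # 2. re-key locally
    server_rows[new_key] = local_rows[new_key]       # 3. then replicate
    return new_key

new_key = replicate_insert(5)
```

Because the local row is re-keyed first, the key seen by the parent and by the node end up identical, and two off-line nodes that both used key 5 can never collide on the server.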
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible for two nodes (having the same parent) to simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, the current stock value of a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than requiring the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication": a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like the one in the example above), one solution is to create a stored procedure for updating the table, and to use stored procedure replication to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
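The effect can be simulated in a few lines of Python. The node dictionaries and helpers below are invented for the illustration; CopyCat itself logs and replays actual stored procedure calls.

```python
# Replaying logged procedure CALLS (deltas) merges concurrent changes,
# where copying the stored VALUE would make one node's change win.

def make_node(initial_stock):
    return {"stock": initial_stock, "call_log": []}

def add_to_stock(node, qty, log=True):
    """The replicated 'stored procedure': adjust stock by a delta."""
    node["stock"] += qty
    if log:
        node["call_log"].append(qty)

a, b = make_node(45), make_node(45)
add_to_stock(a, 1)   # node A adds 1 item  -> A sees 46
add_to_stock(b, 2)   # node B adds 2 items -> B sees 47, concurrently

# Replication replays each node's logged calls on the other node
# (without re-logging them, so they do not bounce back).
for qty in a["call_log"]:
    add_to_stock(b, qty, log=False)
for qty in b["call_log"]:
    add_to_stock(a, qty, log=False)
```

Both nodes converge on 48 (45 + 1 + 2), which neither raw value, 46 or 47, would have produced.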
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++Builder components.

2. As a standalone Win32 replication tool.

We will now present each of these products.
1. The CopyCat Delphi / C++Builder component set

CopyCat can be obtained as a set of Delphi / C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide to getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare the databases:

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate:

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called CopyTiger.
Features include:

• An easy-to-use installer

• An independent server administrator tool (CTAdmin)

• A configuration wizard for setting up links to master/slave databases

• A robust replication engine based on Microtec CopyCat

• A fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• A simple and intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community obtains a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec responds to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replication between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold. Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Development area

The InterBase and Firebird Developer Magazine, 2005, Issue 2
www.ibdeveloper.com, ©2005. All rights reserved.

Using IBAnalyst

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you will know that the statistical output looks like just a bunch of numbers and nothing else. Okay, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, available while browsing the statistics, and it also includes hint comments and recommendation reports.

Figure 1: Summary of database statistics

The summary shown in Figure 1 provides general information about your database.
Footnotes:
1. InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2. The Oldest transaction is the same Oldest Interesting Transaction mentioned everywhere. Gstat output does not show this transaction as "interesting".
3. Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest Active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.
The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the "sweep gap" is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications generate less database load than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query: searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you "bad" indices on Foreign Keys every time you view statistics.
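The trigger-based workaround mentioned above can be sketched as follows. This is only an illustration: the table LOOKUP, its column ID and the exception name are hypothetical, not taken from the statistics discussed here.

```sql
CREATE EXCEPTION E_PK_IMMUTABLE
  'Primary key of a lookup table must not be changed';

SET TERM ^ ;
CREATE TRIGGER BU_LOOKUP FOR LOOKUP
ACTIVE BEFORE UPDATE
AS
BEGIN
  /* Block PK changes, so the automatically created FK index on
     detail tables never has to cope with cascaded key updates */
  IF (NEW.ID <> OLD.ID) THEN
    EXCEPTION E_PK_IMMUTABLE;
END ^
SET TERM ; ^
```

A similar BEFORE DELETE trigger can guard against deletes where the business rules allow it.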
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amounts of record versions usually slow down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594     83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062     110
UO_LOCK      283       8490       144
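The report's count(*) advice can be applied directly; for example, for one of the tables listed above (run it in a short transaction, preferably outside busy hours):

```sql
/* Reading every record lets co-operative garbage collection
   remove stale versions; expect a long run on heavily versioned
   tables with non-unique indices */
SELECT COUNT(*) FROM STAT_QUICK;
```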
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to "Temporary tables"
We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another "myth box" for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
TestBed
Testing the NO SAVEPOINT
Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.
The original design of InterBase implemented the second UPDATE another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. (But this theme is for another article.)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected the third and following UPDATEs to do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the Undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.
Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
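The scenario of Figure 7 can be sketched in SQL as follows; the table and column names are illustrative, not from the test database:

```sql
UPDATE T SET F = F + 1 WHERE ID = 1;  /* UPDATE1: backversion goes to the
                                         transaction-level Undo log      */
SAVEPOINT Savepoint1;
UPDATE T SET F = F + 1 WHERE ID = 1;  /* UPDATE2: a backversion must now
                                         also be kept in Savepoint1's
                                         own Undo log                    */
```

Each further UPDATE inside the savepoint's scope adds to that savepoint's Undo log, which is why memory grows in this case but not in the NO SAVEPOINT case.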
Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care over database housekeeping, particularly with respect to timely sweeping.
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I, on my own, had reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience, and promise to add the necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out the less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.
Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.
I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, too, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.
And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.
The problem statement

What is the problem?
Author: Vladimir Kotlyarevsky, vlad@contek.ru

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines that is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...
As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?

What we need is simple – to develop a set of standard methods that will help us simplify the process of tailoring the OO layer of business logic and relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relational processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers at run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs), in [2] – UINs (Unique Identification Numbers) or "hyperkeys". Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change - including the vehicle engine number, network card number, name, passport number, social security card number and even sex. (Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.)
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things :)). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

What is more, nobody requires run-time pointers to contain any additional information about an object beyond a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
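In InterBase/Firebird terms, such an OID source can be sketched as a domain plus a generator. The names TOID and G_OID are illustrative (a TOID domain is also assumed by the table definition later in this article):

```sql
CREATE DOMAIN TOID AS INTEGER;   /* or NUMERIC(18,0) for an int64-sized set */
CREATE GENERATOR G_OID;

/* draw a new database-wide unique OID */
SELECT GEN_ID(G_OID, 1) FROM RDB$DATABASE;
```

Because a generator lives outside transaction control, concurrent transactions can safely draw OIDs without serializing on a counter table.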
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the list of known types - basically a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, well-known entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
    OID TOID primary key,
    ClassId TOID,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date timestamp default CURRENT_TIMESTAMP,
    Owner TOID);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) The list of all persistent object types that are known to the system is then retrieved by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those already in OBJECTS. Usually these consist of a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries there are in your accounting system? Likely not less than half of the tables. Thus you may consider that you have already implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
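As a sketch of how such a dictionary could be populated, consider the following; the OID constants are hypothetical (a real system would take them from a generator), and only ClassId = -100 comes from the text above:

```sql
/* register a new lookup type (an object of the "type" type) */
insert into OBJECTS (OID, ClassId, Name, Description)
values (1001, -100, 'CURRENCY', 'Currency lookup dictionary');

/* add one element to the new dictionary */
insert into OBJECTS (OID, ClassId, Name, Description)
values (1002, 1001, 'USD', 'US dollar');

/* the dictionary is then read back with */
select OID, Name, Description from OBJECTS where ClassId = 1001;
```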
Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method of storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:
create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total numeric(15,2));
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:
select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from OBJECTS o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
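The article does not show that view; a minimal sketch, assuming the OBJECTS and Orders tables defined above, could be:

```sql
create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from OBJECTS o join Orders ord on o.OID = ord.OID;
```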
If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:
create table order_lines (
    id integer not null primary key,
    object_id TOID,  /* reference to the order object: the 1:M relation */
    item_id TOID,    /* reference to the ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    "SUM" numeric(15,4) computed by (amount * cost));
    /* SUM is a reserved word, hence the quoted (dialect 3) identifier */
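A typical query over this structure, sketched under the same assumptions (the parameter name is hypothetical), joins the lines back to OBJECTS to resolve the item names:

```sql
select ol.id, item.Name as item_name, ol.amount, ol.cost, ol."SUM"
from order_lines ol
join OBJECTS item on item.OID = ol.item_id
where ol.object_id = :order_id
```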
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the object-relational mapping for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:
create table attributes (
    OID TOID,  /* link to the master-object of this attribute */
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id));
connected 1:M with OBJECTS by OID. There is also a table
create table class_attributes (
    OID TOID,  /* link to the description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id));
which describes the type metadata: the attribute set (names and types) for each object type.

All attributes of a particular object whose OID is known are retrieved by the query:
select attribute_id, value from attributes where OID = :oid
or, with the names of the attributes:
select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer, a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure. A significant benefit is that with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are stored for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know: everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to be extended in the future to use certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except for work with the attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.
To be continued…
FastReport 3 - the new generation of reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix reports support; supports most popular DB engines. Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with punctual synchronization upon reconnection to the main database; it can be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are addressed in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
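CopyCat generates its log table and triggers automatically; the following is only a hand-written sketch of the idea, and every name in it (RPL$LOG, RPL$LOG_GEN, the CUSTOMERS table and its ID key) is a hypothetical assumption, not CopyCat's actual schema:

```sql
create generator RPL$LOG_GEN;

create table RPL$LOG (
    id integer not null primary key,
    table_name varchar(32),
    pk_value varchar(64),
    sub_node varchar(32));

set term ^ ;
create trigger CUSTOMERS_RPL_LOG for CUSTOMERS
after update as
begin
    /* record which table and which row changed,
       and which sub-node should receive the change */
    insert into RPL$LOG (id, table_name, pk_value, sub_node)
    values (gen_id(RPL$LOG_GEN, 1), 'CUSTOMERS', new.ID, 'NODE_A');
end ^
set term ; ^
```

A real implementation would also cover inserts and deletes, and would write one log line per sub-node rather than a fixed node name.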
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged for the other sub-nodes, but not for the originating node itself.
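The "log for every sub-node except the originator" behaviour could be sketched as follows. The RPL$USERS table is mentioned later in the article, but its node_name column, the RPL$LOG(id, table_name, pk_value, sub_node) log table and the RPL$LOG_GEN generator are all hypothetical assumptions for illustration:

```sql
set term ^ ;
create trigger ORDERS_RPL_LOG for ORDERS
after update as
    declare variable node varchar(32);
begin
    /* one log line per sub-node, skipping the node
       that performed the change (the current user) */
    for select node_name from RPL$USERS
        where node_name <> USER
        into :node
    do
        insert into RPL$LOG (id, table_name, pk_value, sub_node)
        values (gen_id(RPL$LOG_GEN, 1), 'ORDERS', new.ID, :node);
end ^
set term ; ^
```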
Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is executed on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
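For a generator-based synchronization method, the SQL clause in question might be as simple as the following (the generator name is a hypothetical example):

```sql
select gen_id(GEN_CUSTOMERS_ID, 1) from RDB$DATABASE
```

Because this runs on the parent's server, the value it returns is unique across all nodes that synchronize through that parent.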
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node holding the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible for two nodes (having the same parent) to simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures

There are certain database architectures that are difficult to replicate. Consider for example a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, the current stock value of a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication": a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like the one in the example above), one solution is to create a stored procedure which can be called for updating the table, and to use stored procedure replication to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
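Continuing the STOCK example, the delta-applying procedure whose calls would be replicated might look like this (all names are illustrative, not taken from CopyCat):

```sql
set term ^ ;
create procedure ADD_TO_STOCK (product_id integer, delta integer)
as
begin
    /* apply a relative change; replicated calls from different
       nodes then merge instead of overwriting each other */
    update STOCK
    set current_value = current_value + :delta
    where product_id = :product_id;
end ^
set term ; ^
```

Node A would call ADD_TO_STOCK with a delta of 1 and node B with a delta of 2; replicating the calls, rather than the resulting values, leaves every node with the correct total of 48.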
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:
1. As a set of Delphi / C++ Builder components
2. As a standalone Win32 replication tool

We will now present each of these products.
1. The CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide for getting started with the CopyCat component suite
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.
2) Open and run the "Configuration" example project (requires the IBX provider).
3) On the "General" tab, fill in the connection parameters and press "Connect to Database".
4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool
For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy-to-use installer
• Independent server administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple and intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community obtains a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replication between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat, or contact us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold. Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure the hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least in identifying the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database, and so on. To find out what is happening in a database (both the positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available while browsing the statistics, and it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Figure 1. Summary of database statistics

1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - The "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere; gstat output does not flag this transaction as "interesting".
3 - Really, it is the oldest transaction that was active when the currently oldest active transaction started, because only the start of a new transaction moves "Oldest active" forward. In production systems with regular transactions it can be considered the currently oldest active transaction.
Note: All figures in this article contain gstat statistics taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the details of a problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, no warning is displayed: this page size is okay.
Next we can see that the Forced Writes parameter is set to OFF, and is marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write cache method: when ON, changed data is written to disk immediately, but OFF means that writes will be held for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
bull The Oldest transaction is theoldest non-committed transactionAny lower transaction numbers arefor committed transactions and norecord versions are available forsuch transactions Transaction num-bers higher than the oldest transac-tion are for transactions that can bein any state This is also called theoldest interesting transactionbecause it freezes when a transac-tion is ended with rollback andserver can not undo its changes atthat moment
bull The Oldest snapshot ndash the old-est active (ie not yet committed)transaction that existed at the startof the transaction that is currentlythe Oldest Active transactionIndicates lowest snapshot transac-tion number that is interested inrecord versions
bull The Oldest active3 ndash the oldestcurrently active transaction
bull The Next transaction ndash thetransaction number that will beassigned to new transaction
• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell if there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point when the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
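The arithmetic behind the last two bullets can be sketched as follows. These are hypothetical helper functions, not IBAnalyst code, and the 30% threshold is my reading of the warning described above:

```python
from datetime import date

def transactions_per_day(next_transaction, creation_date, stats_date):
    """Average daily transaction count: Next transaction divided by
    the age of the database in days (as the report derives it)."""
    days = max((stats_date - creation_date).days, 1)
    return next_transaction / days

def oldest_active_stuck(next_transaction, oldest_active, per_day, threshold=0.3):
    """Warn when the gap between Next and Oldest active exceeds
    30% of the daily transaction count (assumed threshold)."""
    return (next_transaction - oldest_active) > threshold * per_day

# Invented sample figures: 210,000 transactions over 59 days
per_day = transactions_per_day(210_000, date(2005, 1, 1), date(2005, 3, 1))
print(round(per_day))                                  # 3559
print(oldest_active_stuck(210_000, 208_000, per_day))  # True
```

A gap of 2000 transactions against roughly 3559 per day trips the warning; a gap of 100 would not.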
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.
Do not gather statistics if you:

• Just restored your database
• Performed backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)
Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications make less database load than usual (users are at lunch, or it's a quiet time in the business day).
Development area

The InterBase and Firebird Developer Magazine, 2005, ISSUE 2
www.ibdeveloper.com ©2005 www.ibdeveloper.com. All rights reserved.

Using IBAnalyst
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst (Figure 2, Table statistics).

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. Together, both indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red, because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query - searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
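The Uniques, TotalDup and MaxDup figures can be reproduced from a list of index keys with a short sketch. The key distribution below is invented so that the totals match the ADDR_ADDRESS_IDX6 numbers quoted above:

```python
from collections import Counter

def index_stats(keys):
    """Duplicate statistics as shown in the IBAnalyst Index view:
    distinct values, duplicated keys, and the longest duplicate chain."""
    counts = Counter(keys)
    uniques = len(counts)               # distinct key values
    total_dup = len(keys) - uniques     # keys that repeat an existing value
    max_dup = max(counts.values()) - 1  # longest chain of equal keys
    return uniques, total_dup, max_dup

# Only 4 distinct values among 34999 keys, as in ADDR_ADDRESS_IDX6
keys = ['A'] * 25057 + ['B'] * 5000 + ['C'] * 4940 + ['D'] * 2
print(index_stats(keys))  # (4, 34995, 25056)
```

With almost every key storing one of four values, any lookup on this column walks a huge duplicate chain.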
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
Reports
There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. A large amount of record versions usually slows down performance. If there are a lot of record versions in a table, then garbage collection does not work, or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944             92
DICT_PRICE        30       1992             45
DOCS               9       2225             64
N_PART         13835      72594             83
REGISTR_NC       241       4085             56
SKL_NC          1640       7736            170
STAT_QUICK     17649      85062            110
UO_LOCK          283       8490            144
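The version/record threshold used by this part of the report can be sketched as follows. The helper is hypothetical, the record/version counts are taken from the fragment above, and SOME_OK_TABLE is an invented healthy table added for contrast:

```python
# (records, versions) pairs from the report fragment above
tables = {
    "CLIENTS_PR": (3388, 10944),
    "DICT_PRICE": (30, 1992),
    "DOCS": (9, 2225),
    "N_PART": (13835, 72594),
    "SOME_OK_TABLE": (1000, 500),  # hypothetical: few versions, no warning
}

def versioned(tables, ratio=3.0):
    """Names of tables whose versions/records ratio exceeds the threshold."""
    return sorted(name for name, (recs, vers) in tables.items()
                  if recs and vers / recs > ratio)

print(versioned(tables))  # ['CLIENTS_PR', 'DICT_PRICE', 'DOCS', 'N_PART']
```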
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to "Temporary tables"
One thing I'd like to ask you to change is regarding temp tables. I suggest you even create another "myth box" for it. It is the sentence:
"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."
This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.
Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly, those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while ;-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation, or regarding data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
Development area
Object-Oriented Development in RDBMS, Part 1

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience, and promise to add any necessary information to the article.
The sources in the bibliography are listed in the order of their appearance in my mind.
The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.
Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.
I should make it clear that the object approach to database design as described is not appropriate for every task. This is still true for OOP as a whole, too, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.
And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.
The problem statement

What is the problem?
Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:
1. Everyone got used to them and felt comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development - if you develop your applications just like 20-30 years ago.
As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS. They are unusual and
Author: Vladimir Kotlyarevsky, vlad@contek.ru
slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…
As a result, everyone continues using RDBMS, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?
The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.
RDBMS as an object storage

First let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs), in [2] – UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.)
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hum. But let's not talk about such gloomy things.) Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
And what is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (pure surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
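As an illustration only, here is a toy in-process allocator with the same contract as such a generating function (in InterBase/Firebird a generator read with gen_id() would play this role; the class below is hypothetical and not part of any library):

```python
import itertools
import threading

class OidAllocator:
    """Toy stand-in for a database generator/sequence: hands out
    database-wide unique integer OIDs, safely across threads."""
    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_oid(self):
        with self._lock:
            return next(self._counter)

gen = OidAllocator()
print([gen.next_oid() for _ in range(3)])  # [1, 2, 3]
```

The essential property is the one the article asks for: every value comes from a single monotonically growing set, regardless of which table (class) the object belongs to.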
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information – from just a simple type name to detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So let us design this table:
create domain TOID as integer not null;

create table OBJECTS (
    OID           TOID primary key,
    ClassId       TOID,
    Name          varchar(32),
    Description   varchar(128),
    Deleted       smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date   timestamp default CURRENT_TIMESTAMP,
    Owner         TOID
)

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100 (you need to add a description of the known OID). Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily, by a standard method, what that object is.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that no less than half of them are like this. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId
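The OBJECTS-as-dictionary idea can be tried end-to-end. The sketch below uses SQLite via Python's sqlite3 purely for illustration (the article targets InterBase/Firebird; SQLite has no domains, so TOID becomes plain integer, and the sample type and rows are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- SQLite stand-in for the OBJECTS table (TOID domain becomes integer)
create table OBJECTS (
    OID           integer primary key,
    ClassId       integer not null,
    Name          varchar(32),
    Description   varchar(128),
    Deleted       smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date   timestamp default CURRENT_TIMESTAMP,
    Owner         integer
);
""")
# ClassId = -100 marks type-description objects, as in the article
con.executemany("insert into OBJECTS (OID, ClassId, Name) values (?, ?, ?)", [
    (1, -100, "Currency"),           # the 'Currency' lookup type itself
    (2, 1, "USD"), (3, 1, "EUR"),    # its elements refer to OID 1
])

types = con.execute(
    "select OID, Name from OBJECTS where ClassId = -100").fetchall()
lookup = con.execute(
    "select OID, Name from OBJECTS where ClassId = ?", (1,)).fetchall()
print(types)   # [(1, 'Currency')]
print(lookup)  # [(2, 'USD'), (3, 'EUR')]
```

One table serves both as the type registry and as every simple dictionary.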
if you know the ClassId for this dictionary. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing of more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:
create table Orders (
    OID       TOID primary key,
    customer  TOID,
    sum_total NUMERIC(15,2)
)
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being for example a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:
select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:
create table order_lines (
    id        integer not null primary key,
    object_id TOID,  /* reference to the order object - 1:M relation */
    item_id   TOID,  /* reference to the ordered item */
    amount    numeric(15,4),
    cost      numeric(15,4),
    sum       numeric(15,4) computed by (amount*cost)
)
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
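Method 1 can be exercised end-to-end; again SQLite via sqlite3 stands in for InterBase/Firebird, and all OIDs and sample values below are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table OBJECTS (
    OID integer primary key, ClassId integer not null,
    Name varchar(32), Description varchar(128)
);
create table Orders (              -- Method 1: one table per type, 1:1 with OBJECTS
    OID integer primary key references OBJECTS,
    customer integer references OBJECTS,
    sum_total numeric
);
""")
con.executemany("insert into OBJECTS values (?,?,?,?)", [
    (10, -100, "Order", "order type"),
    (11, -100, "Partner", "partner type"),
    (20, 11, "ACME Ltd", None),            # a customer object
    (21, 10, "2005-17", "urgent"),         # number/comments live in OBJECTS
])
con.execute("insert into Orders values (21, 20, 149.5)")

row = con.execute("""
    select o.OID, o.Name as Number, o.Description as Comment,
           ord.customer, ord.sum_total
    from OBJECTS o join Orders ord on o.OID = ord.OID
    where ord.OID = ?""", (21,)).fetchone()
print(row)  # (21, '2005-17', 'urgent', 20, 149.5)
```

The type-specific table carries only the typed attributes; everything generic comes from OBJECTS through the 1:1 join.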
Method 2. (See [5].) All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
    OID TOID,  /* link to the master-object of this attribute */
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
)
connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
    OID TOID,  /* a link to the description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
)
which describes type metadata – an attribute set (their names and types) for each object type.
All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid
or, with names of attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
In the context of this method you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.
The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are required for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or it is likely to be extended in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.
To be continued…
FastReport 3 – new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix reports support; support for most popular DB-engines.

Full WYSIWYG, text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com
Development area

The InterBase and Firebird Developer Magazine, 2005, ISSUE 2

www.ibdeveloper.com ©2005 www.ibdeveloper.com. All rights reserved.
Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
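The article does not show CopyCat's actual metadata, but the idea can be sketched like this; the table, generator and trigger names are invented for illustration, and the Firebird 1.5 multi-event trigger syntax is assumed:

```sql
/* Hypothetical change-log table: one row per modified record. */
CREATE TABLE RPL$LOG (
  LOG_ID     INTEGER NOT NULL PRIMARY KEY,
  TABLE_NAME VARCHAR(31) NOT NULL,
  PK_VALUE   VARCHAR(100) NOT NULL,
  OPERATION  CHAR(1) NOT NULL  /* I / U / D */
);

CREATE GENERATOR GEN_RPL$LOG;

SET TERM ^ ;
/* One such trigger would be generated per replicated table. */
CREATE TRIGGER CUSTOMERS_RPL FOR CUSTOMERS
AFTER INSERT OR UPDATE OR DELETE
AS
BEGIN
  INSERT INTO RPL$LOG (LOG_ID, TABLE_NAME, PK_VALUE, OPERATION)
  VALUES (GEN_ID(GEN_RPL$LOG, 1), 'CUSTOMERS',
          CASE WHEN DELETING THEN OLD.ID ELSE NEW.ID END,
          CASE WHEN INSERTING THEN 'I'
               WHEN UPDATING  THEN 'U'
               ELSE 'D' END);
END^
SET TERM ; ^
```

The replicator then only needs to scan RPL$LOG and re-read the current state of each listed record, rather than diffing whole tables.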
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)
When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to keep changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
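Combining the two mechanisms above, the per-table trigger can fan out one log row per sub-node while skipping the connected user. A rough sketch: the RPL$USERS table is mentioned later in the article, but its columns, the log table and the trigger are assumptions:

```sql
/* Hypothetical sub-node list and per-target log table. */
CREATE TABLE RPL$USERS (
  NODE_NAME VARCHAR(31) NOT NULL PRIMARY KEY
);

CREATE TABLE RPL$FANOUT_LOG (
  TABLE_NAME  VARCHAR(31) NOT NULL,
  PK_VALUE    VARCHAR(100) NOT NULL,
  TARGET_NODE VARCHAR(31) NOT NULL
);

SET TERM ^ ;
CREATE TRIGGER CUSTOMERS_RPL_FANOUT FOR CUSTOMERS
AFTER UPDATE
AS
DECLARE VARIABLE NODE VARCHAR(31);
BEGIN
  /* One log line per sub-node, except the user connected right now:
     a change arriving from a node must not bounce back to it. */
  FOR SELECT NODE_NAME FROM RPL$USERS
      WHERE NODE_NAME <> CURRENT_USER
      INTO :NODE
  DO
    INSERT INTO RPL$FANOUT_LOG (TABLE_NAME, PK_VALUE, TARGET_NODE)
    VALUES ('CUSTOMERS', NEW.ID, :NODE);
END^
SET TERM ; ^
```

Because the parent logs in under its own node name, and each sub-node logs in under its name, the `<> CURRENT_USER` filter is all that is needed to break the echo loop in both directions.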
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
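A generator-based synchronization method of the kind described could look like this sketch (the generator name and key column are invented):

```sql
/* Hypothetical: a generator on the parent server that hands out
   key values unique across all nodes. */
CREATE GENERATOR GEN_ORDER_ID;

/* The "synchronization method" is then an SQL clause the replicator
   runs against the parent before uploading a record: */
SELECT GEN_ID(GEN_ORDER_ID, 1) AS NEW_KEY FROM RDB$DATABASE;

/* The local record's key is rewritten to NEW_KEY first, and only
   then is the record replicated to the server. */
```

Running the clause on the server side is what restores the central authority that off-line editing took away: all nodes draw from the same sequence.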
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product, and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
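Such a procedure might look like the following sketch; the STOCK table is from the article's example, while the procedure name and column names are assumptions:

```sql
SET TERM ^ ;
/* Hypothetical: all stock movements go through this procedure, so the
   replicator can log and replay the *delta* on every node instead of
   copying the conflicting absolute value of STOCK.QUANTITY. */
CREATE PROCEDURE ADD_TO_STOCK (PRODUCT_ID INTEGER, DELTA INTEGER)
AS
BEGIN
  UPDATE STOCK
  SET QUANTITY = QUANTITY + :DELTA
  WHERE PRODUCT_ID = :PRODUCT_ID;
END^
SET TERM ; ^

/* Node A executes ADD_TO_STOCK(1, 1), node B executes ADD_TO_STOCK(1, 2);
   replaying both calls on every node converges on 48 everywhere. */
```

Replaying calls instead of rows works because the operation "add N items" is commutative, whereas "set quantity to X" is not.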
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:
1. As a set of Delphi / C++ Builder components
2. As a standalone Win32 replication tool

We will now present each one of these products.
1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate:

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy to use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Figure 1: Summary of database statistics

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] The Oldest transaction is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".
[3] Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.[1]
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest[2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active[3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day – this is calculated from Next transaction, divided by the number of days passed from the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables with fragmentation caused by updates/deletes or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amounts of record versions usually slow down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144
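The `select count(*)` trick the report mentions works because a full scan touches every record version and lets co-operative garbage collection discard the chains no active transaction needs. Taking a table name from the statistics above, it is as simple as:

```sql
/* Forces a read of every record (and version chain) of CLIENTS_PR.
   Can be slow, and achieves nothing while an older snapshot
   transaction is still interested in those versions. */
SELECT COUNT(*) FROM CLIENTS_PR;
```

Scheduling such scans during quiet hours is a common workaround when a proper sweep is not an option.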
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to the "Temporary tables" article
One thing Id like to ask you tochange is re temp tables I suggestyou even create another myth boxfor it It is the sentence
"Surely most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."
This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking from experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.
Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temporary, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re: user/session data isolation, or re: data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its author I'd like to publish it here.
Alexey Kovyazin, Chief Editor
Development area
OOD and RDBMS Part 1
…slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…
As a result, everyone continues using RDBMS, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.
What do we need?
The thing we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.
RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.
The OID
All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers at run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?
First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which brings considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.
OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number, and even sex. Nobody can change their date of birth - at least not their de facto date of birth. But birth dates are not unique anyway.)
Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hmm. But let's not talk about such gloomy things.) Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.
And what is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.
Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].
The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
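In InterBase/Firebird, the natural generating function is a generator. The sketch below shows the idea; the generator and trigger names are illustrative assumptions, not part of the article's schema (the OBJECTS table itself is defined later in the article):

```sql
/* A minimal sketch of OID generation in InterBase/Firebird.
   Generator and trigger names are assumptions for illustration. */
CREATE GENERATOR GEN_OID;

SET TERM ^ ;
/* Assign a database-wide unique OID when a row arrives without one */
CREATE TRIGGER OBJECTS_BI FOR OBJECTS
ACTIVE BEFORE INSERT POSITION 0 AS
BEGIN
  IF (NEW.OID IS NULL) THEN
    NEW.OID = GEN_ID(GEN_OID, 1);
END^
SET TERM ; ^
```

Because every class shares the single GEN_OID generator, the resulting keys are unique across the whole database, which is exactly the property argued for above.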
ClassId
All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is
unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information - from just a simple type name to detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, or as well a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object: the physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.
Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table
Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.
So, let us design this table:
create domain TOID as integer not null;

create table OBJECTS (
    OID            TOID primary key,
    ClassId        TOID,
    Name           varchar(32),
    Description    varchar(128),
    Deleted        smallint,
    Creation_date  timestamp default CURRENT_TIMESTAMP,
    Change_date    timestamp default CURRENT_TIMESTAMP,
    Owner          TOID
)

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) The list of all persistent object types known to the system is then sampled by the query: select OID, Name, Description from OBJECTS where ClassId = -100.

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often, only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case, we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types - simple lookup dictionaries, for example - which do not have any attributes other than those already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely no less than half. Thus, you may consider that you have already implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS
where ClassId = :LookupId
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:
create table Orders (
    OID       TOID primary key,
    customer  TOID,
    sum_total NUMERIC(15,2)
)
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:
select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
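Such a view might be sketched as follows; the exact column list and definition are an assumption for illustration, not taken from the article:

```sql
/* Sketch of the "orders_view" idea; the definition is assumed. */
create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from OBJECTS o join Orders ord on o.OID = ord.OID;
```

With such a view in place, client code can select from orders_view exactly as it would from a conventional flat Orders table.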
If an order has a "lines" section - and a real-world order should definitely have such a section - we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:
create table order_lines (
    id        integer not null primary key,
    object_id TOID,  /* reference to the order object - 1:M relation */
    item_id   TOID,  /* reference to the ordered item */
    amount    numeric(15,4),
    cost      numeric(15,4),
    sum       numeric(15,4) computed by (amount*cost)
)
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2. (See [5].) All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:
create table attributes (
    OID          TOID,  /* link to the master-object of this attribute */
    attribute_id integer not null,
    value        varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
)
connected 1:M with OBJECTS by OID. There is also a table
create table class_attributes (
    OID            TOID,  /* link to a description-object of the type */
    attribute_id   integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
)
which describes the type metadata - an attribute set (their names and types) for each object type.
All attributes for a particular object, where the OID is known, are retrieved by the query:
select attribute_id, value from attributes
where OID = :oid
or, with the names of the attributes:
select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
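To make the scheme concrete for the running "Order" example, the metadata and data under Method 2 might be populated as below. All OID values and attribute ids here are made up for illustration; the article does not fix them:

```sql
/* Hypothetical sample rows for the "Order" type under Method 2. */

/* The "Order" type description-object is assumed to have OID = 10 */
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (10, 1, 'customer', 0);
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (10, 2, 'sum_total', 1);

/* One order object, OID = 1001, ClassId = 10 in OBJECTS; its two
   attributes become two rows, with values kept as strings */
insert into attributes (OID, attribute_id, value) values (1001, 1, '2055');
insert into attributes (OID, attribute_id, value) values (1001, 2, '149.90');
```

Note how the customer attribute (value '2055') is itself an OID link to another object, while sum_total is stored as text and must be converted by the application.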
In the context of this method, you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:
select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in a database.
The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database - in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied - a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary - just a single BLOB field - and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know - everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely in future to use certain object attributes that way, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except for work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.
To be continued…
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are addressed in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
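As a rough illustration of this kind of change logging - CopyCat generates its own structures, so every name and column below is an assumption, and the trigger uses Firebird 1.5 multi-action syntax:

```sql
/* Illustrative sketch only; not CopyCat's actual log schema. */
create generator GEN_RPL_LOG;

create table RPL_LOG (
    id          integer not null primary key,
    table_name  varchar(32),
    pk_value    varchar(64),
    login       varchar(32),   /* user who made the change */
    change_date timestamp default CURRENT_TIMESTAMP
);

set term ^ ;
/* Assumes a replicated table CUSTOMERS with an OID primary key */
create trigger CUSTOMERS_LOG for CUSTOMERS
active after insert or update or delete as
begin
  insert into RPL_LOG (id, table_name, pk_value, login)
  values (gen_id(GEN_RPL_LOG, 1), 'CUSTOMERS',
          coalesce(new.OID, old.OID), current_user);
end^
set term ; ^
```

Recording the login, as sketched here, is what later allows the triggers to distinguish a user's change from one applied by the replicator itself (see "Two-way replication" below).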
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously, or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes, towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)
When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to keep changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged for other sub-nodes, but not for the originating node itself.
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
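A generator-based synchronization method of the kind mentioned could, for instance, be a clause like the following (the generator name is an assumption; only the idiom is standard InterBase/Firebird):

```sql
/* Server-side clause producing the next unique key value;
   GEN_ORDERS_PK is a hypothetical generator name. */
select GEN_ID(GEN_ORDERS_PK, 1) from RDB$DATABASE
```

Because RDB$DATABASE always contains exactly one row, this query returns exactly one freshly generated value, which the replicator can then substitute for the locally assigned key.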
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply puts in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record is replicated and the conflict resolved.
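Schematically, such a conflicts table might look as follows. Only the CHOSEN_USER field is named in the text; every other name and column here is an illustrative assumption about what such a table would need, not CopyCat's actual schema:

```sql
/* Schematic conflict table; layout is assumed for illustration. */
create table RPL_CONFLICTS (
    table_name  varchar(32),
    pk_value    varchar(64),
    user_a      varchar(32),  /* first node involved in the conflict */
    user_b      varchar(32),  /* second node involved in the conflict */
    chosen_user varchar(32)   /* set by the user to resolve the conflict */
);
```

Resolution then amounts to a single update setting chosen_user to either node name, after which the next replication pass can proceed.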
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) might simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication": a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like the one in the example above), one solution is to write a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
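A procedure of the kind described might be sketched as follows; the procedure name, parameters, and the STOCK column names are assumptions for illustration:

```sql
/* Sketch of an update-through-procedure STOCK table; names are
   illustrative assumptions, not CopyCat specifics. */
set term ^ ;
create procedure ADD_TO_STOCK (product_id integer, delta integer)
as
begin
  /* Apply a relative change rather than writing an absolute value */
  update STOCK
  set current_value = current_value + :delta
  where product_id = :product_id;
end^
set term ; ^
```

If the calls to ADD_TO_STOCK are what gets replicated, then in the example above node A contributes +1 and node B contributes +2, and both databases converge on 48, regardless of the order in which the calls are applied.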
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:
1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each of these products.
1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide to getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.
2) Open and run the "Configuration" example project (requires the IBX provider).
3) On the "General" tab, fill in the connection parameters and press "Connect to Database".
4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the ldquoTablesrdquo tab for each table that you want to replicate set a pri-ority (relative to the other tables) and double-click on the ldquoPKn generatorrdquo
columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to "Y" so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to "Y" for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
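Step 8 can be performed with plain SQL. The sketch below is only illustrative: CopyCat generates the RPL$USERS table itself, and the USER_NAME column assumed here is hypothetical, not CopyCat's documented schema — check the generated meta-data before running anything similar.

```sql
/* Hypothetical sketch: register two sub-nodes ("laptop1", "laptop2")
   that are allowed to replicate with this database.
   The column name is an assumption for illustration only. */
INSERT INTO RPL$USERS (USER_NAME) VALUES ('laptop1');
INSERT INTO RPL$USERS (USER_NAME) VALUES ('laptop2');
COMMIT;
```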
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs, but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy-to-use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version:
http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:
ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked in gstat output or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

Figure 1: Summary of database statistics

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] The Oldest transaction is the same "oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
[3] Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest active forward. In production systems with regular transactions, it can be considered as the currently oldest active transaction.

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback, and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated as Next transaction divided by the number of days passed since the creation of the database to the point where the statistics are retrieved (for example, Next transaction = 150000 in a database created 30 days ago gives 5000 transactions per day). This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications put less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue. This shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell that this really is bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.

• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, than garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944     92
DICT_PRICE        30       1992     45
DOCS               9       2225     64
N_PART         13835      72594     83
REGISTR_NC       241       4085     56
SKL_NC          1640       7736    170
STAT_QUICK     17649      85062    110
UO_LOCK          283       8490    144
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to the "Temporary tables" article
We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
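For readers who have not seen the issue 1 article the letter responds to, a global temporary table in InterBase 7.5 is declared roughly as below. This is only a sketch: the table and column names are invented for illustration, and (as the letter notes) Firebird did not yet support this syntax at the time of writing.

```sql
/* Sketch of a global temporary table, InterBase 7.5 syntax.
   Table and column names are invented for this example.
   Each connection sees only its own rows; ON COMMIT PRESERVE ROWS
   keeps them for the life of the connection rather than the transaction. */
CREATE GLOBAL TEMPORARY TABLE SESSION_TOTALS
(
  ITEM_ID INTEGER NOT NULL,
  TOTAL   NUMERIC(15,2)
)
ON COMMIT PRESERVE ROWS;
```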
Miscellaneous
Invitations, Subscriptions, Donations, Sponsorships
This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now!

To receive future issues notifications, send email to subscribe@ibdeveloper.com
Magazine CD
Donations
Object-Oriented Development in RDBMS, Part 1
...unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information – from just a simple type name to detailed type metadata necessary for the application domain.
Name, Description, creation_date, change_date, owner
Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.
The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching of any object, and that is one of the most important advantages of the described structure.

So let us design this table:
create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is - using a standard method.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which already are in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId
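As a sketch of how such a dictionary might be populated: the "Currency" class and all OID values below are invented for this example; only the ClassId = -100 convention for type-description objects comes from the text.

```sql
/* Register a dictionary type, then two of its elements.
   OIDs and the "Currency" name are invented for illustration. */
insert into OBJECTS (OID, ClassId, Name, Description)
  values (1001, -100, 'Currency', 'Currency lookup dictionary');
insert into OBJECTS (OID, ClassId, Name, Description)
  values (1002, 1001, 'USD', 'US Dollar');
insert into OBJECTS (OID, ClassId, Name, Description)
  values (1003, 1001, 'EUR', 'Euro');

/* The dictionary is then read back with the query from the article: */
select OID, Name, Description from OBJECTS where ClassId = 1001;
```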
if you know the ClassId for this dictionary. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing of more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:
create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
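Such a view might look like the sketch below, built from the join query shown above; only the view name "orders_view" comes from the text, the column list is my choice.

```sql
/* A view hiding the OBJECTS/Orders split behind a familiar "flat" shape. */
create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from OBJECTS o join Orders ord on o.OID = ord.OID;
```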
If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID,  /* reference to order object - 1:M relation */
  item_id TOID,    /* reference to ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
);
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
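For instance, ordinary relational SQL keeps working against these tables. A sketch (the comparison of the line totals with sum_total is my example, not the article's; :id is the order's OID, as in the earlier query):

```sql
/* Total up the lines of one order and compare with the stored amount.
   amount*cost mirrors the computed "sum" column of order_lines. */
select ord.OID, o.Name as Number,
       sum(l.amount * l.cost) as lines_total,
       ord.sum_total
from Orders ord
  join OBJECTS o on o.OID = ord.OID
  left join order_lines l on l.object_id = ord.OID
where ord.OID = :id
group by ord.OID, o.Name, ord.sum_total
```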
Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:
create table attributes (
  OID TOID,  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID,  /* here is a link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes type metadata – an attribute set (their names and types) for each object type.
All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with names of attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id
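To make the layout concrete, here is a sketch of how the "Order" example from Method 1 could land in these tables. All OIDs, attribute ids, type codes and values below are invented; attribute_id 1 for customer and 2 for sum_total match the numbering used later in this article.

```sql
/* Describe the "Order" type's attributes (metadata): OID 2001 is the
   invented description-object of the type, attribute_type codes are
   hypothetical. */
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
  values (2001, 1, 'customer', 0);
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
  values (2001, 2, 'sum_total', 1);

/* Store one order object: number and comment live in OBJECTS, the
   remaining attributes become rows in "attributes". The customer value
   is another object's OID kept as a string, since "value" is varchar. */
insert into OBJECTS (OID, ClassId, Name, Description)
  values (3001, 2001, '2005-017', 'Sample order');
insert into attributes (OID, attribute_id, value) values (3001, 1, '1002');
insert into attributes (OID, attribute_id, value) values (3001, 2, '149.90');
```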
Development area

The InterBase and Firebird Developer Magazine, 2005, ISSUE 2
www.ibdeveloper.com © 2005 www.ibdeveloper.com. All rights reserved.

OOD and RDBMS Part 1
In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:
select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer, a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
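To make the cost model concrete, here is a small Python sketch (hypothetical illustrative code, not from the article) of Method 2 storage: attribute rows of one object are pivoted into a single record, which is the same job the LEFT JOIN query above performs in SQL. Table and attribute names follow the Order example.

```python
# Hypothetical in-memory model of Method 2 (EAV) storage; names follow
# the article's Order example, but this is illustrative code only.
attributes = [
    (1001, 1, "ACME Ltd."),  # attribute_id 1 = customer
    (1001, 2, "1500.00"),    # attribute_id 2 = sum_total
    (1002, 1, "Globex"),
]

class_attributes = {1: "customer", 2: "sum_total"}  # type metadata

def load_object(oid):
    """Pivot the (OID, attribute_id, value) rows of one object into a
    single record, as the LEFT JOIN query does in SQL."""
    record = {}
    for row_oid, attr_id, value in attributes:
        if row_oid == oid:
            record[class_attributes[attr_id]] = value
    return record

print(load_object(1001))  # {'customer': 'ACME Ltd.', 'sum_total': '1500.00'}
```

In SQL, each additional attribute means one more join; in this sketch it is only one more pass over the row list, which is why the relational emulation gets slower as types grow.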
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease in implementing object inheritance; and a very simple database structure. The last is a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in the database.
The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are stored for every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: object retrieval logic is simple and fast; no extra database structures are necessary, just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know, everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or is likely to do so in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.
To be continued…
FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix reports support; support for the most popular DB-engines.
Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).
Access to DB fields, styles, text flow, URLs, Anchors.
http://www.fast-report.com
Replicating and synchronizing
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table, and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent; and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
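The rule fits in a few lines. Here is a hypothetical Python sketch of the trigger logic (CopyCat actually implements this in SQL triggers; the node names and data shapes here are invented for illustration): a change is logged for every sub-node except the connected user, so a replicated change never echoes back to its originator.

```python
# Sketch of CopyCat's echo-avoidance rule (illustrative, not real CopyCat code).
# A node logs each change for all of its sub-nodes except the connected user.
def log_change(sub_nodes, current_user, change):
    """Return the log entries a trigger would create for one change."""
    return [(node, change) for node in sub_nodes if node != current_user]

sub_nodes = ["nodeA", "nodeB", "parent"]

# A local application user edits a record: every sub-node gets a log line.
print(log_change(sub_nodes, "local_app", "update CUSTOMERS 42"))

# The replicator, logged in as "nodeA", applies nodeA's own change:
# it is logged for nodeB and the parent, but not echoed back to nodeA.
print(log_change(sub_nodes, "nodeA", "update CUSTOMERS 43"))
```

The same function covers both directions described above: replicating down (parent logged in under its own node name) and replicating up (sub-node logged in under its name) simply exclude different entries.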
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
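As a minimal model of that workflow (invented names and structures; the article only documents the CHOSEN_USER field, not the rest of the CopyCat schema), the conflict table can be thought of as a map from a record's identity to the two disputing nodes plus the user's eventual choice:

```python
# Toy model of CopyCat-style conflict handling (illustrative only).
conflicts = {}  # (table, pk) -> {"users": (node1, node2), "chosen_user": None}

def detect_conflict(table, pk, node1, node2):
    """Both nodes changed the same record: register a conflict and
    block replication of that record until it is resolved."""
    conflicts[(table, pk)] = {"users": (node1, node2), "chosen_user": None}

def is_blocked(table, pk):
    """A record stays out of replication while its conflict is unresolved."""
    c = conflicts.get((table, pk))
    return c is not None and c["chosen_user"] is None

def resolve(table, pk, chosen_user):
    """The user names the node holding the correct version; the next
    replication pass then ships that version and clears the block."""
    conflicts[(table, pk)]["chosen_user"] = chosen_user

detect_conflict("STOCK", 7, "nodeA", "nodeB")
print(is_blocked("STOCK", 7))   # True: record excluded from replication
resolve("STOCK", 7, "nodeB")
print(is_blocked("STOCK", 7))   # False: nodeB's version will be replicated
```

The point of the design is that resolution is data-driven: writing one field is enough, and the ordinary replication cycle does the rest.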
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
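The difference between replicating values and replicating operations is easy to demonstrate with the numbers from the STOCK example. In this hypothetical sketch, both nodes start from a stock of 45; shipping final field values loses one of the updates, while shipping the logged procedure calls (the deltas) merges correctly:

```python
# Why CopyCat replicates stored-procedure calls for tables like STOCK.
# (Illustrative sketch; not CopyCat code.)
initial = 45

# Value replication: each node ships its final field value; last writer wins.
node_a_value = initial + 1   # node A adds 1 item -> 46
node_b_value = initial + 2   # node B adds 2 items -> 47
value_replicated = node_b_value  # A's +1 is silently lost
print(value_replicated)          # 47, wrong

# Call replication: each node ships the operation "add N items to stock".
calls = [+1, +2]                 # logged procedure calls from A and B
call_replicated = initial + sum(calls)
print(call_replicated)           # 48, both changes merged
```

Replaying operations is order-independent here because addition commutes, which is exactly why delta-style stored procedures suit this kind of table.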
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:
1. As a set of Delphi / C++ Builder components
2. As a standalone Win32 replication tool

We will now present each one of these products.
1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide for getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.
2) Open and run the "Configuration" example project (requires the IBX provider).
3) On the "General" tab, fill in the connection parameters and press "Connect to Database".
4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables), and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.
7) Apply all the generated SQL to all databases that should be replicated.
8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs, but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy to use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version:
http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:
1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.
2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.
CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.
Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:
ibd-852-150
Using IBAnalyst

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Okay, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database, and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

Figure 1 Summary of database statistics

Footnotes:
1. InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2. The Oldest transaction is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".
3. Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

The summary shown in Figure 1 provides general information about
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF, and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off (see footnote 1).
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest (see footnote 2) and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot: the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active (see footnote 3): the oldest currently active transaction.

• The Next transaction: the transaction number that will be assigned to the next new transaction.
• Active transactions: IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day: this is calculated from the Next transaction, divided by the number of days passed since the creation of the database up to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
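These derived figures are simple arithmetic over the four transaction markers. The sketch below is a hypothetical Python rendering of the calculations described above (the 30% threshold interpretation and all names are assumptions; IBAnalyst's exact rules may differ):

```python
# Sketch of the transaction-marker arithmetic described above
# (illustrative; IBAnalyst's exact thresholds and rules may differ).
def transactions_per_day(next_transaction, days_since_creation):
    """'Transactions per day' = Next transaction / database age in days."""
    return next_transaction / days_since_creation

def oldest_active_stuck(next_transaction, oldest_active, per_day):
    """Warn when the gap between Next and Oldest active exceeds ~30%
    of the daily transaction count, hinting at a long-running transaction."""
    return (next_transaction - oldest_active) > 0.30 * per_day

per_day = transactions_per_day(1_000_000, 100)
print(per_day)                                           # 10000.0
print(oldest_active_stuck(1_000_000, 990_000, per_day))  # True: gap 10000 > 3000
print(oldest_active_stuck(1_000_000, 999_000, per_day))  # False: gap 1000
```

The second call illustrates the warning case: a transaction stuck 10000 numbers behind Next, against a daily rate of 10000, clearly points to a long-running transaction.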
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.
Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)
Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2 Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. Both indications together tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red, because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056, i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
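The duplicate statistics discussed here (Uniques, TotalDup, MaxDup) can be reproduced from any list of key values. Below is a hypothetical Python sketch (not IBAnalyst code) showing how such numbers arise, using a deliberately low-selectivity key set like the ADDR_ADDRESS_IDX6 example:

```python
from collections import Counter

def index_dup_stats(keys):
    """Compute (Uniques, TotalDup, MaxDup) the way gstat-style reports
    count them: TotalDup = keys repeating an earlier value, MaxDup =
    length of the longest duplicate chain (occurrences of the most
    frequent value, minus the first key)."""
    counts = Counter(keys)
    uniques = len(counts)
    total_dup = len(keys) - uniques
    max_dup = max(counts.values()) - 1
    return uniques, total_dup, max_dup

# 10 keys drawn from only 3 distinct values: poor selectivity, the same
# pattern as 34999 keys over 4 values in the article's example.
keys = ["A"] * 7 + ["B"] * 2 + ["C"]
print(index_dup_stats(keys))  # (3, 7, 6)
```

Note how the article's numbers fit this definition: 34999 keys with 4 unique values gives TotalDup = 34999 - 4 = 34995, exactly as reported.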
Reports
There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big – 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run a sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote from the table/indices part of the report: "Versioned tables count: 8. A large number of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions." Here is the list of tables with a version/record ratio greater than 3:

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944             92
DICT_PRICE        30       1992             45
DOCS               9       2225             64
N_PART         13835      72594             83
REGISTR_NC       241       4085             56
SKL_NC          1640       7736            170
STAT_QUICK     17649      85062            110
UO_LOCK          283       8490            144
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to "Temporary tables"
One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."
This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.
Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary, because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volker.rehn@bigpond.com
We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
OOD and RDBMS Part 1
if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = 'LookupName')
Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
Storing more complex objects
It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.
Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:
create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
)
which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:
select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id
As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
If an order has a "lines" section – and a real-world order should definitely have such a section – we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:
create table order_lines (
  id integer not null primary key,
  object_id TOID,  /* reference to the order object – 1:M relation */
  item_id TOID,    /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
)
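Given these tables, fetching an order together with its lines is plain relational SQL. A possible query, sketched from the tables defined above (the :id parameter stands for the order's OID):

```sql
/* Fetch the order header and its lines in one statement */
select o.Name as Number, ol.item_id, ol.amount, ol.cost, ol.sum
from Objects o
  join Orders ord on ord.OID = o.OID
  join order_lines ol on ol.object_id = ord.OID
where o.OID = :id
```

Nothing object-specific is needed here; this is exactly the "all the advantages of the relational approach are present" point made below.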
One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.
Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:
create table attributes (
  OID TOID,  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
)
connected 1:M with OBJECTS by OID. There is also a table
create table class_attributes (
  OID TOID,  /* link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
)
which describes the type metadata – an attribute set (their names and types) for each object type.
All attributes for a particular object, where the OID is known, are retrieved by the query:
select attribute_id, value
from attributes
where OID = :oid
or, with the names of the attributes:
select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id
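For completeness, here is how the "Order" example might be stored under this method. This is only a sketch: the attribute ids 1 and 2 for customer and sum_total are assumptions, and values are kept as strings, as the attributes table defines:

```sql
/* Describe the attributes of the "Order" type
   (:class_oid is the OID of the type's description-object) */
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (:class_oid, 1, 'customer', 0);
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (:class_oid, 2, 'sum_total', 1);

/* Store one concrete order's attribute values as rows */
insert into attributes (OID, attribute_id, value) values (:oid, 1, '5012');
insert into attributes (OID, attribute_id, value) values (:oid, 2, '1499.00');
```

Each new attribute is just another row, which is why extending a type is trivial under this method.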
In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:
select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple structure for the database – a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database – in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.
Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field – and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know – everyone already knows them :). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each one. If your application involves massive relational data processing, like searching or grouping, or is likely to be extended in the future to use certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and the data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.
To be continued…
Replicating and synchronizing InterBase/Firebird databases using CopyCat
Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.
In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)
When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?
The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.
Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
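The article does not show CopyCat's generated trigger code, but the "log for every sub-node except the current user" rule can be sketched roughly as follows. All table and column names here (RPL$LOG, its columns, the ORDERS example table) are hypothetical, not CopyCat's actual schema:

```sql
set term ^ ;
create trigger ORDERS_RPL_LOG for ORDERS
after update
as
begin
  /* one log line per sub-node; the node that performed the change
     (the current connection's user name) is skipped, so changes
     never bounce back to their originator */
  insert into RPL$LOG (login, table_name, pk_value)
  select u.login, 'ORDERS', old.OID
  from RPL$USERS u
  where u.login <> user;
end ^
set term ; ^
```

Because the replicator connects under its parent's node name, the same `where u.login <> user` filter also keeps a replicated change from being re-logged for the parent it just came from.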
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
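As an illustration, a generator-based synchronization method for an Orders table might be configured as an SQL clause like the following sketch (the generator name is made up; CopyCat's configuration syntax may differ):

```sql
/* executed on the parent node at replication time, so the key value
   it hands back is unique across all nodes that share that parent */
select gen_id(GEN_ORDERS_OID, 1) from RDB$DATABASE
```

A stored procedure call could be substituted where key values need more logic than a simple sequence.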
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.
CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
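Resolving a conflict therefore amounts to a single update of the conflicts table. Assuming a conflicts table named RPL$CONFLICTS with an ID column (both names are guesses; the article only names the CHOSEN_USER field), it might look like:

```sql
/* declare that node 'NODE_A' holds the correct record version;
   the next replication pass applies that version and clears the conflict */
update RPL$CONFLICTS
set CHOSEN_USER = 'NODE_A'
where ID = :conflict_id
```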
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.
Most replicators would require such an architecture to be altered. Instead of having one record holding the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
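In the STOCK example, this means wrapping the stock update in a procedure and replicating its calls. A sketch (the procedure and the column names are illustrative, not part of CopyCat):

```sql
set term ^ ;
create procedure ADD_TO_STOCK (product_id integer, delta integer)
as
begin
  /* each node replicates the call (product_id, delta) rather than
     the resulting stock value, so node A's +1 and node B's +2
     merge into +3 on every node */
  update STOCK
  set current_stock = current_stock + :delta
  where product_id = :product_id;
end ^
set term ; ^
```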
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:
1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each of these products.
1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide for getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.
2) Open and run the "Configuration" example project (requires the IBX provider).
3) On the "General" tab, fill in the connection parameters and press "Connect to Database".
4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables), and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.
7) Apply all the generated SQL to all databases that should be replicated.
8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate
1. In Delphi, open the "Replicator" example project.
2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.
3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.
4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).
5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool
For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy-to-use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:
1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.
2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.
CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold. Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code: ibd-852-150
Understanding Your Database with IBAnalyst

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure the hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.
Lets take a look at the capabilitiesof IBAnalyst IBAnalyst can takestatistics from gstat or the ServicesAPI and compile them into a reportgiving you complete informationabout the database its tables and
indices It has in-place warningswhich are available during thebrowsing of statistics it alsoincludes hint comments and recom-mendation reports
The summary shown in Figure 1provides general information about
Figure 1 Summary of database statistics
Author Dmitri Kouzmenkokdvib-aidcom
1- InterBase 75 and Firebird 15 have special features that can periodi-cally flush unsaved pages if Forced Writes is Off2 - Oldest transaction is the same Oldest interesting transaction mentionedeverywhere Gstat output does not show this transaction as interesting3 - Really it is the oldest transaction that was active when the oldest trans-action currently active started This is because only new transaction startmoves Oldest active forward In the production systems with regulartransactions it can be considered as currently oldest active transaction
Development area
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
©2005 www.ibdeveloper.com. All rights reserved.
Using IBAnalyst
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4k, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
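Assuming the standard command-line tools (the database name and password below are placeholders), the sweep interval can be adjusted, or a sweep run manually, like this:

```shell
# Disable automatic sweeping entirely (sweep interval = 0):
gfix -housekeeping 0 -user SYSDBA -password masterkey employee.gdb

# Run garbage collection manually instead, e.g. during off-peak hours:
gfix -sweep -user SYSDBA -password masterkey employee.gdb
```

Disabling the automatic sweep only makes sense if a manual sweep is then scheduled regularly, otherwise garbage will accumulate unchecked.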
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell if there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during housekeeping operations can be meaningless.
Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently the cause is bulk deletes. Together these indications tell that this really is bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices, and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are lots of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table          Records   Versions   RecVers size, %
CLIENTS_PR     3388      10944      92
DICT_PRICE     30        1992       45
DOCS           9         2225       64
N_PART         13835     72594      83
REGISTR_NC     241       4085       56
SKL_NC         1640      7736       170
STAT_QUICK     17649     85062      110
UO_LOCK        283       8490       144
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics, and in identifying possible problems with a database in terms of performance, maintenance, and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to "Temporary tables"
One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
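For illustration, the SQL-standard flavour of GTT the letter refers to (and which the issue 1 article covers for InterBase 7.5) looks roughly like this – the table name and columns are invented:

```sql
/* The definition is shared by everyone, but the rows are private to
   each user/session, with no explicit cleanup needed. */
CREATE GLOBAL TEMPORARY TABLE USER_WORKSPACE (
  ITEM_ID INTEGER NOT NULL,
  PAYLOAD VARCHAR(250)
) ON COMMIT PRESERVE ROWS;
```

With ON COMMIT DELETE ROWS instead, the contents would only survive until the end of the current transaction.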
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while :-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
Miscellaneous
Invitations, Subscriptions, Donations, Sponsorships
This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now
To receive notifications of future issues, send email to subscribe@ibdeveloper.com
Magazine CD
Donations
OOD and RDBMS Part 1
In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:
select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer, a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id
Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.
Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure – a significant benefit, since with this approach the number of tables would not increase, no matter how many different types were stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further with the growth of the attribute count of a type. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.
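A minimal sketch of the storage layout behind this method, consistent with the query shown earlier (column names match the query; the sizes are illustrative):

```sql
CREATE TABLE OBJECTS (
  OID         INTEGER NOT NULL PRIMARY KEY,
  NAME        VARCHAR(80),         /* attributes common to all types stay here */
  DESCRIPTION VARCHAR(250)
);

CREATE TABLE ATTRIBUTES (
  OID          INTEGER NOT NULL,   /* owning object                */
  ATTRIBUTE_ID INTEGER NOT NULL,   /* which attribute of the type  */
  VALUE        VARCHAR(250),       /* one attribute value per row  */
  PRIMARY KEY (OID, ATTRIBUTE_ID)
);
```

The composite primary key is what makes each join in the earlier query (on OID plus a constant attribute_id) resolve to at most one row.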
Method 3: Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to be extended in the future to use certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.
To be continued…
FastReport 3 - new generation of the reporting tools

• Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; support for most popular DB-engines.

• Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

• Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com
Replicating and synchronizing InterBase/FireBird databases using CopyCat
Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection, as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup, by using simple one-way replication.
Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and FireBird databases.
Data logging
Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
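CopyCat generates these log tables and triggers automatically; the hand-written sketch below only illustrates the principle, with invented table and column names (the real CopyCat metadata differs):

```sql
/* Hypothetical change-log table: one row per change and per sub-node. */
CREATE TABLE RPL_LOG (
  LOG_ID     INTEGER NOT NULL PRIMARY KEY,
  TABLE_NAME VARCHAR(31) NOT NULL,
  PK_VALUE   VARCHAR(64) NOT NULL,   /* primary key of the changed record  */
  OPERATION  CHAR(1) NOT NULL,       /* 'I'nsert, 'U'pdate or 'D'elete     */
  SUB_NODE   VARCHAR(31) NOT NULL    /* sub-node this line is destined for */
);

CREATE GENERATOR GEN_RPL_LOG;

SET TERM ^ ;
/* Log every update of CUSTOMERS once per sub-node, skipping the node the
   change came from (each node connects under its own node name). */
CREATE TRIGGER CUSTOMERS_RPL FOR CUSTOMERS
AFTER UPDATE AS
BEGIN
  INSERT INTO RPL_LOG (LOG_ID, TABLE_NAME, PK_VALUE, OPERATION, SUB_NODE)
  SELECT GEN_ID(GEN_RPL_LOG, 1), 'CUSTOMERS', OLD.CUSTOMER_ID, 'U', U.NODE_NAME
  FROM RPL_USERS U
  WHERE U.NODE_NAME <> CURRENT_USER;
END ^
SET TERM ; ^
```

The WHERE clause is the key point: it is what later allows two-way replication without changes bouncing back to their originator.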
Multi-node replication
Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.
In CopyCat these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/FireBird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication
One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and will therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
Primary key synchronization
One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.
Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
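For InterBase/Firebird, the natural synchronization method is a generator; the snippet below is a sketch of the kind of SQL clause one might configure (the generator name is invented):

```sql
/* A generator living on the parent server hands out unique values. */
CREATE GENERATOR GEN_CUSTOMER_ID;

/* The kind of clause CopyCat could execute server-side to obtain a
   guaranteed-unique key value before the record is replicated upwards: */
SELECT GEN_ID(GEN_CUSTOMER_ID, 1) FROM RDB$DATABASE;
```

Because generators live outside transaction control, two nodes replicating concurrently can never be handed the same value.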
Conflict management
Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
Difficult database structures
There are certain database architectures that are difficult to replicate. Consider for example a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B have the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
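A sketch of this idea with invented names (not CopyCat's actual API): applications call a procedure that applies a relative delta, and it is the procedure call, not the resulting QUANTITY value, that gets logged and replicated:

```sql
SET TERM ^ ;
CREATE PROCEDURE ADD_TO_STOCK (PRODUCT_ID INTEGER, DELTA INTEGER)
AS
BEGIN
  /* Apply the change as a delta instead of writing an absolute value. */
  UPDATE STOCK
  SET QUANTITY = QUANTITY + :DELTA
  WHERE PRODUCT_ID = :PRODUCT_ID;
END ^
SET TERM ; ^
```

Replaying node A's call (+1) and node B's call (+2) on every node yields 48 everywhere, starting from 45 – a result that neither node's absolute value (46 or 47) could provide.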
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.
1. CopyCat Delphi / C++ Builder Component Set
CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.
Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.
Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the ldquoTablesrdquo tab for each table that you want to replicate set a pri-ority (relative to the other tables) and double-click on the ldquoPKn generatorrdquo
Development area
30
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
www.ibdeveloper.com © 2005 www.ibdeveloper.com. All rights reserved.
Replicating and synchronizing
columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy-to-use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:
http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...) as well as a Linux / Kylix version for the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

Footnotes:
1. InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2. "Oldest transaction" is the same Oldest Interesting Transaction mentioned everywhere. Gstat output does not show this transaction as "interesting".
3. Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest Active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

Figure 1. Summary of database statistics

The summary shown in Figure 1 provides general information about
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: all figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off (see footnote 1).

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest (see footnote 2) and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
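The sweep-gap check just described is a small calculation over the gstat header values. A conceptual sketch in Python (the function names and the color-coding rule are my reading of the report behaviour, not IBAnalyst's actual code):

```python
def sweep_gap(oldest_transaction, oldest_snapshot):
    """Gap between the oldest interesting transaction and the oldest
    snapshot transaction, as read from the gstat header page."""
    return oldest_snapshot - oldest_transaction

def sweep_warning(oldest_transaction, oldest_snapshot, sweep_interval):
    """Mirror the report's color coding described above: 'yellow' for a
    negative gap (possible in InterBase 6.0 / Firebird / Yaffil
    statistics), 'red' when the gap exceeds a non-zero sweep interval."""
    gap = sweep_gap(oldest_transaction, oldest_snapshot)
    if gap < 0:
        return "yellow"
    if sweep_interval and gap > sweep_interval:
        return "red"
    return "ok"
```

With a sweep interval of 20000, a gap of 29900 would be flagged red, while a negative gap is flagged yellow regardless of the interval.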
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active (see footnote 3) is the oldest currently active transaction.

• The Next transaction is the transaction number that will be assigned to a new transaction.

• Active transactions: IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell if there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day: this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
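The two derived figures above are simple arithmetic over the header counters. A sketch of the calculations (names are mine; the 30% threshold is my reading of the warning rule, not a documented IBAnalyst constant):

```python
def transactions_per_day(next_transaction, days_since_creation):
    """Estimate average daily transaction throughput from gstat's
    'Next transaction' counter, as described in the text.  Only
    meaningful when transaction numbering has counted up since the
    database was created (or last restored)."""
    if days_since_creation <= 0:
        raise ValueError("database age must be positive")
    return next_transaction / days_since_creation

def stuck_oldest_active(oldest_active, next_transaction, per_day):
    """Warn when the oldest active transaction lags the head of the
    transaction sequence by more than ~30% of the daily count."""
    return (next_transaction - oldest_active) > 0.3 * per_day
```

For example, a database 35 days old with Next transaction at 70000 averages 2000 transactions per day; an oldest active transaction 1000 numbers behind the head would then trigger the warning.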
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications make less database load than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2. Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by updates/deletes or by blobs) and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell that this really is bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination, you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056; i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
Development area
34
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster; but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
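The duplicate statistics discussed above (Uniques, TotalDup, MaxDup) can be reproduced from a plain list of key values. A minimal sketch, assuming MaxDup counts the duplicates beyond the first key of each value (this is not IBAnalyst's actual code):

```python
from collections import Counter

def index_dup_stats(keys):
    """Compute per-index duplicate statistics from raw key values:
    total keys, distinct values (Uniques), duplicate keys beyond the
    first of each value (TotalDup) and the longest duplicate chain
    (MaxDup)."""
    counts = Counter(keys)
    total = sum(counts.values())
    uniques = len(counts)
    return {
        "keys": total,
        "uniques": uniques,
        "total_dup": total - uniques,
        "max_dup": max(counts.values()) - 1 if counts else 0,
    }

# Six keys over three distinct values: three duplicates in total,
# the longest chain being the two extra "a" keys.
stats = index_dup_stats(["a", "a", "a", "b", "b", "c"])
```

Under this convention, the article's index (34999 keys, 4 unique values) yields TotalDup = 34999 - 4 = 34995, matching the figures quoted above.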
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports\View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062     110
UO_LOCK      283       8490       144
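The "version/record ratio greater than 3" list in the report excerpt is a simple filter over the table statistics. A conceptual sketch (sample data taken from the rows above; this is an illustration, not IBAnalyst's code):

```python
def versioned_tables(stats, threshold=3):
    """Return the names of tables whose version/record ratio exceeds
    the threshold, as in the report excerpt above.  `stats` maps a
    table name to a (record_count, version_count) pair."""
    return sorted(
        name
        for name, (records, versions) in stats.items()
        if records and versions / records > threshold
    )

# A few rows from the report's table; every one exceeds the ratio of 3.
stats = {
    "CLIENTS_PR": (3388, 10944),
    "DICT_PRICE": (30, 1992),
    "N_PART": (13835, 72594),
}
flagged = versioned_tables(stats)
```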
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to the "Temporary tables" article
One thing I'd like to ask you to change is re temp tables. I suggest you even create another "myth" box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature, or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
Miscellaneous
Invitations, subscriptions, donations, sponsorships

This is the first official book on Firebird: the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now!

To receive future issues notifications, send email to subscribe@ibdeveloper.com
Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr
PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.
Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
Two-way replication

One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
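The sub-node filtering rule described above can be sketched as follows. This is a conceptual model in Python, not CopyCat's Delphi trigger code, and all names are mine:

```python
def log_change(log, sub_nodes, current_user, change):
    """Model of the replication trigger's logging rule: a change is
    logged for every sub-node except the connection that made it, so
    changes replicated *from* a node never bounce back to it."""
    for node in sub_nodes:
        if node != current_user:
            log.append((node, change))

# Node "HQ" has sub-nodes "A" and "B".  When sub-node "A" replicates a
# change up to HQ, it connects under the user name "A", so the change
# is logged for "B" only -- never back to the originator "A".
log = []
log_change(log, ["A", "B"], "A", "update row 42")
```

The same rule covers the downward direction: the replicator connects to the local database under the parent's node name, so changes applied during replication are never logged back toward the parent.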
Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other; since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
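A toy model of the per-record conflict rule described above (the data structures and names are mine, not CopyCat's):

```python
def replicate_record(local, remote, conflicts, key, chosen=None):
    """Sketch of two-way sync for one record with conflict handling.
    `local` and `remote` map key -> (value, changed_flag).  A record
    modified on both sides is logged to `conflicts` and left untouched
    until `chosen` names the winning side ("local" or "remote"),
    mirroring the CHOSEN_USER field described above."""
    lval, lchg = local[key]
    rval, rchg = remote[key]
    if lchg and rchg:
        if chosen is None:
            conflicts[key] = {"CHOSEN_USER": None}  # replication stops
            return "conflict"
        winner = lval if chosen == "local" else rval
        local[key] = remote[key] = (winner, False)
        return "resolved"
    if lchg:
        remote[key] = local[key] = (lval, False)
    elif rchg:
        local[key] = remote[key] = (rval, False)
    return "replicated"

# Both sides edited record "r1" in the same period: a conflict.
local = {"r1": (10, True)}
remote = {"r1": (20, True)}
conflicts = {}
status = replicate_record(local, remote, conflicts, "r1")
```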
Difficult database structures

There are certain database architectures that are difficult to replicate. Consider for example a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.
To solve this kind of problem, CopyCat introduces stored procedure "replication": a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
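The STOCK example can be made concrete: replicating final field values loses one node's update, whereas replicating the procedure calls (i.e. the deltas) merges both changes correctly. A sketch with invented names, modelling the idea rather than CopyCat's implementation:

```python
class StockNode:
    """Toy node holding one product's stock level and a log of
    'stored procedure calls' (here reduced to plain deltas) that
    would be replicated to other nodes."""

    def __init__(self, stock):
        self.stock = stock
        self.calls = []

    def add_to_stock(self, qty):
        # The equivalent of calling a stored procedure that updates
        # the STOCK table; the call itself is what gets logged.
        self.stock += qty
        self.calls.append(qty)

    def replay_from(self, other):
        # Apply the other node's logged calls instead of copying its
        # final field value.
        for qty in other.calls:
            self.stock += qty

# Both nodes start at 45; A adds 1 and B adds 2 while disconnected.
a, b = StockNode(45), StockNode(45)
a.add_to_stock(1)   # A now sees 46
b.add_to_stock(2)   # B now sees 47
a.replay_from(b)    # A: 45 + 1 + 2 = 48
b.replay_from(a)    # B: 45 + 2 + 1 = 48
```

Had either node simply copied the other's final value (46 or 47), one update would have been lost; replaying the calls converges both nodes on 48.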
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components
2. As a standalone Win32 replication tool

We will now present each one of these products.
1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.
Below is a concise guide for getting started with the CopyCat component suite
Download and install the evaluation components from http://www.microtec.fr/copycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
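As a sketch of what a "PKn generator" entry from step 5 might contain (the generator name here is hypothetical, and the exact clause CopyCat expects may differ), a server-side generator can hand out unique primary key values:

```sql
/* Hypothetical example: a generator on the parent database that
   produces server-unique primary key values.  During replication,
   a key assigned offline on a sub-node can then be replaced by the
   value this clause returns on the server. */
CREATE GENERATOR GEN_CLIENT_ID;

SELECT GEN_ID(GEN_CLIENT_ID, 1) FROM RDB$DATABASE;
```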
Replicate
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• Easy-to-use installer
• Independent server / Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.
CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...) as well as a Linux / Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:
ibd-852-150
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story short, the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

Figure 1. Summary of database statistics

Notes:
1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
3 - Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest active forward. In production systems with regular transactions, it can be considered as the currently oldest active transaction.

The summary shown in Figure 1 provides general information about
your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance-loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative – which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated as Next transaction divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2. Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
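You can estimate an index's selectivity yourself with ordinary queries (a sketch: ADDR and ADDRESS_ID are stand-ins for the real table and indexed column behind ADDR_ADDRESS_IDX6):

```sql
/* Number of distinct values in the indexed column - this corresponds
   to the "Uniques" figure in the IBAnalyst index view. */
SELECT COUNT(DISTINCT ADDRESS_ID) FROM ADDR;

/* Duplicate chain length per value; the top row corresponds to
   "MaxDup" - the longest chain of keys sharing one value. */
SELECT ADDRESS_ID, COUNT(*)
  FROM ADDR
  GROUP BY ADDRESS_ID
  ORDER BY 2 DESC;
```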
That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
Reports
There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. Large amounts of record versions usually slow down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are lots of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with a version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062     110
UO_LOCK      283       8490       144
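The full-scan trick the report suggests looks like this (with the caveat from the report itself: whether garbage is actually collected still depends on no active transaction needing the versions):

```sql
/* Reading every record forces the server to walk each version chain,
   allowing co-operative garbage collection to remove versions that no
   transaction needs any more.  Table names are from the report above. */
SELECT COUNT(*) FROM CLIENTS_PR;
SELECT COUNT(*) FROM STAT_QUICK;
```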
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback

Comments to the "Temporary tables" article
One thing I'd like to ask you to change is re temp tables. I suggest you even create another "myth box" for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat-file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation or regarding data whose structure is unknown in advance but which needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (only possible through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. This one impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
Miscellaneous
Invitations · Subscriptions · Donations · Sponsorships
This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now!

To receive future issues notifications, send email to subscribe@ibdeveloper.com
This system was carefully designedto function correctly even in some ofthe complex scenarios that are pos-sible with CopyCat For instancethe conflict may in reality bebetween two nodes that are notdirectly connected to each othersince CopyCat nodes only evercommunicate directly with their par-ent there is no way to tell if anothernode may not have a conflictingupdate for a certain record Fur-thermore its entirely possible thattwo nodes (having the same parent)should simultaneously attempt toreplicate the same record to theirparent By using a snapshot-typetransaction and careful ordering ofthe replication process these issuesare handled transparently
Difficult database structures
There are certain database archi-tectures that are difficult to repli-cate Consider for example aldquoSTOCKrdquo table containing one lineper product and a field holding thecurrent stock value Suppose thatfor a certain product the currentstock value being 45 node A adds1 item to stock setting the stockvalue to 46 Simultaneously nodeB adds 2 items to stock thereby set-ting the current stock value to 47How can such a table then be repli-cated Neither A nor B have thecorrect value for the field since nei-ther take into consideration thechanges from the other node
Most replicators would require suchan architecture to be alteredInstead of having one record holdthe current stock value of productthere could be one line per changeThis would solve the problem How-ever restructuring large databases(and the end-user applications thatusually go with them) could be arather major task CopyCat wasspecifically designed to avoid theseproblems altogether rather thanrequire the database structure to bechanged
To solve this kind of problem Copy-Cat introduces stored procedureldquoreplicationrdquo That is a mechanismfor logging stored procedure callsand replicating them to othernodes When dealing with an
Development area
29
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Replicating and synchronizing
unreplicatable table (like in the example above) one solution is to make astored procedure which can be called for updating the table and usingstored procedure replication in order to replicate each of these calls Thuscontinuing the example above instead of replicating the values of theSTOCK table the nodes would replicate only their changes to these valuesthereby correctly synchronizing and merging the changes to the STOCKtable
PART 2GETTING STARTED WITH COPYCATCopycat is available in two distinct forms
1 As a set of Delphi C++ Builder components
2 As a standalone Win32 replication tool
We will now present each one of these products
1 CopyCat Delphi C++ Builder Component Set
CopyCat can be obtained as a set of Delphi C++ Builder componentsenabling you to use replication features painlessly and in many different sit-uations and sparing you the tedious task of writing and testing custom data-base synchronization solutions
Using these components replication or synchronization facilities can beseamlessly built into existing solutions As an example we have used theCopyCat components in an application used by our development team ontheir laptops to keep track of programming tasks When the developersneed to go on-site to visit a customer the application runs in local mode onthe laptop When they return and connect up to the company network anychanges they have made on-site are automatically synchronized with themain Interbase database running on a Linux server
Many more applications are possible since the CopyCat components arevery flexible and allow for synchronization of even a single table
Below is a concise guide for getting started with the CopyCat component suite
Download and install the evaluation components fromhttpwwwmicrotecfrcopycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DACthat you plan to use (currently IBX and FIBPlus are supported) These arecomponents for interfacing between the CopyCat components and theunderlying data-access components
2) Open and run the ldquoConfigurationrdquo example project (requires the IBXprovider)
3) On the ldquoGeneralrdquo tab fill in the connection parameters and press ldquoCon-nect to Databaserdquo
4) On the ldquoUsersrdquo tab provide the list of sub-nodes for the current data-base
5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.
6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate:
1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server and Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master/slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct
SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community obtains a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replication between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux/Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering IBDeveloper readers a 20% discount on the purchase of any CopyCat product, for the first 100 licenses sold. Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Development area
The InterBase and Firebird Developer Magazine, 2005, Issue 2
www.ibdeveloper.com ©2005 www.ibdeveloper.com. All rights reserved.

Using IBAnalyst

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you will know that the statistical output looks like just a bunch of numbers and nothing else. Okay, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available while browsing the statistics; it also includes hint comments and recommendation reports.

Figure 1: Summary of database statistics

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on knowledge carefully gathered from a large number of real-world production databases.

Notes:
1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - "Oldest transaction" is the same "oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
3 - Really, it is the oldest transaction that was active when the oldest of the currently active transactions started, because only the start of a new transaction moves the oldest active forward. In production systems with regular transactions it can be treated as the currently oldest active transaction.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when it is ON, changed data is written to disk immediately, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in case of a power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off [1].

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
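The relationship between the sweep gap and the sweep interval can be expressed as a small calculation. This is only a sketch with illustrative numbers; the exact trigger condition inside the server may differ in detail.

```python
# Sketch: the "sweep gap" is the distance between the oldest (interesting)
# transaction and the oldest snapshot transaction. When a positive sweep
# interval is configured and the gap exceeds it, automatic sweeping starts.

def sweep_check(oldest, oldest_snapshot, sweep_interval):
    gap = oldest_snapshot - oldest
    if sweep_interval > 0 and gap > sweep_interval:
        return gap, "automatic sweep will be triggered"
    return gap, "no automatic sweep"

# A negative gap, as mentioned for InterBase 6.0 / Firebird statistics:
print(sweep_check(10_000, 9_900, 20_000))   # prints (-100, 'no automatic sweep')
# A gap larger than the interval:
print(sweep_check(10_000, 31_000, 20_000))  # prints (21000, 'automatic sweep will be triggered')
```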
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database:

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated as Next transaction divided by the number of days passed since the creation of the database up to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
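As a worked example of the last two bullets, with illustrative numbers (the exact warning threshold IBAnalyst applies may differ from this sketch):

```python
# Transactions per day, and a "stuck oldest active" check: warn when the
# oldest active transaction lags the next transaction by more than ~30%
# of a typical day's worth of transactions.

def transactions_per_day(next_transaction, days_since_creation):
    return next_transaction / days_since_creation

def oldest_active_stuck(next_transaction, oldest_active, per_day):
    return (next_transaction - oldest_active) > 0.3 * per_day

per_day = transactions_per_day(380_000, 950)
print(per_day)  # prints 400.0
# Lag of 100 transactions: within a normal day's activity, no warning.
print(oldest_active_stuck(380_000, 379_900, per_day))  # prints False
# Lag of 1000 transactions: garbage collection is likely being held up.
print(oldest_active_stuck(380_000, 379_000, per_day))  # prints True
```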
As you have already learned, if there are any warnings they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be periods when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not creating sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by updates/deletes or by blobs) and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, which points to bulk deletes. Together, both indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications update these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If the statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query: searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
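The figures quoted for ADDR_ADDRESS_IDX6 hang together. Assuming TotalDup counts the keys whose value has already appeared and MaxDup counts the extra keys in the longest duplicate chain (my reading of the statistics, not an official definition), they can be reproduced for any key distribution:

```python
# Recomputing duplicate statistics for a list of index key values.
from collections import Counter

def index_stats(keys):
    counts = Counter(keys)
    uniques = len(counts)
    total_dup = len(keys) - uniques           # keys repeating a seen value
    max_dup = max(counts.values()) - 1        # longest chain, minus first key
    return uniques, total_dup, max_dup

# 4 distinct values spread over 34999 keys, the largest chain 25057 keys long:
keys = ["A"] * 25057 + ["B"] * 6000 + ["C"] * 3000 + ["D"] * 942
print(len(keys), index_stats(keys))  # prints 34999 (4, 34995, 25056)
```

With this distribution, almost every search for value "A" has to walk a chain of ~25000 keys, which is why MaxDup matters for query speed.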
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run a sweep manually (gfix -sweep) to decrease the TIP size."
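The quoted TIP size can be sanity-checked with a small calculation, assuming the classic InterBase/Firebird layout of 2 bits of state per transaction (i.e. 4 transactions per byte); the transaction count used here is illustrative, chosen to match the quoted report:

```python
# Rough TIP size estimate: 2 bits of state per transaction.
import math

def tip_size(next_transaction, page_size=4096):
    tip_bytes = math.ceil(next_transaction / 4)   # 4 transactions per byte
    tip_pages = math.ceil(tip_bytes / page_size)
    return tip_bytes, tip_pages

tip_bytes, tip_pages = tip_size(376_000)
print(tip_bytes, "bytes in", tip_pages, "pages")  # prints 94000 bytes in 23 pages
```

So a database that has accumulated a few hundred thousand transactions since the last sweep or restore will carry a TIP of roughly this size, which every snapshot transaction then copies into memory.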
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. A large amount of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions." Here is the list of tables with version/record ratio greater than 3:

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944     92
DICT_PRICE        30       1992     45
DOCS               9       2225     64
N_PART         13835      72594     83
REGISTR_NC       241       4085     56
SKL_NC          1640       7736    170
STAT_QUICK     17649      85062    110
UO_LOCK          283       8490    144
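The version/record ratio used to build this list can be recomputed directly from the Records and Versions columns of the table above:

```python
# Version/record ratio per table; tables with a ratio above 3 are listed
# in the report, matching the "Versioned tables count: 8" quote.

stats = {
    "CLIENTS_PR": (3388, 10944),
    "DICT_PRICE": (30, 1992),
    "DOCS":       (9, 2225),
    "N_PART":     (13835, 72594),
    "REGISTR_NC": (241, 4085),
    "SKL_NC":     (1640, 7736),
    "STAT_QUICK": (17649, 85062),
    "UO_LOCK":    (283, 8490),
}

flagged = {name: round(versions / records, 1)
           for name, (records, versions) in stats.items()
           if versions / records > 3}
print(flagged)
```

All eight tables exceed the threshold; the worst case, DOCS, carries over 240 versions per record.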
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments on the "Temporary tables" article

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its author I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat-file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can already see that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation, or regarding data whose structure is unknown in advance but which needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only possible through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
Miscellaneous
Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000. Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now! To receive notifications of future issues, send email to subscribe@ibdeveloper.com
Development area
29
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Replicating and synchronizing
unreplicatable table (like in the example above) one solution is to make astored procedure which can be called for updating the table and usingstored procedure replication in order to replicate each of these calls Thuscontinuing the example above instead of replicating the values of theSTOCK table the nodes would replicate only their changes to these valuesthereby correctly synchronizing and merging the changes to the STOCKtable
PART 2GETTING STARTED WITH COPYCATCopycat is available in two distinct forms
1 As a set of Delphi C++ Builder components
2 As a standalone Win32 replication tool
We will now present each one of these products
1 CopyCat Delphi C++ Builder Component Set
CopyCat can be obtained as a set of Delphi C++ Builder componentsenabling you to use replication features painlessly and in many different sit-uations and sparing you the tedious task of writing and testing custom data-base synchronization solutions
Using these components replication or synchronization facilities can beseamlessly built into existing solutions As an example we have used theCopyCat components in an application used by our development team ontheir laptops to keep track of programming tasks When the developersneed to go on-site to visit a customer the application runs in local mode onthe laptop When they return and connect up to the company network anychanges they have made on-site are automatically synchronized with themain Interbase database running on a Linux server
Many more applications are possible since the CopyCat components arevery flexible and allow for synchronization of even a single table
Below is a concise guide for getting started with the CopyCat component suite
Download and install the evaluation components fromhttpwwwmicrotecfrcopycat
Prepare databases
1) Open Delphi and compile the data provider package(s) for the DACthat you plan to use (currently IBX and FIBPlus are supported) These arecomponents for interfacing between the CopyCat components and theunderlying data-access components
2) Open and run the ldquoConfigurationrdquo example project (requires the IBXprovider)
3) On the ldquoGeneralrdquo tab fill in the connection parameters and press ldquoCon-nect to Databaserdquo
4) On the ldquoUsersrdquo tab provide the list of sub-nodes for the current data-base
5) On the ldquoTablesrdquo tab for each table that you want to replicate set a pri-ority (relative to the other tables) and double-click on the ldquoPKn generatorrdquo
Development area
30
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Replicating and synchronizing
columns to (optionally) fill the primary key synchronization method Oncethese settings have been made set the ldquoCreatedrdquo field to Y so as to gen-erate the meta-data
6) On the ldquoProceduresrdquo tab set ldquoCreatedrdquo to Y for each procedure thatyou want to replicate after having set a priority
7) Apply all the generated SQL to all databases that should be replicated
8) For each database to be replicated set the list of sub-nodes (in theRPL$USERS table)
Replicate
1 In Delphi open the ldquoReplicatorrdquo example project
2 Drop a provider component on the form and hook it up to the TCcRepli-cators DBProvider property
3 Setup the LocalDB and RemoteDB properties of the TCcReplicator withthe connection parameters for the local and remote databases
4 Fill in the user name of the local andremote nodes as well as the SYSDBAuser name and password (needed forprimary key synchronization)
5 Compile and run the example
6 Press the ldquoReplicate nowrdquo button
2 CopyTiger the CopyCat Win32standalone replication tool
For those who have replication or synchronization needs but do not havethe time or resources to develop their own integrated system Microtec pro-poses a standalone database replication tool based on the CopyCat tech-nology called COPYTIGER
Features include
bull Easy to use installer
bull Independent server Administrator tool (CTAdmin)
bull Configuration wizard for setting up links to master slave databases
bull Robust replication engine based on Microtec CopyCat
bull Fault-tolerant connection mechanism allowing for automatic resumption of lost database connections
bull Simple amp intuitive control panel
bullAutomatic email notification on certain events (conflicts PK violations etc)
Visit the CopyTiger homepage to download a time-limited trial version
httpwwwmicrotecfrcopycatct
SUMMARY
In todays connected world database replication and synchronization are topics of great interest amongall industry professionals With the advent of Microtec CopyCat the Interbase Firebird community isobtaining a two-fold benefit
1 By encapsulating all the functionality of a replicator into Delphi components CopyCat makes it easierthan ever to integrate replication and synchronization facilities into custom applications
2 By providing a standalone tool for the replication of Interbase Firebird databases Microtec is respond-ing to another great need in the community ndash that of having a powerful and easy-to-use replication tooland one that can be connected to an existing database without disrupting its current structure
CopyCat is being actively developed by Microtec and many new features are being worked on such assupport for replicating between heterogeneous database types (PostgreSQL Oracle MSSQL MySQLNexusDB ) as well as a Linux Kylix version for the components and the standalone tool
You can find more information about CopyCat athttpwwwmicrotecfrcopycat or by contacting us at copycatmicrotecfr
As as promotional operation Microtec is offering a 20 discount to IBDeveloper readers for the pur-chase of any CopyCat product for the first 100 licenses sold
Click on the BUY NOW icon on our web site and enter the followingShareIt coupon code
ibd-852-150
Thankfully prior to working withInterBase I was interested in differ-ent data structures how they arestored and what algorithms theyuse This helped me to interpret theoutput of gstat At that time I decid-ed to write a tool that could analyzegstat output to help in tuning thedatabase or at least to identify thecause of performance problems
Long story but the outcome wasthat IBAnalyst was created In spiteof my experience it still allows me tofind very interesting things or per-formance issues in different data-bases
Real systems have runtime perform-ance that fluctuates like a wave Theamplitude of such lsquowavesrsquo can below or high so you can see howperformance differs from day today (or hour by hour) Actual per-formance depends on many factorsincluding the design of the applica-tion server configuration transac-tion concurrency version garbagein the database and so on To findout what is happening in a data-base (both positive and negativeaspects of performance) you shouldat the very least take a peek at the
Development area
31
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
Understanding YourDatabase with IBAnalystI have been working with InterBasesince 1994 Back then most data-bases were small and did notrequire any tuning Of course therewere occasions when I had tochange ibconfig on a server andreconfigured hardware or the OSbut that was almost all I could do totune performance
Four years ago our companybegan to provide technical supportand training of InterBase usersWorking with many productiondatabases also taught me a lot ofdifferent things However most ofwhat I learned concerned applica-tions ndash transaction parametersusage optimizing queries andresult sets
Of course I had known for quitesome time about gstat ndash the tool thatgives database statistics informa-tion If you have ever looked ingstat output or read opguidepdfabout it you would know that thestatistical output looks like just abunch of numbers and nothing elseOk you can discover fragmentationinformation for a particular table orindex but what other useful infor-mation can be obtained
database statistics from time to time
Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available while browsing the statistics, and it also includes hints, comments and recommendation reports.
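IBAnalyst reads the same summary block that gstat prints. As a rough illustration of what that raw input looks like, here is a minimal sketch that parses a few header fields from gstat -h style output into a dict. The sample text is abridged and the numbers are hypothetical; this is not IBAnalyst's actual parser.

```python
# Minimal sketch: turning a gstat-style header into a dict.
# The sample below is abridged; real gstat output has more fields.
sample = """\
Database header page information:
    Page size               4096
    Oldest transaction      151004
    Oldest active           151321
    Next transaction        162001
"""

def parse_header(text):
    stats = {}
    for line in text.splitlines():
        # Split off the trailing number; keep the field label as the key.
        parts = line.strip().rsplit(None, 1)
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats

print(parse_header(sample)["Page size"])  # 4096
```

The field labels ("Page size", "Oldest transaction", "Oldest active", "Next transaction") are the ones gstat itself prints in the database header section.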
The summary shown in Figure 1 provides general information about your database.
Figure 1: Summary of database statistics
Author: Dmitri Kouzmenko, kdv@ib-aid.com
1 – InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 – The Oldest transaction is the same Oldest interesting transaction mentioned everywhere; gstat output does not show this transaction as "interesting".
3 – Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest active forward. In production systems with regular transactions it can be considered the currently oldest active transaction.
The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.
Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).
As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the details of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.
What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.
Next we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when ON, changed data is written immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.
Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in the event of a power, OS or server failure.
Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off [1].
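Forced Writes can be switched with the standard gfix utility's -write switch. As a hedged sketch, here is a tiny helper that builds the command line; the database path and credentials are placeholders, not values from the article.

```python
# Sketch: building the gfix command that controls Forced Writes.
# "sync" = Forced Writes ON, "async" = OFF. Path and credentials are placeholders.
def forced_writes_cmd(database, mode="sync"):
    return ["gfix", "-write", mode, database,
            "-user", "SYSDBA", "-password", "masterkey"]

print(" ".join(forced_writes_cmd("server:/data/mydb.gdb")))
```

In practice the list would be passed to the shell or to subprocess.run; building it first makes the intent easy to review.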
Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the Oldest [2] and Oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a sudden loss of performance, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
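The sweep interval itself is set with gfix -h n (0 disables automatic sweeping), and a manual sweep is run with gfix -sweep, both documented gfix switches. A small sketch with a placeholder database path:

```python
# Sketch: gfix commands for the sweep interval (placeholder DB path).
def set_sweep_interval_cmd(database, interval):
    # interval=0 disables automatic sweeping entirely
    return ["gfix", "-h", str(interval), database]

def manual_sweep_cmd(database):
    return ["gfix", "-sweep", database]

print(" ".join(set_sweep_interval_cmd("mydb.gdb", 0)))
```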
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database:
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.
• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.
• The Oldest active [3] – the oldest currently active transaction.
• The Next transaction – the transaction number that will be assigned to the next new transaction.
• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day – this is calculated from Next transaction divided by the number of days passed from the creation of the database to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
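To make the relationships among these markers concrete, here is a small worked sketch. The transaction numbers and dates are hypothetical, not taken from the article's database.

```python
# Hypothetical gstat transaction markers (not from the article's database).
from datetime import date

oldest           = 151_004   # Oldest (interesting) transaction
oldest_active    = 151_321   # Oldest active
oldest_snapshot  = 151_310   # Oldest snapshot
next_transaction = 162_001   # Next transaction

# The "sweep gap" the engine compares against the sweep interval:
sweep_gap = oldest_snapshot - oldest

# IBAnalyst-style "transactions per day" estimate:
created, taken = date(2005, 1, 1), date(2005, 3, 2)  # hypothetical dates
per_day = next_transaction / (taken - created).days

print(sweep_gap, round(per_day))  # 306 2700
```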
As you have already learned if there any warnings they are shownas colored lines with clear descriptive hints on how to fix or preventthe problem
It should be noted that database statistics are not always useful. Statistics that are gathered during maintenance and housekeeping operations can be meaningless.
Do not gather statistics if you:

• just restored your database;
• performed a backup (gbak -b db.gdb) without the -g switch;
• recently performed a manual sweep (gfix -sweep).
Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications update these tables so frequently is by design, or the result of a coding mistake or an application design flaw.
The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.
For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually.
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they always work with transactions and data correctly, do not create sweep gaps, do not accumulate a lot of active transactions, do not keep long-running snapshots, and so on. Usually this does not happen (sorry, colleagues).
The most common reason is that developers test their applications with only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.
Table information

Let's take a look at another sample output from IBAnalyst.
The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2: Table statistics
In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, which points to bulk deletes. Together, both indications tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.
If statistics were taken via the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of the four unique values. As a result, this index could:
• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster – but only you know what data is stored in that indexed column.
That is why IBAnalyst draws your attention to such indices, marking them red and yellow and including them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on foreign keys every time you view statistics.
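The duplicate figures quoted above fit together arithmetically; a quick check with the article's own numbers:

```python
# Checking the index statistics quoted in the article.
keys, uniques = 34999, 4          # total keys and distinct key values
total_dup = keys - uniques        # every key beyond the first of each value
max_dup = 25056                   # longest chain of one repeated value

assert total_dup == 34995         # matches the TotalDup shown by IBAnalyst
assert max_dup / keys > 0.7       # ~72% of all keys carry a single value
print(total_dup)
```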
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table         Records   Versions   RecVers size, %
CLIENTS_PR       3388      10944        92
DICT_PRICE         30       1992        45
DOCS                9       2225        64
N_PART          13835      72594        83
REGISTR_NC        241       4085        56
SKL_NC           1640       7736       170
STAT_QUICK      17649      85062       110
UO_LOCK           283       8490       144
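The ratio in that list is simply Versions divided by Records; recomputing it from the figures above confirms that every one of the eight tables exceeds the 3:1 threshold:

```python
# Version/record ratios for the tables listed in the report excerpt.
tables = {
    "CLIENTS_PR": (3388, 10944),  "DICT_PRICE": (30, 1992),
    "DOCS":       (9, 2225),      "N_PART":     (13835, 72594),
    "REGISTR_NC": (241, 4085),    "SKL_NC":     (1640, 7736),
    "STAT_QUICK": (17649, 85062), "UO_LOCK":    (283, 8490),
}
flagged = sorted(t for t, (recs, vers) in tables.items() if vers / recs > 3)
print(len(flagged))  # 8 -- every listed table is over the threshold
```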
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and in identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments on the "Temporary tables" article
One thing I'd like to ask you to change is re temp tables; I suggest you even create another myth box for it. It is the sentence:
"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."
This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on an RDBMS are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking from experience.
Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.
Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data whose structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (available only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them particularly impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor
Miscellaneous
Invitations, subscriptions, donations, sponsorships
This is the first official book on Firebird — the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.
Subscribe now! To receive notifications of future issues, send an email to subscribe@ibdeveloper.com
Replicating and synchronizing
columns to (optionally) fill, and the primary key synchronization method. Once these settings have been made, set the "Created" field to Y so as to generate the metadata.
6) On the "Procedures" tab, set "Created" to Y for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
Replicate:

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.
2. CopyTiger, the CopyCat Win32 standalone replication tool
For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.
Features include:

• an easy-to-use installer
• an independent server administrator tool (CTAdmin)
• a configuration wizard for setting up links to master/slave databases
• a robust replication engine based on Microtec CopyCat
• a fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• a simple and intuitive control panel
• automatic email notification on certain events (conflicts, PK violations, etc.)
Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct
SUMMARY
In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:
1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.
2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.
CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux/Kylix version of the components and the standalone tool.
You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.
As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.
Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150
Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least in identifying the cause of performance problems.
Long story but the outcome wasthat IBAnalyst was created In spiteof my experience it still allows me tofind very interesting things or per-formance issues in different data-bases
Real systems have runtime perform-ance that fluctuates like a wave Theamplitude of such lsquowavesrsquo can below or high so you can see howperformance differs from day today (or hour by hour) Actual per-formance depends on many factorsincluding the design of the applica-tion server configuration transac-tion concurrency version garbagein the database and so on To findout what is happening in a data-base (both positive and negativeaspects of performance) you shouldat the very least take a peek at the
Development area
31
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
Understanding YourDatabase with IBAnalystI have been working with InterBasesince 1994 Back then most data-bases were small and did notrequire any tuning Of course therewere occasions when I had tochange ibconfig on a server andreconfigured hardware or the OSbut that was almost all I could do totune performance
Four years ago our companybegan to provide technical supportand training of InterBase usersWorking with many productiondatabases also taught me a lot ofdifferent things However most ofwhat I learned concerned applica-tions ndash transaction parametersusage optimizing queries andresult sets
Of course I had known for quitesome time about gstat ndash the tool thatgives database statistics informa-tion If you have ever looked ingstat output or read opguidepdfabout it you would know that thestatistical output looks like just abunch of numbers and nothing elseOk you can discover fragmentationinformation for a particular table orindex but what other useful infor-mation can be obtained
database statistics from time to time
Lets take a look at the capabilitiesof IBAnalyst IBAnalyst can takestatistics from gstat or the ServicesAPI and compile them into a reportgiving you complete informationabout the database its tables and
indices It has in-place warningswhich are available during thebrowsing of statistics it alsoincludes hint comments and recom-mendation reports
The summary shown in Figure 1provides general information about
Figure 1 Summary of database statistics
Author Dmitri Kouzmenkokdvib-aidcom
1- InterBase 75 and Firebird 15 have special features that can periodi-cally flush unsaved pages if Forced Writes is Off2 - Oldest transaction is the same Oldest interesting transaction mentionedeverywhere Gstat output does not show this transaction as interesting3 - Really it is the oldest transaction that was active when the oldest trans-action currently active started This is because only new transaction startmoves Oldest active forward In the production systems with regulartransactions it can be considered as currently oldest active transaction
Development area
32
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
your database The warnings orcomments shown are based oncarefully gathered knowledgeobtained from a large number ofreal-world production databases
Note All figures in this article con-tain gstat statistics which were takenfrom a real-world production data-base (with the permission of its own-ers)
As I said before raw database sta-tistics look cryptic and are hard tointerpret IBAnalyst highlights anypotential problems clearly in yellowor red and the detail of the problemcan be read simply by placing thecursor over the relevant entry andreading the hint that is displayed
What can we discover from theabove figure This is a dialect 3database with a page size of 4096bytes Six to eight years ago devel-opers used a default page size of1024 bytes but in more recenttimes such a small page size couldlead to many performance prob-lems Since this database has apage size of 4k there is no warningdisplayed as this page size is okay
Next we can see that the ForcedWrite parameter is set to OFF andmarked red InterBase 4x and 5xby default had this parameter ONForced Writes itself is a write cachemethod when ON it writeschanged data immediately to diskbut OFF means that writes will be
stored for unknown time by theoperating system in its file cacheInterBase 6 creates databases withForced Writes OFF
Why is this marked in red on theIBAnalyst report The answer is sim-ple ndash using asynchronous writes cancause database corruption in casesof power OS or server failure
Tip It is interesting that modernHDD interfaces (ATA SATA SCSI)do not show any major difference inperformance with Forced Write setOn or Off1
Next on the report is the mysterioussweep interval If positive it setsthe size of the gap between the old-est2 and oldest snapshot transactionat which the engine is alerted to theneed to start an automatic garbagecollection On some systems hittingthis threshold will cause a suddenperformance loss effect and as aresult it is sometimes recommendedthat the sweep interval be set to 0(disabling automatic sweepingentirely) Here the sweep interval ismarked yellow because the valueof the sweep gap is negativewhich it can be in InterBase 60Firebird and Yaffil statistics but not inInterBase 7x When the value of thesweep gap is greater than thesweep interval (if the sweep intervalis not 0) the report entry for thesweep interval will be marked redwith an appropriate hint
We will examine the next 8 rows asa group as they all display aspectsof the transaction state of the data-base
bull The Oldest transaction is theoldest non-committed transactionAny lower transaction numbers arefor committed transactions and norecord versions are available forsuch transactions Transaction num-bers higher than the oldest transac-tion are for transactions that can bein any state This is also called theoldest interesting transactionbecause it freezes when a transac-tion is ended with rollback andserver can not undo its changes atthat moment
bull The Oldest snapshot ndash the old-est active (ie not yet committed)transaction that existed at the startof the transaction that is currentlythe Oldest Active transactionIndicates lowest snapshot transac-tion number that is interested inrecord versions
bull The Oldest active3 ndash the oldestcurrently active transaction
bull The Next transaction ndash thetransaction number that will beassigned to new transaction
bull Active transactions ndash IBAnalystwill give a warning if the oldestactive transaction number is30 lower than the daily transac-tions count The statistics do not tell
if there any other active transactions between oldest active andnext transaction but there can be such transactions Usually ifthe oldest active gets stuck there are two possible causes a) thatsome transaction is active for a long time or b) the applicationdesign allows transactions to run for a long time Both causes pre-vent garbage collection and consume server resources
bull Transactions per day ndash this is calculated from Next transactiondivided by the number of days passed since the creation of thedatabase to the point where the statistics are retrieved This can becorrect only for production databases or for databases that areperiodically restored from backup causing transaction numberingto be reset
As you have already learned if there any warnings they are shownas colored lines with clear descriptive hints on how to fix or preventthe problem
It should be noted that database statistics are not always useful Sta-tistics that are gathered during work and housekeeping operationscan be meaningless
Do not gather statistics if you
bull Just restored your database
bull Performed backup (gbak ndashb dbgdb) without the ndashg switch
bull Recently performed a manual sweep (gfix ndashsweep)
Statistics you get on such occasions will be practically useless It isalso correct that during normal work there can be times where data-base is in perfect state for example when applications make lessdatabase load than usual (users are at lunch or its a quiet time in thebusiness day)
Development area
33
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs) and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing (using triggers) deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
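The Uniques, TotalDup and MaxDup figures discussed above can be reproduced from raw key values. Below is a minimal sketch; the exact split of the remaining keys among the other three values is invented for illustration (only the totals, the unique count and the longest chain are given in the article):

```python
from collections import Counter

def index_duplicate_stats(keys):
    """Compute the figures IBAnalyst shows for an index: total keys,
    unique values, total duplicates, and the longest duplicate chain
    (extra keys sharing a single value)."""
    counts = Counter(keys)
    total = len(keys)
    uniques = len(counts)
    total_dup = total - uniques                 # every key beyond the first per value
    max_dup = max(counts.values()) - 1 if counts else 0
    return {"keys": total, "uniques": uniques,
            "total_dup": total_dup, "max_dup": max_dup}

# A column like ADDR_ADDRESS_IDX6: many rows, only four distinct values
keys = ["A"] * 25057 + ["B"] * 6000 + ["C"] * 2942 + ["D"] * 1000
stats = index_duplicate_stats(keys)
# stats matches the article: 34999 keys, 4 uniques, TotalDup 34995, MaxDup 25056
```

This also makes it obvious why such an index hurts: a lookup on value "A" has to walk a 25057-key chain.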
Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
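The TIP figures in that quote can be checked with simple arithmetic: a transaction inventory page records each transaction's state (active, committed, rolled back, limbo) in two bits, so one byte covers four transactions. The sketch below ignores the small per-page header, so the capacity figure is a slight overestimate:

```python
PAGE_SIZE = 4096          # this database's page size, from the summary view
STATES_PER_BYTE = 4       # 2 bits per transaction state

tx_per_page = PAGE_SIZE * STATES_PER_BYTE   # ~16384 transactions per TIP page

tip_pages = 23                               # from the report quoted above
tip_bytes = tip_pages * PAGE_SIZE            # 94208 bytes: the report's "94 kilobytes"
                                             # (decimal kB; it is 92 KiB in binary units)
covered_tx = tip_pages * tx_per_page         # ~376832 transactions tracked by the TIP
```

So a 23-page TIP implies the database has run through several hundred thousand transactions since the last sweep reset the inventory, which is why the report suggests `gfix -sweep`.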
Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slows down performance. If there are a lot of record versions in a table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944    92
DICT_PRICE        30       1992    45
DOCS               9       2225    64
N_PART         13835      72594    83
REGISTR_NC       241       4085    56
SKL_NC          1640       7736   170
STAT_QUICK     17649      85062   110
UO_LOCK          283       8490   144
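The ratio the report uses to build that list is simply Versions divided by Records. A short sketch over the quoted figures (the ">3" threshold is the one the report states):

```python
# (records, versions) per table, taken from the report quote above
tables = {
    "CLIENTS_PR": (3388, 10944),
    "DICT_PRICE": (30, 1992),
    "DOCS": (9, 2225),
    "N_PART": (13835, 72594),
    "REGISTR_NC": (241, 4085),
    "SKL_NC": (1640, 7736),
    "STAT_QUICK": (17649, 85062),
    "UO_LOCK": (283, 8490),
}

# Tables whose version/record ratio exceeds 3 get flagged as "versioned tables"
versioned = {name: round(vers / recs, 1)
             for name, (recs, vers) in tables.items()
             if vers / recs > 3}
```

All eight tables clear the threshold, which matches the report's "Versioned tables count: 8"; DOCS is the extreme case, with roughly 247 versions per record.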
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics, identifying possible problems with a database in terms of performance, maintenance, and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback
Comments to "Temporary tables"

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its author I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change regards temp tables. I suggest you even create another "myth" box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic: either regarding user/session data isolation, or regarding data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigured hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story short: the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] The "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".
[3] Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves "Oldest active" forward. In production systems with regular transactions it can be considered the currently oldest active transaction.

Figure 1: Summary of database statistics

The summary shown in Figure 1 provides general information about
your database. The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off.[1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest[2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance-loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red with an appropriate hint.
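The traffic-light logic just described boils down to comparing the sweep gap (oldest snapshot minus oldest interesting transaction) with the configured sweep interval. A minimal sketch of that check; the color names mirror the article's description, while IBAnalyst's actual wording and edge cases may differ:

```python
def sweep_warning(oldest, oldest_snapshot, sweep_interval):
    """Classify the sweep-interval report entry.
    'oldest' is the oldest interesting transaction; the gap between it and
    the oldest snapshot is what automatic sweep measures against the interval."""
    gap = oldest_snapshot - oldest
    if gap < 0:
        return "yellow"    # negative gap: possible in IB 6.0 / Firebird / Yaffil stats
    if sweep_interval != 0 and gap > sweep_interval:
        return "red"       # threshold exceeded: automatic sweep is due
    return "green"
```

For example, with the default interval of 20000, a gap of 29900 is flagged red, while a small positive gap passes.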
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active[3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell whether there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
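The last two bullets can be sketched as follows. The function names and the interpretation of the 30% rule (oldest active lagging "next" by more than ~30% of a typical day's transactions) are illustrative readings of the article's description, not IBAnalyst's actual code:

```python
from datetime import date

def transactions_per_day(next_tx, created, stats_taken):
    """'Transactions per day' = Next transaction / days since database creation.
    Only meaningful when the numbering was never reset by a restore."""
    days = max((stats_taken - created).days, 1)
    return next_tx // days

def oldest_active_stuck(oldest_active, next_tx, per_day):
    """Warn when the oldest active transaction lags the next transaction
    by more than ~30% of one day's worth of transactions."""
    return (next_tx - oldest_active) > 0.3 * per_day

# Example figures (invented): a database created a year before the snapshot
per_day = transactions_per_day(376_832, date(2004, 1, 10), date(2005, 1, 9))
stuck = oldest_active_stuck(370_000, 376_832, per_day)
```

Here the database averages about a thousand transactions per day, so an oldest-active transaction almost 7000 numbers behind "next" has clearly been open for days and would be flagged.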
As you have already learned, if there are any warnings, they are shown as colored lines with clear, descriptive hints on how to fix or prevent the problem.
It should be noted that database statistics are not always useful. Statistics that are gathered during housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications put less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
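The timing advice above can be captured in a small helper. This is purely an illustrative heuristic: the article only says "recently", so the four-hour quiet margin below is an assumption, not an IBAnalyst rule:

```python
from datetime import datetime, timedelta

def stats_are_meaningful(taken_at, last_restore=None, last_backup_no_g=None,
                         last_sweep=None, quiet_margin=timedelta(hours=4)):
    """Return False if the statistics snapshot was taken too soon after an
    operation that artificially 'cleans' the database: a restore, a
    garbage-collecting backup (gbak -b without -g), or a manual sweep
    (gfix -sweep). The 4-hour margin is an assumption for illustration."""
    for event in (last_restore, last_backup_no_g, last_sweep):
        if event is not None and timedelta(0) <= taken_at - event < quiet_margin:
            return False
    return True

# A snapshot taken half an hour after a sweep is not representative;
# one taken twelve hours later is
noon_sweep = stats_are_meaningful(datetime(2005, 1, 9, 14, 0),
                                  last_sweep=datetime(2005, 1, 9, 13, 30))
old_sweep = stats_are_meaningful(datetime(2005, 1, 9, 14, 0),
                                 last_sweep=datetime(2005, 1, 9, 2, 0))
```

In practice this means scheduling statistics collection during a normal working day, well away from the nightly maintenance window.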
the main reason for performance degradation For some tablesthere can be a lot of versions that are still in use The server cannot decide whether they really are in use because active transac-tions potentially need any of these versions Accordingly the serv-er does not consider these versions as garbage and it takes longerand longer to construct a correct record from the big chain of ver-sions whenever a transaction happens to read it In Figure 2 youcan see two tables that have the versions count three times higherthan the record count Using this information you can also checkwhether the fact that your applications are updating these tables sofrequently is by design or because of coding mistake or an appli-cation design flaw
The Index viewIIndices are used by database engine to enforce primary key for-eign key and unique constraints They also speed up the retrieval ofdata Unique indices are the best for retrieving data but the level ofbenefit from non-unique indices depends on the diversity of theindexed data
For example look at ADDR_ADDRESS_IDX6 First of all the index
How to seize the momentwhen there is somethingwrong with the databaseYour applications can be designedso well that they will always workwith transactions and data correct-ly not making sweep gaps notaccumulating a lot of active trans-actions not keeping long runningsnapshots and so on Usually it doesnot happen (sorry colleagues)
The most common reason is thatdevelopers test their applicationsrunning only two or three simultane-ous users When the application isthen used in a production environ-ment with fifteen or more simultane-ous users the database can behaveunpredictably Of course multi-usermode can work okay because most
multi-user conflicts can be testedwith two or three concurrently run-ning applications However withlarger numbers of users garbagecollection problems can arise Suchpotential problems can be caught ifyou gather database statistics at thecorrect moments
Table informationLets take look at another sampleoutput from IBAnalyst
The IBAnalyst Table statistics view isalso very useful It can show whichtables have a lot of record versionswhere a large number ofupdatesdeletes were made frag-mented tables with fragmentationcaused by updatedelete or byblobs and so on You can see whichtables are being updated frequent-Figure 2 Table statistics
ly and what the table size is inmegabytes Most of these warningsare customizable
In this database example there areseveral activities First of all yellowcolor in the VerLen column warnsthat space taken by record versionsis larger than that occupied by therecords themselves This can resultfrom updating a lot of fields in arecord or by bulk deletes See therows in which MaxVers column ismarked in blue This shows that onlyone version per record is stored andconsequently this is caused by bulkdeletes So both indications can tellthat this is really bulk deletes andnumber in Versions column is closeto number of deleted records
Long-living active transactions pre-vent garbage collection and this is name itself tells that it was created manually If statistics
were taken by the Services API with metadata info youcan see what columns are indexed (in IBAnalyst 183and greater) For the index under examination you cansee that it has 34999 keys TotalDup is 34995 andMaxDup is 25056 Both duplicate columns are markedin red This is because there are only 4 unique key val-ues amongst all the keys in this index as can be seenfrom the Uniques column Furthermore the greatestduplicate chain (key pointing to records with the samecolumn value) is 25056 ndash ie almost all keys store oneof four unique values As a result this index could
bull Reduce the speed of the restore process Okay thir-ty-five thousand keys is not a big deal for modern data-bases and hardware but the impact should be notedanyway
Development area
34
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
bull Slow down garbage collection Indices with a low count of uniquevalues can impede garbage collection by up to ten times in compar-ison with a completely unique index This problem has been solvedin InterBase 7175 and Firebird 20
bull Produce unnecessary page reads when the optimizer reads theindex It depends on the value being searched in a particular query- searching by an index that has a larger value for MaxDup will beslower Searching by value that has fewer duplicate values will befaster but only you know what data is stored in that indexed column
That is why IBAnalyst draws your attention to such indices markingthem red and yellow and including them in the Recommendationsreport Unfortunately most of the ldquobad indices are automaticallycreated to enforce foreign-key constraints In some cases this prob-lem can be solved by preventing using triggers deletes or updatesof primary key in lookup tables But if it is not possible to implementsuch changes IBAnalyst will show you bad indices on ForeignKeys every time you view statistics
Reports
There is no need to look through the entire report each time spotting cell color and reading hints for new warningsMore direct and detailed information can be had by using the Recommendations feature of IBAnalyst Just load thestatistics and go to the ReportsView Recommendations menu This report provides a step-by-step analysis includ-ing more detailed descriptive warnings about forced writes sweep interval database activity transaction statedatabase page size sweeping transaction inventory pages fragmented tables tables with lot of record versionsmassive deletesupdates deep indices optimizer-unfriendly indices useless indices and even empty tables All ofthis information and the accompanying suggestions are dynamically created based on the statistics being loaded
As an example of the report output letrsquos have a look at a report generated for the database statistics you sawearlier in this article Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pagesRead_committed transaction uses global TIP but snapshot transactions make own copies of TIP in memory Big TIPsize can slowdown performance Try to run sweep manually (gfix -sweep) to decrease TIP size
Here is another quote from tableindices part of report Versioned tables count 8 Large amount of record versionsusually slowdown performance If there are a lot of record versions in table than garbage collection does not workor records are not being read by any select statement You can try select count() on that tables to enforce garbagecollection but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccess-
Table Records Versions RecVers sizeCLIENTS_PR 3388 10944 92DICT_PRICE 30 1992 45DOCS 9 2225 64N_PART 13835 72594 83REGISTR_NC 241 4085 56SKL_NC 1640 7736 170STAT_QUICK 17649 85062 110UO_LOCK 283 8490 144
ful if there is at least one transaction interested in these versions Hereis the list of tables with versionrecord ratio greater than 3
SummaryIBAnalyst is an invaluable tool that assists a user in performingdetailed analysis of Firebird or InterBase database statistics andidentifying possible problems with a database in terms of perform-ance maintenance and how an application interacts with the data-base It takes cryptic database statistics and displays them in aneasy-to-understand graphical manner and will automatically makesensible suggestions about improving database performance andeasing database maintenance
Readers feedback
35
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Comments to ldquoTemporary tablesrdquo
One thing Id like to ask you tochange is re temp tables I suggestyou even create another myth boxfor it It is the sentence
Surely most often temporarytables were necessary to thosedevelopers who had been workingwith MS SQL before they started touse InterBaseFirebird
This myth does not want to dieTemporary tables (TT) are not ameans for underskilled DB kidswho cannot write any complex SQLstatement They are the means ofdealing with data that is structural-ly dynamic but still needs to beprocessed like data with a fixedstructure So all OLAP systemsbased on RDBMS are heavilydependent on this feature - or if it isnot present it requires a whole lotof unnecessary and complicatedworkarounds Im talking out ofexperience
Then there are situations where theoptimizer simply loses the plot
because of the complexity of astatement If developers have a fall-back method to reduce complexityin those cases thats an advantageMuch better than asking developersto supply their own query plans
Also serious RDBMS like Informixhad them at least 15 years agowhen MSs database expertise didnot go further than MS Access Cer-tainly those MS database develop-ers who need TTs to be able to dotheir job would not have managedto deal with an Informix server ifcomplexity was their main problem
The two preceding paragraphswere about local temp tables Thereare also global temporary tables(GTTs) A point in favour of GTTs isthat one can give users their ownworkspace within a database with-out having to setup some clumsyadministration for it No user idscattered around in tables wherethey dont belong no explicitcleanup no demanding role man-agement Just setup tables as temp
and from then on it is transparent tousersapplications that the datainside is usersession-specific Webapplications would be a goodexample
The argument reminds me a bit ofMySQL reasoning when it comes tofeatures their RDBMS doesdidnot have Transactions ForeignKeys Triggers etc were all badand unnecessary because they didnot have them (officially they slowdown the whole system and intro-duce dependencies) Of coursethey do To call a flat file system aRDBMS is obviously good for mar-keting Now they are putting in allthose essential database featureswhich were declared crap by themnot long ago And you can seealready that their marketing nowtells us how important those fea-tures are I bet we wont see MySQLbenchmarks for a while -)
We should not make a similar mis-take Temporary tables are impor-tant if the nature of a system is
dynamic either re usersessiondata isolation or re data where thestructure is unknown in advancebut needs to be processed like DBdata with a fixed structure ThatFirebird does not have them isplainly a lack of an important fea-ture in the same category as crossDB operations (only through qliwhich means unusable) Both fea-tures could make Firebird muchmore suitable as the basis for OLAPsystems an area where Firebird islacking considerably
Well to be fair Firebird developersare working on both topics
Volker Rehn
volkerrehnbigpondcom
We received a lot of feedback emails for article ldquoWorking with temporary tables in InterBase 75rdquo which was published in issue 1The one of them is impressed me and with permission of its respective owner Irsquod like to publish it
Alexey Kovyazin Chief Editor
Readers feedback
Miscellaneous
36
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Invitations SubscriptionDonations Sponsorships
This is the first officialbook on Firebird mdash thefree independentopen source relationaldatabase server thatemerged in 2000
Based on the actualFirebird Project thisbook will provide youall you need to knowabout Firebird data-base development likeinstallation multi-plat-form configurationSQL language inter-faces and mainte-nance
Subscribe now
To receive future issuesnotifications send email to
subscribeibdevelopercom
MagazineCD
Donations
1- InterBase 75 and Firebird 15 have special features that can periodi-cally flush unsaved pages if Forced Writes is Off2 - Oldest transaction is the same Oldest interesting transaction mentionedeverywhere Gstat output does not show this transaction as interesting3 - Really it is the oldest transaction that was active when the oldest trans-action currently active started This is because only new transaction startmoves Oldest active forward In the production systems with regulartransactions it can be considered as currently oldest active transaction
Development area
32
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
your database The warnings orcomments shown are based oncarefully gathered knowledgeobtained from a large number ofreal-world production databases
Note All figures in this article con-tain gstat statistics which were takenfrom a real-world production data-base (with the permission of its own-ers)
As I said before raw database sta-tistics look cryptic and are hard tointerpret IBAnalyst highlights anypotential problems clearly in yellowor red and the detail of the problemcan be read simply by placing thecursor over the relevant entry andreading the hint that is displayed
What can we discover from theabove figure This is a dialect 3database with a page size of 4096bytes Six to eight years ago devel-opers used a default page size of1024 bytes but in more recenttimes such a small page size couldlead to many performance prob-lems Since this database has apage size of 4k there is no warningdisplayed as this page size is okay
Next we can see that the ForcedWrite parameter is set to OFF andmarked red InterBase 4x and 5xby default had this parameter ONForced Writes itself is a write cachemethod when ON it writeschanged data immediately to diskbut OFF means that writes will be
stored for unknown time by theoperating system in its file cacheInterBase 6 creates databases withForced Writes OFF
Why is this marked in red on theIBAnalyst report The answer is sim-ple ndash using asynchronous writes cancause database corruption in casesof power OS or server failure
Tip It is interesting that modernHDD interfaces (ATA SATA SCSI)do not show any major difference inperformance with Forced Write setOn or Off1
Next on the report is the mysterioussweep interval If positive it setsthe size of the gap between the old-est2 and oldest snapshot transactionat which the engine is alerted to theneed to start an automatic garbagecollection On some systems hittingthis threshold will cause a suddenperformance loss effect and as aresult it is sometimes recommendedthat the sweep interval be set to 0(disabling automatic sweepingentirely) Here the sweep interval ismarked yellow because the valueof the sweep gap is negativewhich it can be in InterBase 60Firebird and Yaffil statistics but not inInterBase 7x When the value of thesweep gap is greater than thesweep interval (if the sweep intervalis not 0) the report entry for thesweep interval will be marked redwith an appropriate hint
We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database:
• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.
• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.
• The Oldest active – the oldest currently active transaction.
• The Next transaction – the transaction number that will be assigned to the next new transaction.
• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.
• Transactions per day – this is calculated from Next transaction divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
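The arithmetic behind these two markers can be sketched as follows. IBAnalyst's exact internal rule is not documented here, so the 30% check below is an assumption based on the description above, and the function names are ours.

```python
# Illustrative sketch of the transaction-marker arithmetic above.
# The 30% threshold mirrors IBAnalyst's warning as described in the
# text; the precise rule IBAnalyst applies is an assumption here.

def transactions_per_day(next_transaction: int, days_since_creation: int) -> float:
    """Average daily transaction rate since database creation
    (meaningful for production databases only)."""
    return next_transaction / max(days_since_creation, 1)

def stuck_oldest_active(oldest_active: int, next_transaction: int,
                        per_day: float, threshold: float = 0.30) -> bool:
    """Warn when the Oldest active transaction lags Next by more than
    `threshold` of a day's worth of transactions."""
    return (next_transaction - oldest_active) > threshold * per_day

per_day = transactions_per_day(900_000, 30)            # 30000.0 per day
print(stuck_oldest_active(899_000, 900_000, per_day))  # False: lag 1000 < 9000
print(stuck_oldest_active(880_000, 900_000, per_day))  # True: lag 20000 > 9000
```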
As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during housekeeping operations can be meaningless.
Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)
Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications are making less database load than usual (users are at lunch, or it's a quiet time in the business day).
The InterBase and Firebird Developer Magazine, 2005, Issue 2
© 2005 www.ibdeveloper.com. All rights reserved.

Using IBAnalyst
How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst (Figure 2: Table statistics).

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, which tables are fragmented (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently the cause is bulk deletes. Together, both indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.
• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.
• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
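The relationship between the Uniques, TotalDup and MaxDup columns can be shown with a small sketch: TotalDup is the number of keys minus the number of unique values, and MaxDup is the longest chain of keys sharing one value. This helper is our own illustration, not IBAnalyst code, and the toy key list merely mimics the shape of ADDR_ADDRESS_IDX6 (many keys, four distinct values).

```python
# Illustration of the duplicate statistics discussed above.
# Not IBAnalyst code; the key list is a made-up miniature example.

from collections import Counter

def index_dup_stats(keys: list) -> tuple:
    """Return (uniques, total_dup, max_dup) for a list of key values.
    total_dup = keys beyond the first occurrence of each value;
    max_dup   = length of the longest duplicate chain minus one."""
    counts = Counter(keys)
    uniques = len(counts)
    total_dup = len(keys) - uniques
    max_dup = max(counts.values()) - 1 if counts else 0
    return uniques, total_dup, max_dup

# 11 keys, only 4 distinct values, value "A" dominating:
keys = ["A"] * 7 + ["B"] * 2 + ["C"] + ["D"]
print(index_dup_stats(keys))  # (4, 7, 6)
```

Note how this matches the real index in the article: 34999 keys with 4 unique values gives TotalDup = 34999 - 4 = 34995.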
That is why IBAnalyst draws your attention to such indices, marking them red and yellow and including them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically to enforce foreign-key constraints. In some cases this problem can be solved by preventing deletes or updates of the primary key in lookup tables, using triggers. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are created dynamically, based on the statistics being loaded.
As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."
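The TIP figures quoted here are easy to verify: each transaction occupies 2 bits in the transaction inventory, i.e. 4 transactions per byte. The sketch below works under that assumption (and ignores per-page header overhead), with the 4096-byte page size of this database.

```python
# Back-of-the-envelope check of the quoted TIP size.
# Assumption: 2 bits per transaction in the inventory, page header
# overhead ignored, so real figures will differ slightly.

import math

def tip_size(next_transaction: int, page_size: int = 4096) -> tuple:
    """Approximate TIP size in (bytes, pages) for a given
    Next transaction marker."""
    tip_bytes = math.ceil(next_transaction / 4)   # 4 transactions per byte
    tip_pages = math.ceil(tip_bytes / page_size)
    return tip_bytes, tip_pages

# Roughly 94 KB / 23 pages corresponds to ~376,000 transactions:
print(tip_size(376_000))  # (94000, 23)
```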
Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. A large amount of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944      92
DICT_PRICE        30       1992      45
DOCS               9       2225      64
N_PART         13835      72594      83
REGISTR_NC       241       4085      56
SKL_NC          1640       7736     170
STAT_QUICK     17649      85062     110
UO_LOCK          283       8490     144
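The ratio rule behind this list can be sketched as follows. The threshold of 3 comes from the report text; the helper itself and its name are ours, and the sample uses three rows from the table above.

```python
# Version/record ratio check, as used by the report's table list.
# Illustrative helper only; the threshold of 3 is from the report.

def versioned_tables(stats: dict, ratio: float = 3.0) -> list:
    """stats maps table name -> (records, versions); returns names of
    tables whose versions/records ratio exceeds `ratio`."""
    return [name for name, (records, versions) in stats.items()
            if records and versions / records > ratio]

sample = {                      # figures taken from the table above
    "N_PART":     (13_835, 72_594),   # ratio ~5.2
    "DICT_PRICE": (30, 1_992),        # ratio ~66.4
    "CLIENTS_PR": (3_388, 10_944),    # ratio ~3.2
}
print(versioned_tables(sample))  # ['N_PART', 'DICT_PRICE', 'CLIENTS_PR']
```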
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance, and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.
Readers feedback

Comments to "Temporary tables"

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me and, with the permission of its respective owner, I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables; I suggest you even create another "myth box" for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as "temp", and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary, because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-database operations (available only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
Development area
33
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
the main reason for performance degradation For some tablesthere can be a lot of versions that are still in use The server cannot decide whether they really are in use because active transac-tions potentially need any of these versions Accordingly the serv-er does not consider these versions as garbage and it takes longerand longer to construct a correct record from the big chain of ver-sions whenever a transaction happens to read it In Figure 2 youcan see two tables that have the versions count three times higherthan the record count Using this information you can also checkwhether the fact that your applications are updating these tables sofrequently is by design or because of coding mistake or an appli-cation design flaw
The Index viewIIndices are used by database engine to enforce primary key for-eign key and unique constraints They also speed up the retrieval ofdata Unique indices are the best for retrieving data but the level ofbenefit from non-unique indices depends on the diversity of theindexed data
For example look at ADDR_ADDRESS_IDX6 First of all the index
How to seize the momentwhen there is somethingwrong with the databaseYour applications can be designedso well that they will always workwith transactions and data correct-ly not making sweep gaps notaccumulating a lot of active trans-actions not keeping long runningsnapshots and so on Usually it doesnot happen (sorry colleagues)
The most common reason is thatdevelopers test their applicationsrunning only two or three simultane-ous users When the application isthen used in a production environ-ment with fifteen or more simultane-ous users the database can behaveunpredictably Of course multi-usermode can work okay because most
multi-user conflicts can be testedwith two or three concurrently run-ning applications However withlarger numbers of users garbagecollection problems can arise Suchpotential problems can be caught ifyou gather database statistics at thecorrect moments
Table informationLets take look at another sampleoutput from IBAnalyst
The IBAnalyst Table statistics view isalso very useful It can show whichtables have a lot of record versionswhere a large number ofupdatesdeletes were made frag-mented tables with fragmentationcaused by updatedelete or byblobs and so on You can see whichtables are being updated frequent-Figure 2 Table statistics
ly and what the table size is inmegabytes Most of these warningsare customizable
In this database example there areseveral activities First of all yellowcolor in the VerLen column warnsthat space taken by record versionsis larger than that occupied by therecords themselves This can resultfrom updating a lot of fields in arecord or by bulk deletes See therows in which MaxVers column ismarked in blue This shows that onlyone version per record is stored andconsequently this is caused by bulkdeletes So both indications can tellthat this is really bulk deletes andnumber in Versions column is closeto number of deleted records
Long-living active transactions pre-vent garbage collection and this is name itself tells that it was created manually If statistics
were taken by the Services API with metadata info youcan see what columns are indexed (in IBAnalyst 183and greater) For the index under examination you cansee that it has 34999 keys TotalDup is 34995 andMaxDup is 25056 Both duplicate columns are markedin red This is because there are only 4 unique key val-ues amongst all the keys in this index as can be seenfrom the Uniques column Furthermore the greatestduplicate chain (key pointing to records with the samecolumn value) is 25056 ndash ie almost all keys store oneof four unique values As a result this index could
bull Reduce the speed of the restore process Okay thir-ty-five thousand keys is not a big deal for modern data-bases and hardware but the impact should be notedanyway
Development area
34
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
bull Slow down garbage collection Indices with a low count of uniquevalues can impede garbage collection by up to ten times in compar-ison with a completely unique index This problem has been solvedin InterBase 7175 and Firebird 20
bull Produce unnecessary page reads when the optimizer reads theindex It depends on the value being searched in a particular query- searching by an index that has a larger value for MaxDup will beslower Searching by value that has fewer duplicate values will befaster but only you know what data is stored in that indexed column
That is why IBAnalyst draws your attention to such indices markingthem red and yellow and including them in the Recommendationsreport Unfortunately most of the ldquobad indices are automaticallycreated to enforce foreign-key constraints In some cases this prob-lem can be solved by preventing using triggers deletes or updatesof primary key in lookup tables But if it is not possible to implementsuch changes IBAnalyst will show you bad indices on ForeignKeys every time you view statistics
Reports
There is no need to look through the entire report each time spotting cell color and reading hints for new warningsMore direct and detailed information can be had by using the Recommendations feature of IBAnalyst Just load thestatistics and go to the ReportsView Recommendations menu This report provides a step-by-step analysis includ-ing more detailed descriptive warnings about forced writes sweep interval database activity transaction statedatabase page size sweeping transaction inventory pages fragmented tables tables with lot of record versionsmassive deletesupdates deep indices optimizer-unfriendly indices useless indices and even empty tables All ofthis information and the accompanying suggestions are dynamically created based on the statistics being loaded
As an example of the report output letrsquos have a look at a report generated for the database statistics you sawearlier in this article Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pagesRead_committed transaction uses global TIP but snapshot transactions make own copies of TIP in memory Big TIPsize can slowdown performance Try to run sweep manually (gfix -sweep) to decrease TIP size
Here is another quote from tableindices part of report Versioned tables count 8 Large amount of record versionsusually slowdown performance If there are a lot of record versions in table than garbage collection does not workor records are not being read by any select statement You can try select count() on that tables to enforce garbagecollection but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccess-
Table Records Versions RecVers sizeCLIENTS_PR 3388 10944 92DICT_PRICE 30 1992 45DOCS 9 2225 64N_PART 13835 72594 83REGISTR_NC 241 4085 56SKL_NC 1640 7736 170STAT_QUICK 17649 85062 110UO_LOCK 283 8490 144
ful if there is at least one transaction interested in these versions Hereis the list of tables with versionrecord ratio greater than 3
SummaryIBAnalyst is an invaluable tool that assists a user in performingdetailed analysis of Firebird or InterBase database statistics andidentifying possible problems with a database in terms of perform-ance maintenance and how an application interacts with the data-base It takes cryptic database statistics and displays them in aneasy-to-understand graphical manner and will automatically makesensible suggestions about improving database performance andeasing database maintenance
Readers feedback
35
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Comments to ldquoTemporary tablesrdquo
One thing Id like to ask you tochange is re temp tables I suggestyou even create another myth boxfor it It is the sentence
Surely most often temporarytables were necessary to thosedevelopers who had been workingwith MS SQL before they started touse InterBaseFirebird
This myth does not want to dieTemporary tables (TT) are not ameans for underskilled DB kidswho cannot write any complex SQLstatement They are the means ofdealing with data that is structural-ly dynamic but still needs to beprocessed like data with a fixedstructure So all OLAP systemsbased on RDBMS are heavilydependent on this feature - or if it isnot present it requires a whole lotof unnecessary and complicatedworkarounds Im talking out ofexperience
Then there are situations where theoptimizer simply loses the plot
because of the complexity of astatement If developers have a fall-back method to reduce complexityin those cases thats an advantageMuch better than asking developersto supply their own query plans
Also serious RDBMS like Informixhad them at least 15 years agowhen MSs database expertise didnot go further than MS Access Cer-tainly those MS database develop-ers who need TTs to be able to dotheir job would not have managedto deal with an Informix server ifcomplexity was their main problem
The two preceding paragraphswere about local temp tables Thereare also global temporary tables(GTTs) A point in favour of GTTs isthat one can give users their ownworkspace within a database with-out having to setup some clumsyadministration for it No user idscattered around in tables wherethey dont belong no explicitcleanup no demanding role man-agement Just setup tables as temp
and from then on it is transparent tousersapplications that the datainside is usersession-specific Webapplications would be a goodexample
The argument reminds me a bit ofMySQL reasoning when it comes tofeatures their RDBMS doesdidnot have Transactions ForeignKeys Triggers etc were all badand unnecessary because they didnot have them (officially they slowdown the whole system and intro-duce dependencies) Of coursethey do To call a flat file system aRDBMS is obviously good for mar-keting Now they are putting in allthose essential database featureswhich were declared crap by themnot long ago And you can seealready that their marketing nowtells us how important those fea-tures are I bet we wont see MySQLbenchmarks for a while -)
We should not make a similar mis-take Temporary tables are impor-tant if the nature of a system is
dynamic either re usersessiondata isolation or re data where thestructure is unknown in advancebut needs to be processed like DBdata with a fixed structure ThatFirebird does not have them isplainly a lack of an important fea-ture in the same category as crossDB operations (only through qliwhich means unusable) Both fea-tures could make Firebird muchmore suitable as the basis for OLAPsystems an area where Firebird islacking considerably
Well to be fair Firebird developersare working on both topics
Volker Rehn
volkerrehnbigpondcom
We received a lot of feedback emails for article ldquoWorking with temporary tables in InterBase 75rdquo which was published in issue 1The one of them is impressed me and with permission of its respective owner Irsquod like to publish it
Alexey Kovyazin Chief Editor
Readers feedback
Miscellaneous
36
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Invitations SubscriptionDonations Sponsorships
This is the first officialbook on Firebird mdash thefree independentopen source relationaldatabase server thatemerged in 2000
Based on the actualFirebird Project thisbook will provide youall you need to knowabout Firebird data-base development likeinstallation multi-plat-form configurationSQL language inter-faces and mainte-nance
Subscribe now
To receive future issuesnotifications send email to
subscribeibdevelopercom
MagazineCD
Donations
Development area
34
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Using IBAnalyst
bull Slow down garbage collection Indices with a low count of uniquevalues can impede garbage collection by up to ten times in compar-ison with a completely unique index This problem has been solvedin InterBase 7175 and Firebird 20
bull Produce unnecessary page reads when the optimizer reads theindex It depends on the value being searched in a particular query- searching by an index that has a larger value for MaxDup will beslower Searching by value that has fewer duplicate values will befaster but only you know what data is stored in that indexed column
That is why IBAnalyst draws your attention to such indices markingthem red and yellow and including them in the Recommendationsreport Unfortunately most of the ldquobad indices are automaticallycreated to enforce foreign-key constraints In some cases this prob-lem can be solved by preventing using triggers deletes or updatesof primary key in lookup tables But if it is not possible to implementsuch changes IBAnalyst will show you bad indices on ForeignKeys every time you view statistics
Reports
There is no need to look through the entire report each time spotting cell color and reading hints for new warningsMore direct and detailed information can be had by using the Recommendations feature of IBAnalyst Just load thestatistics and go to the ReportsView Recommendations menu This report provides a step-by-step analysis includ-ing more detailed descriptive warnings about forced writes sweep interval database activity transaction statedatabase page size sweeping transaction inventory pages fragmented tables tables with lot of record versionsmassive deletesupdates deep indices optimizer-unfriendly indices useless indices and even empty tables All ofthis information and the accompanying suggestions are dynamically created based on the statistics being loaded
As an example of the report output letrsquos have a look at a report generated for the database statistics you sawearlier in this article Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pagesRead_committed transaction uses global TIP but snapshot transactions make own copies of TIP in memory Big TIPsize can slowdown performance Try to run sweep manually (gfix -sweep) to decrease TIP size
Here is another quote from tableindices part of report Versioned tables count 8 Large amount of record versionsusually slowdown performance If there are a lot of record versions in table than garbage collection does not workor records are not being read by any select statement You can try select count() on that tables to enforce garbagecollection but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccess-
Table Records Versions RecVers sizeCLIENTS_PR 3388 10944 92DICT_PRICE 30 1992 45DOCS 9 2225 64N_PART 13835 72594 83REGISTR_NC 241 4085 56SKL_NC 1640 7736 170STAT_QUICK 17649 85062 110UO_LOCK 283 8490 144
ful if there is at least one transaction interested in these versions Hereis the list of tables with versionrecord ratio greater than 3
SummaryIBAnalyst is an invaluable tool that assists a user in performingdetailed analysis of Firebird or InterBase database statistics andidentifying possible problems with a database in terms of perform-ance maintenance and how an application interacts with the data-base It takes cryptic database statistics and displays them in aneasy-to-understand graphical manner and will automatically makesensible suggestions about improving database performance andeasing database maintenance
Readers feedback
35
The InterBase and Firebird Developer Magazine 2005 ISSUE 2
wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved
Comments to ldquoTemporary tablesrdquo
One thing Id like to ask you tochange is re temp tables I suggestyou even create another myth boxfor it It is the sentence
Surely most often temporarytables were necessary to thosedevelopers who had been workingwith MS SQL before they started touse InterBaseFirebird
This is the first official book on Firebird, the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book provides all you need to know about Firebird database development: installation, multi-platform configuration, SQL language, interfaces and maintenance.
Readers feedback
Comments to “Temporary tables”
One thing I'd like to ask you to change is regarding temp tables; I suggest you even create another myth box for it. It is this sentence:

“Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird.”
This myth does not want to die. Temporary tables (TTs) are not a means for underskilled DB kids who cannot write a complex SQL statement; they are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on an RDBMS are heavily dependent on this feature, and where it is not present, a whole lot of unnecessary and complicated workarounds are required. I'm talking from experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage, and much better than asking developers to supply their own query plans.
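Volker's fallback argument is easy to demonstrate: materialize a troublesome intermediate result into a local temp table and query that, instead of burying the aggregate inside one deeply nested statement. The sketch below uses Python's bundled sqlite3 module as a neutral stand-in (neither InterBase nor Firebird is assumed); the table and column names are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders VALUES
        (1, 'acme', 120.0),
        (2, 'acme',  80.0),
        (3, 'bob',   15.0);
""")

-- = None  # (placeholder removed)
```
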
Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.
The two preceding paragraphs were about local temp tables; there are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it: no user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up the tables as temporary, and from then on it is transparent to users and applications that the data inside is user/session-specific. Web applications would be a good example.
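The per-session transparency described above can be sketched with any engine whose temp tables are connection-private. Here Python's bundled sqlite3 module stands in for a multi-user server (again, an assumption for illustration, not InterBase or Firebird behaviour): one shared database file plays the server, two connections play two user sessions, and all names are invented.

```python
import os
import sqlite3
import tempfile

# One shared database file stands in for the server; two connections
# stand in for two user sessions.
path = os.path.join(tempfile.mkdtemp(), "shared.db")
alice = sqlite3.connect(path)
bob = sqlite3.connect(path)

# Both sessions create a TEMP table with the same name. There is no
# clash, no user-id column, no explicit cleanup: each connection gets
# its own private instance that vanishes when the connection closes.
for con, name in ((alice, "alice"), (bob, "bob")):
    con.execute("CREATE TEMP TABLE scratch (who TEXT)")
    con.execute("INSERT INTO scratch VALUES (?)", (name,))

seen_by_alice = alice.execute("SELECT who FROM scratch").fetchall()
seen_by_bob = bob.execute("SELECT who FROM scratch").fetchall()
print(seen_by_alice, seen_by_bob)  # [('alice',)] [('bob',)]
```

Each session sees only its own rows, which is exactly the "transparent to users/applications" property the letter describes.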
The argument reminds me a bit of MySQL's reasoning about features their RDBMS does or did not have. Transactions, foreign keys, triggers and so on were all bad and unnecessary because MySQL did not have them (officially, because they slow down the whole system and introduce dependencies; of course they do). To call a flat-file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which they declared crap not long ago, and you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)
We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, whether in terms of user/session data isolation or of data whose structure is unknown in advance but which needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-database operations (available only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.
Well, to be fair, the Firebird developers are working on both topics.
Volker Rehn
volkerrehn@bigpond.com
We received a lot of feedback emails about the article “Working with temporary tables in InterBase 7.5”, which was published in issue 1. One of them particularly impressed me, and with the permission of its author I'd like to publish it.

Alexey Kovyazin, Chief Editor