Lookups or “ the Good, the Bad and the Ugly ” John S. Lemon

Indexes

Lookupsor

“ the Good, the Bad and

the Ugly ”

John S. Lemon

Indexes

What do these have in common ?

One person knows – rest of you will have to wait !

Indexes

What I hope to cover Based on the Aberdeen Maternity and

Neo-natal Data Bank ( AMNDB ) History of lookups –

Dummy case SQL tabfiles Second data base Second data base + secondary indexes +

lookup command

Indexes

What is a “lookup” Converting text -> numeric code Or numeric code -> text Data entry staff enter

‘uterine adhesions’ Converted to ( and stored as )

621.5 Why ?

Space

Indexes

Why use a ‘lookup’ Alternatives are storing full text ‘uterine adhesions’

- stored as 19 bytes / characters This is one of the shorter ones !!!

Problem of different ‘spelling’ Uterine Adhesions UTERINE ADHESIONS etc.

Which to use when searching

Indexes

Why use a ‘lookup’ Store ‘uterine adhesions’ as

621.5

- stored as 8 bytes less if using integers or fixed length string codes ‘A6215’

- stored as 7 bytes

Indexes

Why use a ‘lookup’ Value labels - not practical

Size limit on label length Record specific Problems managing an increasing list Ordering / Sorting Large numbers of codes:text pairs

Occupations 13872Drugs 5357Operations 2754Diseases / Illnesses / Complications 4404

Indexes

Dummy case Only option in version 2 Use a CASEID which was unique

In AMNDB used –999999 This case only contained data for the

‘lookup’ record types It worked but ….

Indexes

Dummy case - problems Specific to that data base

Not shared – unless SIR FILE DUMP - from ‘active’ READ INPUT DATA - to ‘research’

Have to repeat for all data bases that need lookups

Indexes

Dummy case - problems Can only go ‘one way’ unless duplicate

data with different Key Fields

RECORD SCHEMA 99, ICD2NAMESORT IDS ICDCODE

RECORD SCHEMA 98, NAME2ICDSORT IDS ICDNAME

Huge problems of maintenance

Indexes

Dummy case - problems MAX KEY SIZE

May have seen in LIST STATS What is it ?

Need to have quick look at the ‘structure’ of a SiR record Simplified not definitive view If you want a complete definition –

ask Tony !!

Indexes

SiR record structure Essentially two components

Key - fixed length - same for ALL records May not be ‘filled’ for all records

Data - variable length

1

2

3

CASEID

CASEID

CASEID

Max Key Size

Max Key Size

Max Key Size

Data

Data

Data

Actual Size

Actual Size

Indexes

Dummy case - problems Without dummy case

Max Key Size - approx 20 bytes long Text field – 50 characters long

Max Key Size - approx 60 bytes long Extra 40 bytes For EVERY record 1.5 million * 40 bytes = approx 60 Mbytes Trivial today but in 1988 ……………

Indexes

Dummy Case – solutions (sic) How to resolve size problems

Reduced length of text field – 30 chars One way only Only in one data base – data entry Used ‘marvellous feature’ of sequential

data base access SIR3 file held on tape(s) Only PROCESS CASES ALL No CASE IS

Indexes

Lookups use in AMNDB Two main programmes / suites of

programmes Batch run to convert text -> code Interactive programme for each record that

requires lookups Need both for different reasons

Indexes

Lookups use in AMNDB Batch run to convert text -> code Does all record types in data base that

has data for looking up Only converts records where text code

is already available Marks successful lookups Leave unsuccessful ones for interactive

Indexes

Lookups use in AMNDB One interactive programme per record

with data for looking up Functions in same way as batch except

If no match prompts for one of following - Retype Delete Edit Add new lookup code

Indexes

SQL tabfiles Stuck with Dummy case until

SQL tabfiles SAVE TABLE SQL indexes

One external file could hold lookups

Indexes

SQL tabfiles One external file – many advantages

One multi-user file for all lookups Simplified maintenance - one file to backup Multiple indexes per table ( cf. record ) Max Key Size problem removed

BUT ……………

Indexes

SQL tabfiles - problems Can only see the ‘whole’ file picture

No equivalent of File dump / Add recs Need SQL+ - clumsy PQL programs - not intuitive Could use Forms but not easy

Journalling was suspect Ended up using EXPORT for backup Exporting SQL tabfiles was idiosyncratic

Indexes

SQL tabfiles - problems If tabfile is ‘volatile’

– small frequent updates Tabfile can get ‘corrupt’ Verify ‘drops’ table instead of correcting

Updates appeared to work but later find records missing / corrupt

Just had to ‘live with it’

……… and hope for better

Indexes

Second data base No worries about Max Key Size Multi user Reliable DBMS utilities and functionality

UNLOAD Journalling File Dump / Add Recs Easily look at the data

But ……………………

Indexes

Second data base Considered but not used Why ?

No alternative ‘views’ / indexes Back to two copies of dataRECORD SCHEMA 99, ICD2NAMESORT IDS ICDCODE

RECORD SCHEMA 98, NAME2ICDSORT IDS ICDNAME

No real advantages – ‘devil you know’

Indexes

What do these have in common ?

Still puzzled ??

Indexes

Two data bases + Secondary Index Then in SiR2002

Two, or more, data bases Secondary Indexes LOOKUP command

Decided to ‘go for broke’ Use all three ‘features’ to remove

previous problems

Indexes

Two data bases + Secondary Index Two things spring to mind

Can of worms Snail in a well

Indexes

Two data bases + Secondary Index The can of worms was trying to

understand code written many years ago

How many of you have ‘revisited’ PQL you wrote 5 years ago ?

Can you remember what you were trying to do ?

Indexes

Two data bases + Secondary Index Do you add Comments to explain for

future benefitC Only get records for Males over 60

Use | for ‘inline’ comments. END PROCESS REC | PREGNANCY

It was ‘grey hair’ time – yet again !! Not sure which is worse

teenage daughters old SiR code

Indexes

Two data bases + Secondary Index By hard work and perseverance

managed to ‘decode’ the old code This time I added Comments as I

worked it out !! So that was the can of worms sorted

Indexes

Two data bases + Secondary Index Just left the snail in the

well How does this relate to

SiR code

Climb 3 feet during day – slip back two feet at night

At almost every stage I encountered ASFs

‘Another SiR Feature’

Indexes

Two data bases + Secondary Index Enormous thanks to Tony for help, aid,

assistance and patience Six new versions of SiR within one

week Gradually felt that I was ‘climbing’

higher and higher

Indexes

Two data bases + Secondary Index One day after yet another new version -

programme ran OK Message to Tony

“I’m out of the well !! “ Even so did some more testing

Indexes

Two data bases + Secondary Index

Two hours later I sent a message using a phrase from this show

“I’ve fallen in the water”

Yet another problem

Indexes

Two data bases + Secondary Index Yet again Tony came to the rescue Apart from one bit of my code I need to

sort the programmes are working OK So how do we use

A second data base Secondary Indexes The LOOKUP command

Indexes

Two data bases + Secondary Index All lookups are held in a second,

caseless data base The key fields are the numeric codes to

keep MAX KEY SIZE to minimum Journalling is turned on

Indexes

Two data bases + Secondary Index The same code might refer to multiple

text strings Threatened Abortion Thr Abr TAbr

All mean same Use AUTOKEY to cope Secondary indexes on the text strings

Indexes

Two data bases + Secondary Index Experiences so far

Still testing but looks good Faster and more reliable Can look at the data easily Correcting invalid data is easy All the power and features of vPQL and

DBMS utilities for maintenance Only one system to learn / remember

Indexes

Lookups – a summary Dummy case

Rigid Inflexible Cumbersome Can use DBMS etc. for maintenance

Now luckily replaced

Indexes

Lookups – a summary SQL tabfiles

Very flexible Obtuse No easy maintenance SQL+ is cumbersome Reliability / integrity of tabfiles

Better but for long term – flawed Ad-hoc work – re-building every time

Indexes

Lookups – a summary Final solution with Second data base,

Indexes & LOOKUP Reliable PQL is familiar Fast Combines good features of dummy case

and tabfiles Perhaps time will tell – but looks good

Documents

Lookups or “ the Good, the Bad and the Ugly ” John S. Lemon