Indexes
Lookups or “the Good, the Bad and the Ugly”
John S. Lemon

What do these have in common?
One person knows – the rest of you will have to wait!

What I hope to cover
  Based on the Aberdeen Maternity and Neo-natal Data Bank (AMNDB)
  History of lookups –
    Dummy case
    SQL tabfiles
    Second data base
    Second data base + secondary indexes + lookup command

What is a “lookup”?
  Converting text -> numeric code
  Or numeric code -> text
  Data entry staff enter ‘uterine adhesions’
  Converted to (and stored as) 621.5
  Why?
    Space

Why use a ‘lookup’?
  Alternatives are storing full text
    ‘uterine adhesions’ - stored as 19 bytes / characters
    This is one of the shorter ones!
  Problem of different ‘spelling’
    Uterine Adhesions
    UTERINE ADHESIONS
    etc.
  Which to use when searching?
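
The spelling problem above can be illustrated with a minimal Python sketch (not SiR code). It assumes that folding case and whitespace is enough to treat the variants as the same entry. The table `TEXT_TO_CODE` and the functions `normalise` and `lookup_code` are hypothetical names invented for this illustration; only the text and the code 621.5 come from the slides.

```python
# Hypothetical lookup table: the text and code are from the slides,
# every name in this sketch is illustrative only.
TEXT_TO_CODE = {"uterine adhesions": "621.5"}


def normalise(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def lookup_code(text: str):
    """Return the stored code for a piece of entered text, or None."""
    return TEXT_TO_CODE.get(normalise(text))


for entered in ("uterine adhesions", "Uterine Adhesions", "UTERINE  ADHESIONS"):
    print(entered, "->", lookup_code(entered))
```

Once the code is stored instead of the text, a search only ever has to match one value, whichever spelling was typed at data entry.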

Why use a ‘lookup’?
  Store ‘uterine adhesions’ as
    621.5 - stored as 8 bytes
    less if using integers or fixed length string codes
    ‘A6215’ - stored as 7 bytes

Why use a ‘lookup’?
  Value labels - not practical
    Size limit on label length
    Record specific
    Problems managing an increasing list
    Ordering / Sorting
  Large numbers of codes:text pairs
    Occupations                             13872
    Drugs                                    5357
    Operations                               2754
    Diseases / Illnesses / Complications     4404

Dummy case
  Only option in version 2
  Use a CASEID which was unique
    In AMNDB used -999999
  This case only contained data for the ‘lookup’ record types
  It worked but ….

Dummy case - problems
  Specific to that data base
  Not shared – unless
    SIR FILE DUMP - from ‘active’
    READ INPUT DATA - to ‘research’
  Have to repeat for all data bases that need lookups

Dummy case - problems
  Can only go ‘one way’ unless duplicate data with different Key Fields
    RECORD SCHEMA 99, ICD2NAME
    SORT IDS ICDCODE
    RECORD SCHEMA 98, NAME2ICD
    SORT IDS ICDNAME
  Huge problems of maintenance

Dummy case - problems
  MAX KEY SIZE
    May have seen in LIST STATS
    What is it?
  Need to have quick look at the ‘structure’ of a SiR record
    Simplified not definitive view
    If you want a complete definition – ask Tony!

SiR record structure
  Essentially two components
    Key - fixed length - same for ALL records
      May not be ‘filled’ for all records
    Data - variable length
  [Diagram: records 1, 2 and 3, each drawn as CASEID, key area up to Max Key Size, then Data, with the Actual Size marked]

Dummy case - problems
  Without dummy case
    Max Key Size - approx 20 bytes long
  Text field – 50 characters long
    Max Key Size - approx 60 bytes long
  Extra 40 bytes
    For EVERY record
    1.5 million * 40 bytes = approx 60 Mbytes
    Trivial today but in 1988 ……
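
The arithmetic on this slide, written out as a short Python check. The three figures are the approximate ones quoted above; the variable names are illustrative, and a decimal megabyte (1,000,000 bytes) is assumed.

```python
RECORDS = 1_500_000        # record count quoted on the slide
KEY_WITHOUT_DUMMY = 20     # approx Max Key Size in bytes, no dummy case
KEY_WITH_DUMMY = 60        # approx Max Key Size in bytes, text field in the key

# Every record carries the full key area, whether or not it fills it.
extra_per_record = KEY_WITH_DUMMY - KEY_WITHOUT_DUMMY
extra_total_bytes = RECORDS * extra_per_record
extra_total_mbytes = extra_total_bytes / 1_000_000   # decimal megabytes assumed

print(extra_per_record, "extra bytes per record")
print(extra_total_mbytes, "Mbytes extra in total")
```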

Dummy Case – solutions (sic)
  How to resolve size problems
    Reduced length of text field – 30 chars
    One way only
    Only in one data base – data entry
    Used ‘marvellous feature’ of sequential data base access
      SIR3 file held on tape(s)
      Only PROCESS CASES ALL
      No CASE IS

Lookups use in AMNDB
  Two main programmes / suites of programmes
    Batch run to convert text -> code
    Interactive programme for each record that requires lookups
  Need both for different reasons

Lookups use in AMNDB
  Batch run to convert text -> code
    Does all record types in data base that have data for looking up
    Only converts records where text code is already available
    Marks successful lookups
    Leaves unsuccessful ones for interactive
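
The batch pass described above can be sketched in Python as follows. This is an illustration of the logic only, not the AMNDB programme: the record layout (`text`, `code`, `looked_up`) and the function `batch_convert` are assumptions made for the example.

```python
def batch_convert(records, text_to_code):
    """Fill in codes where the text is already known; return the leftovers."""
    unresolved = []
    for rec in records:
        code = text_to_code.get(rec["text"].strip().lower())
        if code is None:
            unresolved.append(rec)      # left for the interactive programme
        else:
            rec["code"] = code
            rec["looked_up"] = True     # mark the successful lookup
    return unresolved


# Hypothetical data: one entry that matches and one mistyped entry.
lookups = {"uterine adhesions": "621.5"}
records = [
    {"text": "Uterine Adhesions", "code": None, "looked_up": False},
    {"text": "utrine adhesions", "code": None, "looked_up": False},
]
leftover = batch_convert(records, lookups)
print(len(leftover), "record(s) left for interactive lookup")
```

Keeping the unmatched records untouched is what lets the interactive programme pick them up later and offer retype, delete, edit or add.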

Lookups use in AMNDB
  One interactive programme per record with data for looking up
  Functions in same way as batch except
    If no match prompts for one of following -
      Retype
      Delete
      Edit
      Add new lookup code

SQL tabfiles
  Stuck with Dummy case until
    SQL tabfiles
    SAVE TABLE
    SQL indexes
  One external file could hold lookups

SQL tabfiles
  One external file – many advantages
    One multi-user file for all lookups
    Simplified maintenance - one file to backup
    Multiple indexes per table (cf. record)
    Max Key Size problem removed
  BUT ……

SQL tabfiles - problems
  Can only see the ‘whole’ file picture
    No equivalent of File dump / Add recs
    Need SQL+ - clumsy
    PQL programs - not intuitive
    Could use Forms but not easy
  Journalling was suspect
    Ended up using EXPORT for backup
    Exporting SQL tabfiles was idiosyncratic

SQL tabfiles - problems
  If tabfile is ‘volatile’ – small frequent updates
    Tabfile can get ‘corrupt’
    Verify ‘drops’ table instead of correcting
    Updates appeared to work but later find records missing / corrupt
  Just had to ‘live with it’
  ……… and hope for better

Second data base
  No worries about Max Key Size
  Multi user
  Reliable
  DBMS utilities and functionality
    UNLOAD
    Journalling
    File Dump / Add Recs
  Easily look at the data
  But ……

Second data base
  Considered but not used
  Why?
    No alternative ‘views’ / indexes
    Back to two copies of data
      RECORD SCHEMA 99, ICD2NAME
      SORT IDS ICDCODE
      RECORD SCHEMA 98, NAME2ICD
      SORT IDS ICDNAME
  No real advantages – ‘devil you know’

What do these have in common?
Still puzzled?

Two data bases + Secondary Index
  Then in SiR2002
    Two, or more, data bases
    Secondary Indexes
    LOOKUP command
  Decided to ‘go for broke’
    Use all three ‘features’ to remove previous problems

Two data bases + Secondary Index
  Two things spring to mind
    Can of worms
    Snail in a well

Two data bases + Secondary Index
  The can of worms was trying to understand code written many years ago
  How many of you have ‘revisited’ PQL you wrote 5 years ago?
  Can you remember what you were trying to do?

Two data bases + Secondary Index
  Do you add Comments to explain for future benefit?
    C Only get records for Males over 60
  Use | for ‘inline’ comments.
    END PROCESS REC | PREGNANCY
  It was ‘grey hair’ time – yet again!
  Not sure which is worse
    teenage daughters
    old SiR code

Two data bases + Secondary Index
  By hard work and perseverance managed to ‘decode’ the old code
  This time I added Comments as I worked it out!
  So that was the can of worms sorted

Two data bases + Secondary Index
  Just left the snail in the well
  How does this relate to SiR code?
    Climb 3 feet during day – slip back two feet at night
  At almost every stage I encountered ASFs
    ‘Another SiR Feature’

Two data bases + Secondary Index
  Enormous thanks to Tony for help, aid, assistance and patience
  Six new versions of SiR within one week
  Gradually felt that I was ‘climbing’ higher and higher

Two data bases + Secondary Index
  One day after yet another new version - programme ran OK
  Message to Tony
    “I’m out of the well!”
  Even so did some more testing

Two data bases + Secondary Index
  Two hours later I sent a message using a phrase from this show
    “I’ve fallen in the water”
  Yet another problem

Two data bases + Secondary Index
  Yet again Tony came to the rescue
  Apart from one bit of my code I need to sort, the programmes are working OK
  So how do we use
    A second data base
    Secondary Indexes
    The LOOKUP command

Two data bases + Secondary Index
  All lookups are held in a second, caseless data base
  The key fields are the numeric codes to keep MAX KEY SIZE to minimum
  Journalling is turned on

Two data bases + Secondary Index
  The same code might refer to multiple text strings
    Threatened Abortion
    Thr Abr
    TAbr
  All mean same
  Use AUTOKEY to cope
  Secondary indexes on the text strings

Two data bases + Secondary Index
  Experiences so far
    Still testing but looks good
    Faster and more reliable
    Can look at the data easily
    Correcting invalid data is easy
    All the power and features of vPQL and DBMS utilities for maintenance
    Only one system to learn / remember

Lookups – a summary
  Dummy case
    Rigid
    Inflexible
    Cumbersome
    Can use DBMS etc. for maintenance
  Now luckily replaced

Lookups – a summary
  SQL tabfiles
    Very flexible
    Obtuse
    No easy maintenance
    SQL+ is cumbersome
    Reliability / integrity of tabfiles
  Better but for long term – flawed
  Ad-hoc work – re-building every time

Lookups – a summary
  Final solution with Second data base, Indexes & LOOKUP
    Reliable
    PQL is familiar
    Fast
    Combines good features of dummy case and tabfiles
  Perhaps time will tell – but looks good