WEEK 2: INTRODUCING THE RELATIONAL APPROACH TO DATA

Data organisation and some basic relational ideas

In this week, we will cover the following topics:

• Data: that you can design the structure of data, including the relationships between data.
• The importance of being able to uniquely identify a datum within a set of data.
• How to accumulate relationships between data, as data, stored in tables.
• A brief look at a real-world set of data known as Code-Point® Open, and its limitations.
• ‘Programming’: the distinction between declarative programming and algorithmic programming.

… and will result in the following learning outcomes:

• An initial appreciation that it is good to be systematic in how data is represented.
• That data ‘keys’ allow us to access a specific datum.
• Knowledge that tables can be used to store data and relationships between data.
• That real-world data is available, but not necessarily perfectly organised.
• A feeling for the kind of programming relevant for database interactions.




Table of Contents

2.1 Designing for data
    2.1.1 Relationships between data/datum
    2.1.2 Two initial problems
2.2 Relational solutions to the problems
    2.2.1 Unique identification
    2.2.2 Accumulating relational data
2.3 A real-world, file-based dataset: Code-Point® Open
    2.3.1 A critique of Code-Point® Open
2.4 ‘Declarative’ programming v ‘algorithmic’ programming
2.5 Summary

Figure 1: Basic relationships seen both ways
Figure 2: Website mock-up examples for ‘new customer’ website form
Figure 3: A view of what a relational table represents
Figure 4: Structure of the Code-Point® Open directories
Figure 5: The GCU postcode

Table 1: Accumulated customers
Table 2: Customer table version B – non-unique names
Table 3: Products
Table 4: Customer table version C
Table 5: Some key definitions regarding relational databases
Table 6: Ordered table
Table 7: Non-verbose Code-Point® Open header/column fields
Table 8: Verbose Code-Point® Open header/column fields
Table 9: Extract from file <codepo_gb/Data/CSV/g.csv>
Table 10: Application program pseudocode to retrieve data from file
Table 11: Code to retrieve data from file v database declaration


2.1 Designing for data

2.1.1 Relationships between data/datum

During week 1, we discussed the idea of a ‘legacy’ data management system, seen as a consequence of there being no enforced rules for data entries. The order (which, as we have seen, has important aspects concerning the actual delivery) has other important information attached to it. Namely, the order relates a product to a customer. We might think of this situation, then, as two physical entities (Product and Customer) being connected by an ‘event’, the ‘order’:

• Customer ‘ordered’ Product.
• Product ‘ordered by’ Customer.

Figure 1: Basic relationships seen both ways.

Consider an associated data entry example, this time not for data entered into a spreadsheet but for data entered through a website; here we might imagine a retailer website form designed to receive data concerning the customer.

Figure 2: Website mock-up examples for ‘new customer’ website form.

We present two choices (see Figure 2). The first form is designed to receive 5 entries, i.e., First Name, Surname, Address, E-mail, and Tel (short for telephone number). The second form is designed to receive 6 entries, i.e., First Name, Surname, Address, E-mail, Tel, and Product. Both mock-ups are shown at https://your_company.com/new_customer and are submitted with an ‘Add Customer’ button. We do not need to question the design of the forms themselves. The forms, for our current purposes, merely provide an interface through which data is input to ‘the system’.

Let us take a look at what the actual data might ‘look like’ (in storage) after a company has had a number of new customers added. We will assume that the company adopted the second form, the one with all of the customer details in addition to the first customer-chosen Product.

First_Name  Surname  Address      E-mail             Tel      Product
John        Davis    Salmon Road  [email protected]  077**08  Soap Powder
Aileen      McManus  East Avenue  [email protected]  0131**9  Table
David       Smith    North Close  [email protected]  0141**1  Wood Polish
Sarah       Jones    Queen Road   [email protected]  0131**9  Nails

Table 1: Accumulated customers.

From the point of view of a customer’s first order, maybe this kind of data makes sense? If you look on most retail websites, however, you will find that the customer is required to register an account separately from the order. Nevertheless, as we mentioned, we are not concerned here with the design of the web interface, although the following issues are worth highlighting in relation to the above example:

1. What happens if a customer named John Davis registers himself, ordering the product Soap Powder, as above, and then three months later somebody else with the same name registers and orders a Bike?

First_Name  Surname  Address      E-mail             Tel      ProductName
John        Davis    Salmon Road  [email protected]  077**08  Soap Powder
Aileen      McManus  East Avenue  [email protected]  0131**9  Table, TV
David       Smith    North Close  [email protected]  0141**1  Wood Polish
Sarah       Jones    Queen Road   [email protected]  0131**9  Nails
John        Davis    West Close   [email protected]  077**08  Bike

Table 2: Customer table version B – non-unique names.

2. How well does the Product column ‘cover’ the relevant data now and in the future?

Point 1: Assuming that this does happen, if one of the ‘John Davis’ customers then orders a Table, how do we know which John Davis this is? Perhaps we can identify them uniquely by their phone number or e-mail address?

Point 2: If we look again at the table, during the time the second John Davis registered, Aileen McManus ordered a TV. Is storing the collection of products in this way acceptable? It certainly does not seem consistent. The original meaning of the Product column was the initial Product.

2.1.2 Two initial problems

Although we are now looking at a different example, one involving a hypothetical web-based system, we are stuck in the same sort of file-based, spreadsheet mind-set introduced in week 1, and we have introduced two new problems:

1. The problem of allowing unique access to a set of data.


2. The problem of how best to represent accumulative changes/additions to data without allowing structural changes to be made, i.e., without allowing change to the overall structure of the data itself.

2.2 Relational solutions to the problems

As a way of introducing some of the basic ideas of relational databases, we will now re-design how our data is stored, expanding the examples above, to address our two initial problems.

Let us make the data a little bit more believable, even if not necessarily ‘realistic’ [1]. For example, a retailer is likely to have a larger set of data that relates to a specific Product; what is important from a customer-facing point of view – the name of the product – would then be one column in this larger set of values. Examples of what might be important to the retailer:

• Where the product comes from – the Supplier.
• The unit cost of purchase – PurchaseCost.
• The unit price at point of sale – SalePrice.
• Product code – ProductCode.
• Product category – Category.
• Product name – ProductName.

ProductName  Category    ProductCode  SalePrice  PurchaseCost  Supplier
Soap Powder  Home        02147698     4.50       2.50          15
TV           Electrical  05070012     999.99     650.00        05
Table        Furniture   08608376     299.00     100.00        09
Wood Polish  Home        02447701     2.50       1.00          11
Nails        Hardware    01570627     3.00       50.0          15
Bike         Leisure     09377700     680.00     400.00        155

Table 3: Products

[1] Real-world data is often inappropriate to use in lecture notes and textbooks, and most of the time you will be looking at illustrative data only. However, later on in the course, especially in the accompanying tutorials, we will try to use some real data sets.

2.2.1 Unique identification

We are now going to re-define Table 2 (Customer table version B). We do this to provide a unique identifier for each customer, which we will call CustomerId, and we maintain all of the other columns as defined previously. The result is Table 4 (assume that the E-mail entries exist – they are omitted for ease of visualization).

CustomerId  First_Name  Surname  Address      E-mail  Tel      ProductName
0001        John        Davis    Salmon Road  .       077**08  Soap Powder
1298        Aileen      McManus  East Avenue  .       0131**9  Table, TV
0032        David       Smith    North Close  .       0141**1  Wood Polish
0099        Sarah       Jones    Queen Road   .       0131**9  Nails
9834        John        Davis    West Close   .       077**08  Bike

Table 4: Customer table version C

You might have noticed a potential problem with the ProductName column in Table 2 and Table 4. Again, there is potential for the products to ‘overlap’ in terms of the ProductName column – the names are, again, too generic to act as unique identifiers. For example, a customer might choose between many different kinds of Soap Powder, so we cannot use such product names as unique identifiers (they are not unique!). There are two rules for unique identifiers:

• They must be unique!
• They should not change value!
    o This is why it is a bad idea to use phone numbers or e-mail addresses as unique identifiers.

Therefore, in order to uniquely identify a customer, we use the CustomerId. In order to uniquely identify a product, we use the ProductCode. The unique identifier is simply a value that is unique to a given row (rows within a given table cannot have the same unique identifier). The ‘unique identifier’, although an accurate description, is also referred to as the ‘primary key’ because it provides the primary means of row-wise access to data.

Relational databases: key definitions

• Schema: the entire structure and description of data. Importantly, the schema is thought of, and packaged, as part of the database. In this way a database is ‘self-describing’.
• Primary key: the column attribute of a table whose row value is unique in the set of rows.
• Table: the axiomatic representational form employed in relational databases.
• Rows: accumulative (new data is added as new rows).
• Columns: schematic (the columns define the structure of the table).
• Relations: typically, the ‘links’ made between datum. In the area of relational databases, collections of such relations are represented as tables.
• Tuples: a set of data that is ordered according to the columns. Any given row defines a tuple. If the number of columns is n, then each row is an n-tuple. So, each row in Table 4 is a 7-tuple, whereas each row in Table 3 is a 6-tuple.


• Attributes: refer to the set of columns in a table; these are thus the table’s attributes and at the same time the attributes of each row of data.
• Entity type: the specific name given to an entity, which is essentially the collection of columns, tuples, etc. associated with that type.

Table 5: Some key definitions regarding relational databases.
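The two rules for unique identifiers can be tried out in a few lines of Python. The sketch below is only an illustration under stated assumptions: it holds the rows of Table 4 as 7-tuples in memory, and the helper names `build_index` and `find_by_name` are hypothetical, not part of any database system. A real database enforces the primary key for you.

```python
# Rows of Table 4 as 7-tuples; e-mail entries are omitted (".") as in the table.
CUSTOMER_ROWS = [
    ("0001", "John", "Davis", "Salmon Road", ".", "077**08", "Soap Powder"),
    ("1298", "Aileen", "McManus", "East Avenue", ".", "0131**9", "Table, TV"),
    ("0032", "David", "Smith", "North Close", ".", "0141**1", "Wood Polish"),
    ("0099", "Sarah", "Jones", "Queen Road", ".", "0131**9", "Nails"),
    ("9834", "John", "Davis", "West Close", ".", "077**08", "Bike"),
]


def build_index(rows):
    """Index rows by their first element (CustomerId), rejecting duplicates."""
    index = {}
    for row in rows:
        key = row[0]
        if key in index:
            raise ValueError(f"duplicate primary key: {key}")
        index[key] = row
    return index


def find_by_name(rows, first_name, surname):
    """Return every row matching a name; names are not guaranteed unique."""
    return [r for r in rows if r[1] == first_name and r[2] == surname]


customers = build_index(CUSTOMER_ROWS)
johns = find_by_name(CUSTOMER_ROWS, "John", "Davis")
print(len(johns))            # how many rows share the name John Davis
print(customers["9834"][3])  # the address of exactly one customer
```

Searching by name returns more than one row, whereas a lookup by CustomerId can only ever return one, which is the behaviour the primary key is meant to guarantee.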

2.2.2 Accumulating relational data

In the previous subsection we solved the problem of unique identification of data by defining, and giving an example of, the use of primary keys. Now we will show how to store relations between data in the style of the relational approach.

CustomerId  ProductCode  Date
0001        02147698     02-01-2016
9834        09377700     02-02-2016
0032        02447701     23-02-2016
1298        08608376     05-03-2016
1298        05070012     05-03-2016
0099        01570627     21-03-2016
0032        08608376     19-03-2016
0032        01570627     19-03-2016
.           .            .

Table 6: Ordered table.

Note that the second problem, of storing numerous products per customer, is also easily solved by this table; we simply add a new row each time an order is placed – there is no need to add the new product to the same row as other products. Accumulative changes are represented without structural changes being made to the table, such as adding a new column for a new product.

In Figure 3 we present a representation of the relationships specified by Table 6, along with the tables that contain the primary keys. Notice that the circles in Figure 3 contain the values of attributes pertaining to the primary keys. In other words, we have chosen a reader-friendly attribute of each row indicated by the primary key, rather than representing the primary keys themselves. This is a little bit like saying:

• Select from the Customer table the First Name and Surname according to the CustomerIds in the Order table.
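The idea of looking up readable attributes through the keys stored in Table 6 can be sketched in runnable form. The example below is an assumption-laden illustration, not the course's own code: it uses SQLite through Python's built-in `sqlite3` module rather than MySQL, it names the order table `Orders` (a hypothetical name, chosen because ORDER is a reserved word in SQL), and the final inserted order and its date are made up purely to show that new data arrives as a new row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE Customer (CustomerId TEXT PRIMARY KEY,
                           First_Name TEXT, Surname TEXT);
    CREATE TABLE Product  (ProductCode TEXT PRIMARY KEY, ProductName TEXT);
    -- ORDER is a reserved word in SQL, so this sketch names the table Orders.
    CREATE TABLE Orders   (CustomerId TEXT, ProductCode TEXT, Date TEXT);
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)", [
    ("0001", "John", "Davis"), ("1298", "Aileen", "McManus"),
    ("0032", "David", "Smith"), ("0099", "Sarah", "Jones"),
    ("9834", "John", "Davis"),
])
conn.executemany("INSERT INTO Product VALUES (?, ?)", [
    ("02147698", "Soap Powder"), ("05070012", "TV"), ("08608376", "Table"),
    ("02447701", "Wood Polish"), ("01570627", "Nails"), ("09377700", "Bike"),
])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    ("0001", "02147698", "02-01-2016"), ("9834", "09377700", "02-02-2016"),
    ("0032", "02447701", "23-02-2016"), ("1298", "08608376", "05-03-2016"),
    ("1298", "05070012", "05-03-2016"), ("0099", "01570627", "21-03-2016"),
    ("0032", "08608376", "19-03-2016"), ("0032", "01570627", "19-03-2016"),
])

# Replace each key in Orders with readable attributes from the other tables.
rows = conn.execute("""
    SELECT c.First_Name, c.Surname, p.ProductName, o.Date
    FROM Orders AS o
    JOIN Customer AS c ON c.CustomerId = o.CustomerId
    JOIN Product  AS p ON p.ProductCode = o.ProductCode
    ORDER BY o.rowid
""").fetchall()
for row in rows:
    print(row)

# Hypothetical new order: the second John Davis (9834) orders a Table.
# The date is made up; only a row is added, no column is changed.
conn.execute("INSERT INTO Orders VALUES (?, ?, ?)",
             ("9834", "08608376", "01-04-2016"))
orders_9834 = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE CustomerId = ?", ("9834",)
).fetchone()[0]
print(orders_9834)
```

Because the order row carries the CustomerId rather than the name, there is no doubt about which John Davis placed the new order, and the table's columns are untouched by the addition.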


Figure 3: A view of what a relational table represents.

2.3 A real-world, file-based dataset: Code-Point® Open

Instead of working with ‘fake’ data for illustrative purposes, in this section we are going to take a look at a real-world data set. The data set is the set of UK postcode data known as Code-Point® Open (Ordnance Survey, 2015). Postcodes are grouped combinations of numbers and letters, which are associated with a postal area. An example postcode is:

G40BA

The ‘G’ part of the postcode stands for Glasgow. The ‘4’ part represents an area in Glasgow, and the ‘0’ represents an area within area ‘4’. So, as we read across the string G40BA we effectively ‘zoom in’ to quite a small region. Postcodes are used to sort mail, which makes its delivery more efficient. Imagine if a postman were to deliver letters in the UK in random order. They would, for example, deliver a letter in Birmingham, followed by one in Aberdeen, some 680 kilometers north! So, postcodes help sort mail, which in turn helps organize its efficient delivery. A list of the general postcode areas is available on wikipedia.org:

https://en.wikipedia.org/wiki/List_of_postcode_districts_in_the_United_Kingdom

We will see later in the course that postcodes, and the associated data within Code-Point® Open, along with some other technology we introduce later, can be useful for other things, too. For now, though, we want to focus on the data, to see how it is structured and to see what we might want to do to put it into a database.

Firstly, Code-Point® Open can be downloaded from:

https://www.ordnancesurvey.co.uk/opendatadownload/products.html

Code-Point® Open (Ordnance Survey, 2015) is an almost-comprehensive list of the postcodes in the UK. That is, it lists the whole of the UK’s postcodes, excluding Northern Ireland. The data set contains approximately 1.7 million UK postcodes, with some 25 million adjoining addresses; therefore, each separate postcode serves about 15 postal addresses, on average (15 x 1.7 = approx. 25).
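The ‘zoom-in’ reading of a postcode described above can be sketched as a small Python function. This is a rough illustration only: the regular expression is a simplified assumption about postcode layout, not an official validation rule, and the part names used here (area, district, sector, unit) follow common postcode terminology rather than anything defined in these notes. The function name `split_postcode` is hypothetical. The last lines simply redo the average-addresses arithmetic from the paragraph above.

```python
import re

# Assumed layout: 1-2 area letters, a district (a digit plus an optional
# digit or letter), optional whitespace, a sector digit, two unit letters.
POSTCODE = re.compile(r"^([A-Z]{1,2})(\d[A-Z\d]?)\s*(\d)([A-Z]{2})$")


def split_postcode(postcode):
    """Split a postcode into (area, district, sector, unit)."""
    match = POSTCODE.match(postcode.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised postcode: {postcode!r}")
    return match.groups()


# Each successive part narrows the region, reading left to right.
area, district, sector, unit = split_postcode("G40BA")
print(area, district, sector, unit)

# Rough check of the average quoted in the text: addresses per postcode.
addresses_per_postcode = 25_000_000 / 1_700_000
print(round(addresses_per_postcode, 1))
```

The pattern tolerates a space between the two halves of the postcode, so the same function accepts the postcode whether or not it is written with one.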

Figure 4: Structure of the Code-Point® Open directories.

The data is distributed by the Ordnance Survey as a set of files. In Figure 4, we present the structure of the folders and where the data files etc. can be found. The top-level folder is <codepo_gb>, which contains two subfolders, <Doc> and <Data>, and inside <Data> we have the folder <CSV>. We are only interested in the ‘.csv’ files. The ‘.csv’ suffix stands for comma-separated values, and ‘.csv’ files are typically text files, which can be opened in any standard text editor. Most of the files are contained in the <CSV> directory:

• codepo_gb/Data/CSV/*.csv: this set of 120 files (the three asterisks ‘*’ in Figure 4 hide 102 files that fit, alphabetically, between <ca.csv> and <wn.csv>) contains the actual data.

But there is another .csv file:

• codepo_gb/Doc/Code-Point_Open_Column_Headers.csv: this single file defines the headers to the data.

Given the structure of some of the public data that is available (which can be very messy), we should be reasonably happy with this file-based organization of Code-Point® Open. Of course, there is the problem that the data is stored in files, and if we wanted to create a program that needed to use postcodes, the files would take a long time to read into memory. However, the files are relatively well organised.

PC  PQ  EA  NO  CY  RH  LH  CC  DC  WC

Table 7: Non-verbose Code-Point® Open header/column fields.


Postcode  Positional_quality_indicator  Eastings  Northings  CY  RH  LH  CC  DC  WC

Table 8: Verbose Code-Point® Open header/column fields.

Let us take a closer look at the header file. This file contains 10 ‘columns’ and 2 ‘rows’. In the text file, the columns are coded by the insertion of a comma ‘,’. The rows are specified simply by being placed on separate lines, using the <Enter> key on the keyboard. The way ‘enter’ is coded in a text file is by using ‘\n’ (this is not necessarily visible in text editors), which is otherwise known as the newline character. The header/column fields are presented in a relatively non-verbose way in Table 7, whereas Table 8 contains relatively verbose labels. Don’t worry at the moment what these fields mean – we are not interested in knowing what all of them mean, but we will look at the most important fields (from our point of view) in an example of the data. So, the first column contains the actual postcode. Then there is something known as the positional quality indicator, followed by entries for the eastings and northings. Let us take a look at an extract from a data file, seen in Table 9.

"G40AJ",10,260044,665214,"S92000003","","S08000021","","S12000046","S13002649"
"G40AL",10,260044,665214,"S92000003","","S08000021","","S12000046","S13002649"
"G40AN",10,259725,664965,"S92000003","","S08000021","","S12000046","S13002649"
"G40AP",10,260044,665214,"S92000003","","S08000021","","S12000046","S13002649"
"G40AQ",10,260044,665214,"S92000003","","S08000021","","S12000046","S13002649"
"G40AX",10,259349,666274,"S92000003","","S08000021","","S12000046","S13002650"
"G40BA",10,259296,666080,"S92000003","","S08000021","","S12000046","S13002650"

Table 9: Extract from file <codepo_gb/Data/CSV/g.csv>

Look at how the designers of the data have decided to represent different types of data within each column. They define a String type in the file by enclosing the value in double quotation marks. For example, the right-hand column in the final row (S13002650) is a String type by virtue of the fact that it is represented between the commas as "S13002650". On the other hand, numbers are stored as is, without being enclosed in quotes.

Let us take a look at some example numbers and what they represent, taking the final line of Table 9 as an example. This row is the entry for the G40BA postcode, mentioned above, which is the postcode for Glasgow Caledonian University. The ‘positional quality indicator’ (column 2) has a value of 10, the ‘easting’ (column 3) a value of 259296, and the ‘northing’ (column 4) a value of 666080. Eastings and northings are a form of geographical coordinate. Therefore, a location on the surface of the earth can be found with an <easting, northing> pair.

So let us check how accurate the easting/northing data is. Perhaps we can use a leading maps provider to do this? Not necessarily! If we plug these values into Google Maps we will get an error, because Google Maps (and many other web services, come to that) represents geo-coordinates differently, using latitude and longitude. Nevertheless, it is possible to check the information by using a web site (http://www.gridreferencefinder.com) coded to handle <easting, northing>-style coordinates. If you go to this site and enter the values, a pin will be dropped inside the boundary of the GCU campus! Indeed, as we see in Figure 5, the coordinate <259296, 666080> is situated at a specific point within the campus, actually just next to the Saltire Centre, which is one of the university’s focal points for learning.

Figure 5: The GCU postcode.
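The way the file encodes types (quoted strings, unquoted numbers) can be seen by reading two rows of Table 9 with Python's standard `csv` module. This is a minimal sketch under assumptions: the rows are held in a string rather than read from <g.csv>, the postcodes are written exactly as they appear in Table 9, and the column names are the verbose labels from Table 8. The `csv.QUOTE_NONNUMERIC` option tells the reader to treat every unquoted field as a number.

```python
import csv
import io

# Verbose column names from Table 8.
HEADER = ["Postcode", "Positional_quality_indicator", "Eastings", "Northings",
          "CY", "RH", "LH", "CC", "DC", "WC"]

# Two rows copied from Table 9; io.StringIO stands in for the real file.
SAMPLE = (
    '"G40AX",10,259349,666274,"S92000003","","S08000021","",'
    '"S12000046","S13002650"\n'
    '"G40BA",10,259296,666080,"S92000003","","S08000021","",'
    '"S12000046","S13002650"\n'
)

# QUOTE_NONNUMERIC: quoted fields stay strings, unquoted fields become floats.
reader = csv.reader(io.StringIO(SAMPLE), quoting=csv.QUOTE_NONNUMERIC)
records = [dict(zip(HEADER, row)) for row in reader]

# Pick out the GCU row and read its coordinates as whole numbers.
gcu = next(r for r in records if r["Postcode"] == "G40BA")
easting, northing = int(gcu["Eastings"]), int(gcu["Northings"])
print(easting, northing)
print(type(gcu["Postcode"]).__name__, type(gcu["Eastings"]).__name__)
```

Note that the type information lives inside the data itself: the reader can only tell a string from a number by looking for quotation marks on every single field.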

2.3.1 A critique of Code-Point® Open

We are not interested here in critiquing the actual raw data as a resource. As we have mentioned, the resource is very valuable in itself. We do, however, want to briefly reflect on the folder/file structure of the resource and the structure of the raw data itself. Strictly there is no ‘right’ way to structure a data resource, but you should try and notice some inconsistencies in the organization of the files. Take a look again at Figure 4. For example:

1. Reflecting on the folder/file structure of the resource:

    a. The file <codepo_gb/Doc/Code-Point_Open_Column_Headers.csv> contains the headings of the data. Should this not be part of the data itself, perhaps kept inside the codepo_gb/Data folder somewhere? Furthermore, this file has a ‘.csv’ extension. Why, then, is it not in the CSV directory?

    b. Metadata is data about data. It is, therefore, data. Why is codepo_gb/Doc/metadata.txt not in the Data directory then?

    c. All of the files in the Data directory are ‘.csv’ type files. Is the naming of the <CSV> folder inside it redundant?

2. The structure of the raw data itself:

    a. As mentioned, Strings require additional coding ("String") within the data file itself! Shouldn’t the type of the data be declared outside of the actual raw data entries? The answer to that is yes.

    b. How do we write an application program to get a specific data entry, or set of entries, out of a file, or set of files? And how do we do this quickly? The answer is that it can be quite cumbersome to code access to specific values. The algorithms required to do this need coding in the application. Furthermore, reading files from disk is inherently a slow procedure.


Many of the issues we mention here, concerning file-based representations, simply seem to disappear when using databases, because we are working within the well-defined constraints of a database.

2.4 ‘Declarative’ programming v ‘algorithmic’ programming

Before we finish this week, we want to pick up on point 2b, and on the fact that in writing an application program we would need to write algorithms that provide access to the data, or use someone else’s code that does this. Table 10 provides a very simple algorithm, and one that would potentially be very slow, to access the easting and northing data from the file <g.csv>. Table 11 provides the equivalent database-style declaration designed to retrieve the same data from a database. An important point to note is that Table 10 is a highly simplified set of pseudocode, whereas Table 11 is the full MySQL statement, which, if you ran it within a database environment holding the data, would actually work. We will see later on in the course, using a programming language known as Java™, that you can run SQL statements within application programs. Rather than writing application code to access a file-based system, running SQL statements/queries exposes all of the advantages of databases to the application program.

Program style code

    // Create a file instance in memory from g.csv
    File f = new File("g.csv");
    // Open the file g.csv
    f.open();
    // Access line G40BA
    for each line in the file {
        if (line element 1 is equal to G40BA) {
            targetLine = current_line
        }
    }
    // Get the information you want from that line
    Easting e = targetLine.getElement(3);
    Northing n = targetLine.getElement(4);

Table 10: Application program pseudocode to retrieve data from file.

Database style declaration

    -- Select required data from the database.
    -- The table is named `database` here; the backticks are needed
    -- because DATABASE is a reserved word in MySQL.
    Select Eastings, Northings from `database` where Postcode like 'G40BA';

Table 11: Code to retrieve data from file v database declaration.
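To make the contrast between Table 10 and Table 11 concrete, here is a hedged, runnable sketch of both styles side by side. Several things are assumptions made for illustration: the code is Python rather than the Java used later in the course, it uses SQLite through the built-in `sqlite3` module rather than MySQL, the table name `codepoint` and the function names are hypothetical, and three rows from Table 9 held in a string stand in for the file <g.csv>.

```python
import csv
import io
import sqlite3

# Three rows copied from Table 9, standing in for the file g.csv.
SAMPLE = (
    '"G40AQ",10,260044,665214,"S92000003","","S08000021","",'
    '"S12000046","S13002649"\n'
    '"G40AX",10,259349,666274,"S92000003","","S08000021","",'
    '"S12000046","S13002650"\n'
    '"G40BA",10,259296,666080,"S92000003","","S08000021","",'
    '"S12000046","S13002650"\n'
)


def find_algorithmic(text, postcode):
    """Table 10 style: step through every line until the postcode matches."""
    for row in csv.reader(io.StringIO(text)):
        if row[0] == postcode:
            return int(row[2]), int(row[3])
    return None  # postcode not present


def find_declarative(conn, postcode):
    """Table 11 style: state which data is wanted; the database finds it."""
    return conn.execute(
        "SELECT Eastings, Northings FROM codepoint WHERE Postcode = ?",
        (postcode,),
    ).fetchone()


# Load the same rows into an in-memory database, keyed on the postcode.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codepoint (Postcode TEXT PRIMARY KEY, "
             "Eastings INTEGER, Northings INTEGER)")
conn.executemany(
    "INSERT INTO codepoint VALUES (?, ?, ?)",
    [(r[0], int(r[2]), int(r[3])) for r in csv.reader(io.StringIO(SAMPLE))],
)

print(find_algorithmic(SAMPLE, "G40BA"))
print(find_declarative(conn, "G40BA"))
```

In the first function the programmer decides how to search; in the second the query only says what is wanted, and how the row is located is left to the database.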

What is the basic difference between algorithmic programming and declarative programming?

Algorithmic programming: here the control (and therefore the responsibility) and the design of the algorithms are in the hands of the programmer. All of the steps need defining and implementing in code. This can be a highly rewarding process as the programmer designs code for re-use, or speed, or some trade-off between the two.

Declarative programming: this is not ‘programming’ in the same sense. A declarative ‘programming’ language states which data is needed, and from which database. The enjoyment of working with databases is often related to the comparative simplicity of retrieving data from a database as compared with having to access it from a file-based system, but is also related to the process of actually designing the structure of the data itself.

Neither form of programming is in any sense better than the other. As we have suggested, the two styles of programming are absolutely complementary. […] own creativity and resourcefulness. You might not yet be capable of implementing your ideas, but, if you work hard, then this will come with time.

2.5 Summary

• Database development should not just be undertaken by jumping straight into creating a database. Important processes of design must take place first.
• We introduced the relational approach and listed some basic relational database definitions.
• We then considered a real-world file-based data set known as Code-Point® Open. The reason we did this was to develop an appreciation for an existing file-based system, and for the often-used comma-separated values (.csv) format.
• We then considered how we might write a program to access the data within such a system.
• This led us to the important distinction between declarative programming and algorithmic programming. Query languages are very much based on the idea of declarative programming.

~~~~~