6. Lecture Introduction to Databases
Advances in databases: NoSQL databases
Haosheng Huang, Dept. of Geography, University of Zürich
Rolf Meile, Eidg. Forschungsanstalt für Wald, Schnee und Landschaft
Zhiyong Zhou, Tutor, Dept. of Geography, University of Zürich
Geo874 | HS19, Universität Zürich
Geo874 | Intro to Databases | HS19 | H. Huang, R. Meile, Uni Zürich
L6 | Advances in Databases
Summary 5.1
Correspondence between ER and relational models

ER model                     | Step   | Relational model
Entity type                  | 1, 2   | Relation
Binary 1:1 relationship type | 3      | Add PK on the total participation side to the other side
Binary 1:N relationship type | 4      | Add PK on the 1-side to the N-side
Binary M:N relationship type | 5      | Relation and two FKs
n-ary relationship type      | 7      | Relation with n FKs
Simple attribute             | 1, ... | Attribute
Composite attribute          | 1, ... | Set of simple attributes
Multivalued attribute        | 6      | Relation and FK

-> Elmasri & Navathe (2014): Ch 9, pp. 287-296.
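As a reminder of what steps 4 and 5 of the mapping look like in practice, here is a minimal sketch using Python's built-in sqlite3 module. The tables department, employee, project and works_on are hypothetical and do not come from the lecture; they only illustrate a 1:N relationship (FK on the N-side) and an M:N relationship (separate relation with two FKs).

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to

conn.executescript("""
-- Steps 1, 2: each entity type becomes a relation (hypothetical examples).
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project    (proj_id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Step 4 (binary 1:N): the PK of the 1-side (department) is added
-- as an FK to the N-side (employee).
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);

-- Step 5 (binary M:N): a new relation holding two FKs.
CREATE TABLE works_on (
    emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
    proj_id INTEGER NOT NULL REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

With foreign keys switched on, inserting an employee that points to a non-existent department is rejected by the database itself.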
So far we covered...
Relational databases
– DB design: requirements -> conceptual -> logical -> physical
– Entities and relationships
– Relational model
– SQL: the language of relational databases
So far we covered...
Relational databases
– DB design process: requirements -> conceptual -> logical -> physical design
– Basic assumptions:
  • Requirements: use cases can be defined and data can be structured at the start
  • Conceptual: entity types are modelled through a fixed number of attributes, plus their relationships
  • Logical: relational model, dense, row-based (tuple) data structures, row-first access, and SQL as the access language
  • Physical: designed for a single server, ACID transactions
So, what if some of the above do not hold...
Learning objectives
✓ You will understand current developments in data collection, storage and processing: why NoSQL?
✓ You will understand the basic characteristics of NoSQL databases
✓ You will be aware of different NoSQL database types: key-value, document, column, graph
Contents – NoSQL Databases
1. Introduction
2. Characteristics
3. NoSQL database types: key-value, document, column, graph
The shift to the digital economy
• Economy powered by the Internet and other 21st-century technologies: the cloud, mobile, social media and big data
• Key to every digital economy business: Web/mobile-based applications need to
  – Support large numbers of concurrent users (tens of thousands, millions)
  – Deliver highly responsive experiences to a globally distributed base of users
  – Be always available: no downtime
  – Handle semi-structured and unstructured data
  – Rapidly adapt to changing requirements with new features and frequent updates
    • Relational databases: structures and data types are fixed in advance
    • NoSQL: applications can add new fields on the fly
Relational databases are unable to meet these new requirements!
Big Data – V V V
Buzzword, but...
– Volume: large amounts of data
  • Large: beyond what can reasonably fit into a single memory / hard drive
– Velocity: data arrive at a high rate and need to be stored, and possibly processed, fast
  • E.g., streaming data from sensors, from social network feeds, at the stock exchange, created on the Web...
  • Delays are costly
– Variety/Variability: data change their schema and formats, are inconsistent, and are of different types...
  • Need approaches that allow for easy adaptation in what is stored and how it is processed
– ... Some also talk of Veracity (how trustworthy the data are, data quality)
Early talk on Big Data (coining the term): http://www.quora.com/Who-coined-the-term-big-data (Roger Magoulas, 2009)
Scalability
– To handle Volume and Velocity we need scalable solutions: systems that can grow at least proportionally to the needs.
– Purpose-built computers are $$$ (scale up, vertical scalability)
– What if the data outgrow such a machine?
– Ability to add more small/cheap resources as requirements grow (scale out, horizontal scalability)
– Allowing partial failure by providing redundancy
– Consequence: databases are partitioned between multiple physical systems (computers).
Relational databases are poor in supporting these!
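To make "partitioned between multiple physical systems" and "redundancy" more concrete, here is a minimal sketch of hash partitioning with replication. It is an illustration under simplifying assumptions, not how any particular NoSQL product works: the node names, the replication factor and the helper functions owners, put and get are all hypothetical, and the "machines" are just Python dicts.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]   # hypothetical cheap machines
REPLICAS = 2                              # each key is stored on two nodes

# One dict per node stands in for that node's local storage.
storage = {node: {} for node in NODES}

def owners(key, nodes=NODES, replicas=REPLICAS):
    """Return the nodes responsible for a key (hash of the key modulo node count)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    first = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(first + i) % len(nodes)] for i in range(replicas)]

def put(key, value):
    """Write the value to every replica of the key."""
    for node in owners(key):
        storage[node][key] = value

def get(key, failed=frozenset()):
    """Read from the first replica that has not failed."""
    for node in owners(key):
        if node not in failed:
            return storage[node][key]
    raise KeyError(key)

put("sensor-17", 21.5)
primary = owners("sensor-17")[0]
# Partial failure: the primary node is down, the second replica still answers.
print(get("sensor-17", failed={primary}))  # 21.5
```

Adding capacity means adding entries to NODES. With this simple modulo scheme most keys would then move to a different node, which is one reason real systems tend to use more refined placement schemes.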
NoSQL
• Definition from http://www.nosql-database.org/
  – Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.
  – “not only SQL”
  – Typically no good at joins
• Triggered by the needs of Web 2.0 companies: Facebook, Google, Amazon.com
• Increasingly used in big data and real-time web applications
• More than 225 NoSQL databases
  – CouchDB
  – MongoDB
  – Hadoop & HBase
  – Neo4j
  – Cassandra
  – …
Characteristics of NoSQL databases
• Simplicity of design: schema-less
  – SQL databases: structures and data types are fixed in advance
  – NoSQL databases: dynamic schemas; applications can add new fields/attributes on the fly; able to store large volumes of rapidly changing structured, semi-structured, and unstructured data
• Horizontal scaling (distribution) to clusters of machines
  – SQL databases: vertical scaling (add/remove resources to/from a computer server)
  – NoSQL databases: potentially thousands of machines, potentially distributed around the world
• Be always available, highly responsive queries for global users
  – NoSQL databases: DB servers distributed globally
Transaction properties in SQL databases: ACID
– Atomicity
  • The set of operations on the database is either executed in its entirety, or not at all.
  • Entire transaction succeeds or fails (all-or-nothing)
  • Example: transfer of funds from account A to account B: either both the debit on A and the deposit into B are executed, or neither of the two.
– Consistency
  • After a transaction has been executed, the integrity constraints have to be satisfied.
  • Valid state of DB before AND after transaction
  • Example: during the execution there may be violations, but if they remain until the end, the transaction has to be undone (“aborted”).
– Isolation
  • Enables transactions to operate independently of and transparently to each other.
  • Other transactions cannot access data that has been modified during a transaction that has not yet completed.
– Durability (persistence)
  • After the successful completion of a transaction, the DBMS commits to make the outcome of the transaction permanent, even in the presence of concurrency and/or breakdowns
– These are problematic with distributed database systems (NoSQL)!
  • Partitioning data into different servers
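The funds-transfer example for atomicity and consistency can be tried out with Python's built-in sqlite3 module. This is a minimal single-server sketch; the account table, the CHECK constraint (no negative balance) and the transfer helper are assumptions made for illustration, not part of the lecture material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity constraint (hypothetical): a balance may never become negative.
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move funds as one transaction: both updates happen, or neither."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False  # constraint violated, transaction was aborted

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 80}
print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 70, 'B': 80}
```

In the second call the deposit into B is executed first and then undone, because the debit on A would violate the constraint. On a single server this is cheap; the following slides explain why it becomes costly once the data are partitioned.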
Partitioning consequences
• Partitioning introduces communication overhead to assure database consistency after transactions, during the synchronisation or replication process.
• In physically distributed DBs network latency matters (10 km = 67 μs = 67 × 10^-6 s vs. cache write = 10-20 μs!)
• Simultaneous updates may occur
• Locking: a concurrency control mechanism that assures exclusive access to a resource (value, row, table…)
• BUT! Locking impacts availability (the resource is not available to other users while it is being updated [locked])
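The slide does not say how the 67 μs figure was obtained. One reading that reproduces it, stated here as an assumption, is a round trip over 10 km at the speed of light in vacuum. The short calculation below checks that reading; the function name is hypothetical.

```python
# Assumption: the signal travels at the vacuum speed of light.
# Real cables and fibres are slower, so this is a lower bound on latency.
SPEED_OF_LIGHT_M_PER_S = 299_792_458

def round_trip_latency_us(distance_m, speed_m_per_s=SPEED_OF_LIGHT_M_PER_S):
    """Time for a signal to travel the distance and back, in microseconds."""
    return 2 * distance_m / speed_m_per_s * 1e6

latency = round_trip_latency_us(10_000)  # two servers 10 km apart
print(f"{latency:.1f} us")  # 66.7 us
```

Whatever the exact derivation, the point of the slide stands: every synchronisation step between distant servers pays at least this physical delay.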
Brewer’s CAP Theorem
• CAP theorem: a distributed computer system can provide only two of the following guarantees at the same time.
  – Consistency (all nodes see the same data at the same time)
  – Availability (a guarantee that every request receives a response about whether it succeeded or failed)
  – Partition tolerance (the system continues to operate despite partitioning due to network failures)
Gilbert, S., & Lynch, N. (2002). Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, 33(2), 51-59.
[Figure: CAP triangle with corners C (Consistency), A (Availability) and P (Partition tolerance)]
CAP illustration – Consistency violation
Partitions: P1, P2. Users: U1, U2.
• T1 V: 1000
• T2 Sync
• T3 V: 1000 (P1), 1000 (P2)
• T4 U1: V+500 = 1500
• T5 U2: Read V = 1000
• T6 Sync
No locking: accessible, can be partitioned, reads can be stale (old)
AP system: ensuring Availability and Partition tolerance
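The timeline T1 to T6 can be replayed in a few lines of Python. This is a toy sketch: the Replica class and the sync function are hypothetical stand-ins for two partitions and their replication step, and it assumes that U1 writes to P1 while U2 reads from P2.

```python
class Replica:
    """One partition holding its own copy of the value V."""
    def __init__(self, value=None):
        self.value = value

def sync(source, target):
    """Replication step: copy the value from one partition to the other."""
    target.value = source.value

p1, p2 = Replica(1000), Replica()   # T1: V = 1000 on P1
sync(p1, p2)                        # T2, T3: both partitions hold 1000
p1.value += 500                     # T4: U1 updates V on P1, no lock taken
stale_read = p2.value               # T5: U2 reads on P2 before the next sync
sync(p1, p2)                        # T6: the update reaches P2
fresh_read = p2.value

print(stale_read, fresh_read)  # 1000 1500
```

U2 always gets an answer (availability), but between T4 and T6 that answer is out of date (no consistency).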
CAP illustration – Availability violation
Partitions: P1, P2. Users: U1, U2.
• T1 V: 1000
• T2 Sync
• T3 V: 1000 (P1), 1000 (P2)
• T4 U1: V+500 = 1500, Lock
• T5 Sync
• T6 U2: Read V = 1500
With locking: consistent, partitionable, temporarily inaccessible
CP system: ensuring Consistency and Partition tolerance
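The locking variant of the timeline can be sketched the same way. Again this is a toy model under stated assumptions: CPCluster and LockedError are hypothetical names, the lock is a simple flag covering all partitions, and a rejected read stands in for a user who has to wait.

```python
class LockedError(Exception):
    """Raised when a read hits a value that is locked for an update."""

class CPCluster:
    """Two partitions that refuse reads while an update is being synchronised."""
    def __init__(self, value):
        self.replicas = {"P1": value, "P2": value}  # T1 to T3
        self.locked = False
        self._updated = None

    def begin_update(self, partition, delta):
        self.locked = True                  # T4: lock V on all partitions
        self.replicas[partition] += delta
        self._updated = partition

    def sync(self):
        new_value = self.replicas[self._updated]
        for name in self.replicas:          # T5: propagate the update
            self.replicas[name] = new_value
        self.locked = False                 # release the lock afterwards

    def read(self, partition):
        if self.locked:
            raise LockedError(f"V is locked, {partition} cannot answer yet")
        return self.replicas[partition]

cluster = CPCluster(1000)
cluster.begin_update("P1", 500)
try:
    cluster.read("P2")
    blocked = False
except LockedError:
    blocked = True                          # U2 is temporarily refused
cluster.sync()
value_after_sync = cluster.read("P2")       # T6: U2 now sees 1500

print(blocked, value_after_sync)  # True 1500
```

No user can ever read a stale value (consistency), at the price of the value being unavailable between T4 and T5.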
CAP illustration – Partitioning violation
Partitions: P1, P2. Users: U1, U2.
• Putting in a mechanism similar to ACID transaction checks in a single-server database, without accounting for failure of the communication network and its latency
Not resilient to network failure
CA system: ensuring Consistency and Availability
Relational DBs are CA systems
Transaction in NoSQL
• NoSQL: BASE instead of ACID
  – Basically Available: The system does guarantee the availability of the data; there will be a response to any request. But that response could still be ‘failure’ to obtain the requested data, or the data may be in an inconsistent or changing state.
  – Soft state: The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency’; thus the state of the system is always ‘soft’.
  – Eventual consistency: The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere sooner or later, but the system will continue to receive input and is not checking the consistency of every transaction before it moves on to the next one.
NoSQL database types
• Key-value
• Document-based
• Column-based
• Graph-based
Key-value database
• Key-value database: like a file system where the path acts as the key and the file acts as the value.
• It pairs keys to values
  – The key is unique, and can be auto-generated.
  – The value can be String, JSON, BLOB (Binary Large OBject), etc.
• Key-value DBs just store these values, without caring or knowing what's inside; it's the responsibility of the application to understand what was stored.
• Key-value DBs often use hash tables to map keys to values
• Example databases: Riak, Redis, Memcached, …

Key       | Value
“India”   | {“B-25, Sector-58, Noida, India – 201301”}
“Romania” | {“IMPS Moara Business Center, Buftea No. 1, Cluj-Napoca, 400606”, “City Business Center, Coriolan Brediceanu No. 10, Building B, Timisoara, 300011”}
“US”      | {“3975 Fair Ridge Drive. Suite 200 South, Fairfax, VA 22033”}
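The idea that the store "does not know what's inside" can be sketched with a plain Python dict, which is itself a hash table. The put and get helpers are hypothetical and do not correspond to the API of Riak, Redis or Memcached; the addresses are the ones from the table above.

```python
import json

store = {}  # a dict (hash table) stands in for the key-value database

def put(key, value):
    """Store opaque bytes under a key; the store never inspects them."""
    store[key] = value

def get(key):
    """Return the bytes stored under a key."""
    return store[key]

# The application decides on the encoding (here: JSON encoded as UTF-8 bytes).
put("India", json.dumps(
    ["B-25, Sector-58, Noida, India – 201301"]).encode("utf-8"))
put("Romania", json.dumps([
    "IMPS Moara Business Center, Buftea No. 1, Cluj-Napoca, 400606",
    "City Business Center, Coriolan Brediceanu No. 10, Building B, Timisoara, 300011",
]).encode("utf-8"))
put("US", json.dumps(
    ["3975 Fair Ridge Drive. Suite 200 South, Fairfax, VA 22033"]).encode("utf-8"))

raw = get("Romania")                          # opaque bytes as far as the store knows
addresses = json.loads(raw.decode("utf-8"))   # only the application can interpret them
print(len(addresses))  # 2
```

Looking up by key is fast, but a question such as "which keys have an address in Cluj-Napoca?" would require the application to fetch and decode every value.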
Document-based database
• Document-based DBs store records as “documents”, where a document can generally be thought of as a collection of key-value pairs.
  – Format (encoding) of documents: XML, JSON, BSON (Binary JSON), …
  – The documents provide some structure and encoding of the managed data.
• Compared to key-value DBs, document-based DBs embed attribute metadata associated with the stored content, which essentially provides a way to query the data based on its contents.
• In a document DB, each document carries its own schema, unlike a relational database, in which every row in a given table must have the same columns.
• Examples:
  – MongoDB: used by LinkedIn, Foursquare, eBay, …
  – CouchDB
  – …
CouchDB JSON example: a ‘customer’ document

{
  "_id": "1189802380023",
  "_rev": "314159",
  "customer": "peter",
  "gender": "male",
  "likes": ["Biking", "Photography"],
  "address": {
    "Country": "Switzerland",
    "City": "Zurich"
  }
}

– _id: globally unique identifier, passed in or generated by CouchDB
– _rev: revision number, versioning mechanism
– Remaining fields: arbitrary tags, schema-less, could be validated by the programmers

• Each customer is a “document”.
  – A database might have tens of thousands or even millions of “documents”.
• Different documents do not need to have the same structure.
  – e.g., another customer document might have a new tag like “telephone”.
  – schema-less: different from relational databases
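The two points above (documents need not share a structure, and the database can query on contents) can be sketched in plain Python. This does not use CouchDB or its API: the list collection stands in for a database, the find helper is hypothetical, and the second customer with its _id and telephone value is invented purely for illustration.

```python
import json

# The customer document from the slide, parsed from its JSON encoding.
peter = json.loads("""
{
  "_id": "1189802380023",
  "_rev": "314159",
  "customer": "peter",
  "gender": "male",
  "likes": ["Biking", "Photography"],
  "address": {"Country": "Switzerland", "City": "Zurich"}
}
""")

# A second, hypothetical document with a different structure:
# it has a "telephone" tag but no "likes" or "address".
anna = {
    "_id": "hypothetical-id-2",
    "customer": "anna",
    "telephone": "hypothetical-number",
}

collection = [peter, anna]  # both fit in the same "database"

def find(documents, predicate):
    """Return all documents whose contents satisfy the predicate."""
    return [doc for doc in documents if predicate(doc)]

# Query on contents; .get() copes with documents that lack the field.
bikers = find(collection, lambda doc: "Biking" in doc.get("likes", []))
print([doc["customer"] for doc in bikers])  # ['peter']
```

Because each document names its own fields, the query can look inside it, which a pure key-value store cannot do.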
Column-based database
• Relational databases store all the data in a particular table’s rows together on disk, making retrieval of a particular row fast.
• Column-based databases generally store all the values of a particular column together on disk, which makes retrieval of a large amount of a specific attribute fast.
• This approach is very suitable for aggregate queries and analytics scenarios where you might run range queries over a specific field/attribute.
• Example databases: Google’s BigTable, Hadoop’s HBase, …

Id | Tweet | … | …
1  | “...”
2  | “…”
3  | “…”
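The difference between the two layouts can be sketched with Python lists. This only mimics the on-disk arrangement in memory; the tweet texts and the retweets column are hypothetical values added to have something to aggregate.

```python
# Row-oriented layout: all values of one row are kept together.
rows = [
    {"id": 1, "tweet": "first tweet",  "retweets": 3},
    {"id": 2, "tweet": "second tweet", "retweets": 10},
    {"id": 3, "tweet": "third tweet",  "retweets": 7},
]

# Column-oriented layout: all values of one column are kept together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Fetching one complete record is natural in the row layout ...
record = rows[1]

# ... while aggregates and range queries over one attribute only need
# to touch that attribute's column.
total_retweets = sum(columns["retweets"])
popular_ids = [
    tweet_id
    for tweet_id, count in zip(columns["id"], columns["retweets"])
    if count >= 5
]

print(record["tweet"], total_retweets, popular_ids)  # second tweet 20 [2, 3]
```

In the column layout the sum never reads the tweet texts at all, which is what makes analytics over wide tables cheaper.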
Graph-based database
• Graph-based databases are good at dealing with interconnected data.
• Graph databases consist of connections, or edges, between nodes. Both nodes and their edges can store additional properties such as key-value pairs.
  – Node: an instance of an object
• The strength of a graph database is in traversing the connections between the nodes.
• Graph databases are well suited to problem spaces where we have connected data, such as social networks, routing information for goods and money, and recommendation engines
• Example databases: Neo4j, InfiniteGraph…
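Traversing connections can be sketched with a small in-memory graph and a breadth-first search. This is a generic illustration, not the query model of Neo4j or InfiniteGraph; the people, the "follows" edges and the reachable_within function are hypothetical.

```python
from collections import deque

# Nodes and edges both carry properties as key-value pairs.
nodes = {
    "alice": {"type": "person"},
    "bob":   {"type": "person"},
    "carol": {"type": "person"},
    "dave":  {"type": "person"},
    "erin":  {"type": "person"},
}
edges = {
    "alice": [("bob", {"rel": "follows"}), ("carol", {"rel": "follows"})],
    "bob":   [("dave", {"rel": "follows"})],
    "dave":  [("erin", {"rel": "follows"})],
}

def reachable_within(start, max_hops):
    """Breadth-first traversal: nodes reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour, _properties in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, depth + 1))
    return found

# "Whom does alice reach within two hops?", e.g. for a recommendation.
print(reachable_within("alice", 2))  # ['bob', 'carol', 'dave']
```

In a relational database each additional hop would typically be another self-join, whereas here it is just one more step along stored edges.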
https://www.youtube.com/watch?v=jyx8iP5tfCI
Seven Databases in Song
Summary
• NoSQL databases
  – schema-less: no predefined schema
  – horizontal scaling (distribution) to clusters of machines
  – ensure availability
  – Transaction properties: BASE instead of ACID
    • SQL database: Atomicity, Consistency, Isolation, Durability
    • NoSQL database: Basically Available, Soft state, Eventual consistency
• Different NoSQL database types
  – key-value pairs, documents, graphs, ...
  – difficult to switch to another NoSQL database (disadvantage)
Recap
What we covered in this semester
• Why we need databases
• Basics of relational DB
• Relational DB design: requirement analysis -> conceptual DB design -> logical DB design -> physical DB design
• Structured Query Language (SQL), PostgreSQL
• NoSQL
We hope you will find these useful in your further studies and career.
Later
• 9:00-9:45
• Dr. Cheng Fu (Postdoc): Intro to big data processing
Later, in lab
• 10:15-12:00
• Y25-J-09, Y25-J-10
• Physical DB design and realization
Next week – exam!
• About 60 mins
• Stuff covered in lectures and practicals
• Not allowed to bring lecture notes
• Pencil and paper
• Friday 01.11.2019 8:40 am - 9:40 am in Y03-G-91
• Be here at 8:30 am!
• No lab on 01.11.2019
• Good Luck!