L24: NoSQL (continued)

CS3200 Database design (sp18 s2)
https://course.ccs.neu.edu/cs3200sp18s2/
4/12/2018


Last Class today

• NoSQL (15 min): Graph DBs
• Course Evaluation (15 min)
• Course review


Outline

• Introduction
• Transaction Consistency
• 4 main data models
  - Key-Value Stores (e.g., Redis)
  - Column-Family Stores (e.g., Cassandra)
  - Document Stores (e.g., MongoDB)
  - Graph Databases (e.g., Neo4j)
• Concluding Remarks


Graph Databases

• Restricted case of a relational schema:
  - Nodes (+ labels/properties)
  - Edges (+ labels/properties)
• Motivated by the popularity of network/communication-oriented applications
• Efficient support for graph-oriented queries
  - Reachability, graph patterns, path patterns
  - Ordinary RDBs either do not support such queries or are inefficient for them
    • A path of length k is a k-wise self-join; yet a very special one...
• Specialized languages for graph queries
  - For example, a pattern language for paths
• Plus distributed, 2-of-CAP, etc.
  - Depending on the design choices of the vendor
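The remark that a path of length k is a k-wise self-join can be illustrated with a small sketch. The edge list and the function names `paths_by_self_join` and `reachable` below are assumptions made for illustration, not part of the lecture; plain Python stands in for both the relational engine and the graph engine.

```python
from collections import deque

# Hypothetical edge relation E(src, dst); in an RDB this would be a single table.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]


def paths_by_self_join(edges, k):
    """Return all paths of exactly k edges by joining k copies of the edge relation."""
    # Paths of length 1: every edge is a path (src, dst).
    paths = [(s, d) for s, d in edges]
    for _ in range(k - 1):
        # Join with one more copy of E on "last node of the path = source of the edge".
        paths = [p + (d,) for p in paths for s, d in edges if p[-1] == s]
    return paths


def reachable(edges, start):
    """Return all nodes reachable from start using breadth-first traversal."""
    adjacency = {}
    for s, d in edges:
        adjacency.setdefault(s, []).append(d)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


if __name__ == "__main__":
    print(paths_by_self_join(EDGES, 2))
    print(sorted(reachable(EDGES, "a")))
```

The self-join version needs k to be fixed in advance, whereas the traversal answers reachability for paths of any length, which is one way to read "a very special one".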


Example Databases

• Graph with nodes/edges marked with labels and properties (labeled property graph)
  - Sparksee (DEX) (Java, 1st release 2008)
  - neo4j (Java, 1st release 2010)
  - InfiniteGraph (Java/C++, 1st release 2010)
  - OrientDB (Java, 1st release 2010)
• Triple stores: support W3C RDF and SPARQL, also viewed as graph databases
  - MarkLogic, AllegroGraph, Blazegraph, IBM System G, Oracle Spatial & Graph, OpenLink Virtuoso, Ontotext


neo4j

• Open source, written in Java
  - First version released 2010
• Supports the Cypher query language (declarative graph QL)
• Clustering support
  - Replication and sharding through master-slave architectures
• Used by eBay, Walmart, Cisco, National Geographic, TomTom, Lufthansa, ...


Examples taken from Graph Databases by Robinson, Webber, and Eifrem (O’Reilly), free eBook


The Graph Data Model in Cypher

• Labeled property graph model
• Node
  - Has a set of labels (typically one label)
  - Has a set of properties key:value (where value is of a primitive type or an array of primitives)
• Edge (relationship)
  - Directed: node → node
  - Has a name
  - Has a set of properties (like nodes)
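As a rough sketch of the model just listed, the node and relationship structure could be written down as two small record types. The class names, field names, and the sample `User` / `FOLLOWS` data are assumptions chosen for illustration; they are not Neo4j's internal representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union

# A property value is a primitive or an array of primitives, as stated on the slide.
Primitive = Union[str, int, float, bool]
PropertyValue = Union[Primitive, List[Primitive]]


@dataclass
class Node:
    labels: Set[str] = field(default_factory=set)  # typically a single label
    properties: Dict[str, PropertyValue] = field(default_factory=dict)


@dataclass
class Relationship:
    name: str    # the relationship name, e.g. "FOLLOWS"
    start: Node  # relationships are directed: start -> end
    end: Node
    properties: Dict[str, PropertyValue] = field(default_factory=dict)


# Hypothetical sample data.
billy = Node({"User"}, {"name": "Billy"})
harry = Node({"User"}, {"name": "Harry"})
follows = Relationship("FOLLOWS", billy, harry)
```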


Example: Cypher Graph for Social Networks

Graphs Are Everywhere

Graphs are extremely useful in understanding a wide diversity of datasets in fields such as science, government, and business. The real world, unlike the forms-based model behind the relational database, is rich and interrelated: uniform and rule-bound in parts, exceptional and irregular in others. Once we understand graphs, we begin to see them in all sorts of places. Gartner, for example, identifies five graphs in the world of business (social, intent, consumption, interest, and mobile) and says that the ability to leverage these graphs provides a “sustainable competitive advantage.”

For example, Twitter’s data is easily represented as a graph. In Figure 1-1 we see a small network of Twitter users. Each node is labeled User, indicating its role in the network. These nodes are then connected with relationships, which help further establish the semantic context: namely, that Billy follows Harry, and that Harry, in turn, follows Billy. Ruth and Harry likewise follow each other, but sadly, although Ruth follows Billy, Billy hasn’t (yet) reciprocated.

Figure 1-1. A small social graph
[Slide annotations on the figure: label and property (on nodes); direction and name (on relationships)]

Of course, Twitter’s real graph is hundreds of millions of times larger than the example in Figure 1-1, but it works on precisely the same principles. In Figure 1-2 we’ve expanded the graph to include the messages published by Ruth.
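Because relationships are directed, "follows" in one direction says nothing about the other. The sketch below encodes the follow relationships described in the text for Figure 1-1 as (follower, followed) pairs; the set representation and the function name `unreciprocated` are illustrative assumptions.

```python
# FOLLOWS relationships as described in the text: (follower, followed).
FOLLOWS = {
    ("Billy", "Harry"),
    ("Harry", "Billy"),
    ("Ruth", "Harry"),
    ("Harry", "Ruth"),
    ("Ruth", "Billy"),
}


def unreciprocated(follows):
    """Return the (follower, followed) pairs that have no relationship in the opposite direction."""
    return sorted((a, b) for a, b in follows if (b, a) not in follows)


if __name__ == "__main__":
    print(unreciprocated(FOLLOWS))
```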


Another Example: Email Exchange

(email_4)-[:TO]->(davina), (email_4)-[:TO]->(edward);

CREATE (email_5:Email {id:'5', content:'email contents'}),
       (davina)-[:SENT]->(email_5),
       (email_5)-[:TO]->(alice),
       (email_5)-[:BCC]->(bob),
       (email_5)-[:BCC]->(edward);

This leads to the more complex, and interesting, graph we see in Figure 3-10.

Figure 3-10. A graph of email interactions


Query Example


MATCH (bob:User {username:'Bob'})-[:SENT]->(email)-[:CC]->(alias),
      (alias)-[:ALIAS_OF]->(bob)
RETURN email

Result:

email
Node{id:"1",content:"..."}
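To make the pattern in this query more tangible, here is a minimal sketch of the same match done by hand over a list of relationship triples. The triples and the function name `emails_cc_to_own_alias` are hypothetical; a real engine would use indexes and a query planner rather than nested loops.

```python
# Hypothetical relationships as (start, NAME, end) triples; strings stand in for nodes.
TRIPLES = [
    ("alice", "ALIAS_OF", "bob"),
    ("bob", "SENT", "email_1"),
    ("email_1", "TO", "charlie"),
    ("email_1", "CC", "alice"),
    ("bob", "SENT", "email_2"),
    ("email_2", "CC", "davina"),
]


def emails_cc_to_own_alias(triples, sender):
    """Emails that sender SENT and that were CC'd to a node that is ALIAS_OF sender."""
    rels = set(triples)
    results = []
    for start, name, email in triples:
        # First hop of the path pattern: (sender)-[:SENT]->(email)
        if start != sender or name != "SENT":
            continue
        for e, rel, alias in triples:
            # Second hop (email)-[:CC]->(alias) plus the constraint (alias)-[:ALIAS_OF]->(sender)
            if e == email and rel == "CC" and (alias, "ALIAS_OF", sender) in rels:
                results.append(email)
    return results


if __name__ == "__main__":
    print(emails_cc_to_own_alias(TRIPLES, "bob"))
```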


Creating Graph Data

CREATE (alice:User {username:'Alice'}),
       (bob:User {username:'Bob'}),
       (charlie:User {username:'Charlie'}),
       (davina:User {username:'Davina'}),
       (edward:User {username:'Edward'}),
       (alice)-[:ALIAS_OF]->(bob)

MATCH (bob:User {username:'Bob'}),
      (charlie:User {username:'Charlie'}),
      (davina:User {username:'Davina'}),
      (edward:User {username:'Edward'})
CREATE (bob)-[:EMAILED]->(charlie),
       (bob)-[:CC]->(davina),
       (bob)-[:BCC]->(edward)

This first modeling attempt results in a star-shaped graph with Bob at the center. His actions of emailing, copying, and blind-copying are represented by relationships that extend from Bob to the nodes representing the recipients of his mail. As we see in Figure 3-8, however, the most critical element of the data, the actual email, is missing.

Figure 3-8. Missing email node leads to lost information

This graph structure is lossy, a fact that becomes evident when we pose the following query:

MATCH (bob:User {username:'Bob'})-[e:EMAILED]->
      (charlie:User {username:'Charlie'})
RETURN e

This query returns the EMAILED relationships between Bob and Charlie (there will likely be one for each email that Bob has sent to Charlie). This tells us that emails have been exchanged, but it tells us nothing about the emails themselves:

+----------------+
| e              |
+----------------+
| :EMAILED[1] {} |
+----------------+
1 row
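The difference between modeling an email as a relationship and modeling it as a node can be sketched as follows. All data and the function names `emailed_relationships` and `emails_between` are hypothetical; the point is only that the first model has nowhere to keep the email's own content.

```python
# Hypothetical data: the same email under two modeling choices.

# Model 1: the email exists only as a relationship (User)-[:EMAILED]->(User).
edge_model = [("Bob", "EMAILED", "Charlie", {})]

# Model 2: the email is a node with its own properties, linked by SENT and TO.
email_nodes = {"email_1": {"id": "1", "content": "email contents"}}
node_model = [
    ("Bob", "SENT", "email_1"),
    ("email_1", "TO", "Charlie"),
]


def emailed_relationships(sender, recipient):
    """Model 1: all we get back is the (empty) property map of each EMAILED relationship."""
    return [props for s, name, r, props in edge_model
            if s == sender and name == "EMAILED" and r == recipient]


def emails_between(sender, recipient):
    """Model 2: follow SENT, then TO, and return the email nodes themselves."""
    sent = {e for s, name, e in node_model if s == sender and name == "SENT"}
    return [email_nodes[e] for e, name, r in node_model
            if e in sent and name == "TO" and r == recipient]


if __name__ == "__main__":
    print(emailed_relationships("Bob", "Charlie"))
    print(emails_between("Bob", "Charlie"))
```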


Another Example

Figure 3-12. Explicitly modeling replies in high fidelity

MATCH p = (email:Email {id:'6'})<-[:REPLY_TO*1..4]-(:Reply)<-[:SENT]-(replier)
RETURN replier.username AS replier, length(p) - 1 AS depth
ORDER BY depth

Path assignment: "p = ..." binds each matched path to the identifier p.

Here we capture each matched path, binding it to the identifier p. In the RETURN clause we then calculate the length of the reply-to chain (subtracting 1 for the SENT relationship), and return the replier’s name and the depth at which he or she replied. This query returns the following results:

+-------------------+
| replier   | depth |
+-------------------+
| "Davina"  | 1     |
| "Bob"     | 1     |
| "Charlie" | 2     |
| "Bob"     | 3     |
+-------------------+
4 rows

We see that both Davina and Bob replied directly to Bob’s original email; that Charlie replied to one of the replies; and that Bob then replied to one of the replies to a reply.

It’s a similar pattern for a forwarded email, which can be regarded as a new email that simply happens to contain some of the text of the original email. As with the reply case, we model the new email explicitly. We also reference the original email from the
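The variable-length pattern `[:REPLY_TO*1..4]` and the `length(p) - 1` depth computation can be approximated by walking the reply chain and counting hops. The reply tree below is hypothetical, chosen only to be consistent with the result table in the excerpt; the dictionaries and the function name `repliers_with_depth` are assumptions. Since each reply answers exactly one message here, walking upward from each reply finds the same paths the pattern would match from the email.

```python
# Hypothetical reply tree, consistent with the result table in the excerpt.
REPLY_TO = {  # reply -> the email or reply it answers
    "reply_1": "email_6",
    "reply_2": "email_6",
    "reply_3": "reply_1",
    "reply_4": "reply_3",
}
SENT_BY = {  # reply -> username of its sender
    "reply_1": "Davina",
    "reply_2": "Bob",
    "reply_3": "Charlie",
    "reply_4": "Bob",
}


def repliers_with_depth(root, max_hops=4):
    """Return (replier, depth) for replies within max_hops REPLY_TO steps of root, ordered by depth."""
    rows = []
    for reply, sender in SENT_BY.items():
        # Walk the REPLY_TO chain upward, counting hops until root is reached.
        node, hops = reply, 0
        while node in REPLY_TO and hops < max_hops:
            node = REPLY_TO[node]
            hops += 1
            if node == root:
                rows.append((sender, hops))
                break
    # sorted() is stable, so rows with equal depth keep their insertion order.
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for replier, depth in repliers_with_depth("email_6"):
        print(replier, depth)
```

The hop count plays the role of `length(p) - 1`: the Cypher path has one extra relationship (SENT) that the sketch never counts.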


When to use it

• Use it:
  - Connected data, e.g. social graphs, employees and where they worked
  - Location-based services
  - Recommendation engines
• Don't use it:
  - Workloads that change properties on many entities at once


Outline

• Introduction
• Transaction Consistency
• 4 main data models
  - Key-Value Stores (e.g., Redis)
  - Column-Family Stores (e.g., Cassandra)
  - Document Stores (e.g., MongoDB)
  - Graph Databases (e.g., Neo4j)
• Concluding Remarks


Strategy Canvas: Example Nintendo Wii (1/3)

[Figure: Nintendo Wii Strategy Canvas. Vertical axis: offering level (low to high). Factors: price, high-resolution graphics, non-gaming functionalities, HDTV capabilities, processing power, online gaming, design aesthetics, available game titles, motion, family-friendly games.]

Source: INSEAD, Blue Ocean Strategy Institute, 2013.


Strategy Canvas: Example Nintendo Wii (2/3)

[Figure: Nintendo Wii Strategy Canvas, same axes and factors as in (1/3). Curves shown: Nintendo Wii; Video Game Console Industry.]

Source: INSEAD, Blue Ocean Strategy Institute, 2013.


Strategy Canvas: Example Nintendo Wii (3/3)

[Figure: Nintendo Wii Strategy Canvas, same axes and factors as in (1/3). Curves shown: Nintendo Wii; Video Game Console Industry. Annotations on the factors: Eliminate, Reduce, Raise, Create.]

Source: INSEAD, Blue Ocean Strategy Institute, 2013.


Redefine the Market


Concluding Remarks on Common NoSQL

• Aim to avoid join & ACID overhead
  - Data is stored already joined within a record; correctness is compromised for quick answers; believe in best effort
• Avoid the idea of a schema
• Query languages are more imperative
  - And less declarative
  - The developer knows better what's going on; less reliance on smart optimization plans
  - More responsibility on developers
• No standard, well-studied languages (yet)
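One possible reading of "joined within" and of "more imperative" is sketched below: related rows are embedded inside a single record instead of being joined at query time, and the developer writes the traversal by hand. The order data, field names, and the function name `total_quantity` are hypothetical and serve only as illustration.

```python
# Hypothetical records: line items are embedded in each order ("joined within")
# instead of living in a separate table that would be joined at query time.
ORDERS = [
    {"order_id": 1, "customer": "Alice",
     "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
    {"order_id": 2, "customer": "Bob",
     "items": [{"sku": "A1", "qty": 5}]},
]


def total_quantity(orders, sku):
    """Imperative query: the developer spells out the loops instead of relying on an optimizer."""
    total = 0
    for order in orders:
        for item in order["items"]:
            if item["sku"] == sku:
                total += item["qty"]
    return total


if __name__ == "__main__":
    print(total_quantity(ORDERS, "A1"))
```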