NoSQL: Graph Databases


Page 1: NoSQL: Graph Databases. Databases Why NoSQL Databases?

NoSQL: Graph Databases

Databases

Why NoSQL Databases?

Trends in Data

Data is getting bigger: “Every 2 days we create as much information as we did up to 2003”

– Eric Schmidt, Google

Data is more connected:

• Text • HyperText • RSS • Blogs • Tagging • RDF

Trend 2: Connectedness

[Figure: information connectivity, rising through Text Documents, Hypertext, Feeds, Blogs, Wikis, UGC, Tagging, Folksonomies, RDFa, Ontologies, GGG]

Data is more Semi-Structured:

• If you tried to collect all the data of every movie ever made, how would you model it?

• Actors, Characters, Locations, Dates, Costs, Ratings, Showings, Ticket Sales, etc.

Architecture Changes Over Time

• 1980s: Single Application (one application, one DB)

• 1990s: Integration Database Antipattern (several applications sharing one DB)

• 2000s: SOA (each application with its own DB; RESTful, hypermedia, composite apps)

Side note: RDBMS performance

[Figure labels: Salary list, Most Web apps, Social Network, Location-based services]

NOSQL: Not Only SQL

Less than 10% of the NOSQL Vendors

Four NOSQL Categories

Key Value Stores

• Most are based on Dynamo: Amazon's Highly Available Key-Value Store

• Data Model:
  – Global key-value mapping
  – Big scalable HashMap
  – Highly fault tolerant (typically)

• Examples:
  – Redis, Riak, Voldemort

Key Value Stores: Pros and Cons

• Pros:
  – Simple data model
  – Scalable

• Cons:
  – Create your own “foreign keys”
  – Poor for complex data
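
To make the “create your own foreign keys” drawback concrete, here is a minimal sketch that uses a plain Python dict as a stand-in for a key-value store. The key naming scheme (`user:1`, `order:10`) and the field names are hypothetical, and no real client API (Redis, Riak, Voldemort) is shown.

```python
# A plain dict stands in for a key-value store: opaque keys, opaque values.
store = {
    "user:1": {"name": "Alice"},
    "order:10": {"item": "book", "user_key": "user:1"},  # hand-made "foreign key"
    "order:11": {"item": "lamp", "user_key": "user:1"},
}


def get_order_with_user(order_key):
    """Resolve the reference ourselves: the store will not do it for us."""
    order = store[order_key]
    user = store.get(order["user_key"])  # second lookup; None if the key dangles
    return order["item"], (user["name"] if user else None)


def orders_for_user(user_key):
    """Without a secondary index, the reverse lookup scans every key."""
    return sorted(
        key for key, value in store.items()
        if key.startswith("order:") and value.get("user_key") == user_key
    )


print(get_order_with_user("order:10"))
print(orders_for_user("user:1"))
```

Both the forward lookup and the reverse lookup are application code here, which is the cost the slide points at.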

Column Family

• Most are based on BigTable: Google’s Distributed Storage System for Structured Data

• Data Model:
  – A big table, with column families
  – Map Reduce for querying/processing

• Examples:
  – HBase, HyperTable, Cassandra

Column Family: Pros and Cons

• Pros:
  – Supports Semi-Structured Data
  – Naturally Indexed (columns)
  – Scalable

• Cons:
  – Poor for interconnected data

Document Databases

• Data Model:
  – A collection of documents
  – A document is a key value collection
  – Index-centric, lots of map-reduce

• Examples:
  – CouchDB, MongoDB

Document Databases: Pros and Cons

• Pros:
  – Simple, powerful data model
  – Scalable

• Cons:
  – Poor for interconnected data
  – Query model limited to keys and indexes
  – Map reduce for larger queries
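
As a rough idea of what “map reduce for larger queries” means in practice, the sketch below counts documents per genre in plain Python. The documents, field names and function names are made up for illustration; CouchDB and MongoDB each have their own map-reduce interfaces, which are not reproduced here.

```python
from collections import defaultdict

# Hypothetical documents: each is a key-value collection; fields may differ.
documents = [
    {"_id": 1, "type": "movie", "title": "A", "genre": "drama"},
    {"_id": 2, "type": "movie", "title": "B", "genre": "comedy"},
    {"_id": 3, "type": "movie", "title": "C", "genre": "drama", "year": 1999},
]


def map_phase(doc):
    """Emit (key, value) pairs for one document."""
    if "genre" in doc:
        yield doc["genre"], 1


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return sum(values)


# Shuffle step: group the emitted values by key.
grouped = defaultdict(list)
for doc in documents:
    for key, value in map_phase(doc):
        grouped[key].append(value)

counts = {key: reduce_phase(key, values) for key, values in grouped.items()}
print(counts)
```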

Graph Databases

• Data Model:
  – Nodes and Relationships

• Examples:
  – Neo4j, OrientDB, InfiniteGraph, AllegroGraph

Graph Databases: Pros and Cons

• Pros:
  – Powerful data model, as general as RDBMS
  – Connected data locally indexed
  – Easy to query

• Cons:
  – Sharding (lots of people working on this)
    • Scales UP reasonably well
  – Requires rewiring your brain

What are graphs good for?

• Recommendations
• Business intelligence
• Social computing
• Geospatial
• Systems management
• Web of things
• Genealogy
• Time series data
• Product catalogue
• Web analytics
• Scientific computing (especially bioinformatics)
• Indexing your slow RDBMS
• And much more!

What is a Graph?

• An abstract representation of a set of objects where some pairs are connected by links.

Object (Vertex, Node)

Link (Edge, Arc, Relationship)
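
The definition above can be sketched in a few lines of Python. The vertex names and the choice of an undirected graph are assumptions made only for this example.

```python
from collections import defaultdict

# Objects (vertices, nodes) and links (edges, arcs, relationships).
vertices = {"A", "B", "C", "D"}
edges = [("A", "B"), ("B", "C"), ("A", "C")]  # D is left unconnected on purpose

# Adjacency map for an undirected graph: each link is recorded in both directions.
adjacency = defaultdict(set)
for source, target in edges:
    adjacency[source].add(target)
    adjacency[target].add(source)


def are_connected(first, second):
    """True if a link joins the two objects directly."""
    return second in adjacency[first]


print(are_connected("A", "B"), are_connected("A", "D"))
```

Only some pairs are connected: `D` is a valid object in the set even though no link touches it.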

Different Kinds of Graphs

• Undirected Graph

• Directed Graph

• Pseudo Graph

• Multi Graph

• Hyper Graph

More Kinds of Graphs

• Weighted Graph

• Labeled Graph

• Property Graph

What is a Graph Database?

• A database with an explicit graph structure

• Each node knows its adjacent nodes

• As the number of nodes increases, the cost of a local step (or hop) remains the same

• Plus an Index for lookups
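
A toy model of these four points, in plain Python rather than any real database engine: each node keeps direct references to its neighbours, and a separate index is used only to find the starting node. The class and function names are hypothetical.

```python
class Node:
    """A node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # adjacent nodes are stored on the node itself

    def connect(self, other):
        self.neighbours.append(other)
        other.neighbours.append(self)


# The index is only used to find a starting node by name.
index = {}


def create_node(name):
    node = Node(name)
    index[name] = node
    return node


alice, bob, carol = (create_node(n) for n in ("Alice", "Bob", "Carol"))
alice.connect(bob)
bob.connect(carol)

# Padding the graph with unrelated nodes does not change the cost of a hop
# from Alice: it still touches only Alice's own neighbour list.
for i in range(10_000):
    create_node(f"extra-{i}")

start = index["Alice"]                     # one index lookup
print([n.name for n in start.neighbours])  # local step, no global scan
```

The hop from Alice reads one short list no matter how many other nodes exist, which is the property the third bullet describes.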

Relational Databases

Graph Databases

Neo4j Tips

• Each entity table is represented by a label on nodes

• Each row in an entity table is a node

• Columns on those tables become node properties.

• Remove technical primary keys, keep business primary keys

• Add unique constraints for business primary keys, add indexes for frequent lookup attributes

Neo4j Tips

• Replace foreign keys with relationships to the other table, remove them afterwards

• Remove data with default values, no need to store those

• Data in tables that is denormalized and duplicated might have to be pulled out into separate nodes to get a cleaner model.

• Indexed column names might indicate an array property (like email1, email2, email3)

• Join tables are transformed into relationships, columns on those tables become relationship properties
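
Several of these tips can be illustrated with a small, self-contained Python sketch. The tables, column names and the `ACTED_IN` relationship type are invented for the example, and the output is a plain in-memory structure, not a Neo4j import.

```python
# Hypothetical relational data: two entity tables and one join table.
persons = [{"id": 1, "email": "ann@example.com", "name": "Ann"},
           {"id": 2, "email": "bo@example.com", "name": "Bo"}]
movies = [{"id": 7, "title": "Some Film"}]
acted_in = [{"person_id": 1, "movie_id": 7, "role": "Lead"},
            {"person_id": 2, "movie_id": 7, "role": "Extra"}]


def rows_to_nodes(rows, label, business_key):
    """Each row becomes a node; the technical 'id' column is dropped."""
    nodes = {}
    for row in rows:
        props = {k: v for k, v in row.items() if k != "id"}
        # Keyed by the technical id only while migrating, to resolve foreign keys.
        nodes[row["id"]] = {"label": label, "key": props[business_key], "props": props}
    return nodes


person_nodes = rows_to_nodes(persons, "Person", "email")
movie_nodes = rows_to_nodes(movies, "Movie", "title")

# Each join-table row becomes a relationship; its extra columns become
# relationship properties, and the foreign keys themselves are not kept.
relationships = [
    {"type": "ACTED_IN",
     "start": person_nodes[row["person_id"]]["key"],
     "end": movie_nodes[row["movie_id"]]["key"],
     "props": {"role": row["role"]}}
    for row in acted_in
]

print(relationships[0])
```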

Node in Neo4j

Relationships in Neo4j

• Relationships between nodes are a key part of Neo4j.

Relationships in Neo4j

Twitter and relationships

Properties

• Both nodes and relationships can have properties.

• Properties are key-value pairs where the key is a string.

• Property values can be either a primitive or an array of one primitive type.

For example String, int and int[] values are valid for properties.
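
The property rules above can be expressed as a small check. This is a sketch under assumptions: Python's `bool`, `int`, `float` and `str` stand in for the primitive types, a `list` stands in for an array, and the function name is hypothetical. The exact set of types Neo4j accepts is not taken from the slides.

```python
PRIMITIVES = (bool, int, float, str)


def is_valid_property(key, value):
    """Key must be a string; value a primitive or a list of one primitive type."""
    if not isinstance(key, str):
        return False
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list) and value:  # empty lists are rejected in this sketch
        first_type = type(value[0])
        return first_type in PRIMITIVES and all(type(v) is first_type for v in value)
    return False


print(is_valid_property("name", "Neo"))       # string value
print(is_valid_property("scores", [1, 2, 3]))  # array of one primitive type
print(is_valid_property("mixed", [1, "a"]))    # mixed array
```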

Properties

Paths in Neo4j

• A path is one or more nodes with connecting relationships, typically retrieved as a query or traversal result.
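
One way to picture a path is as a list of nodes plus the relationships that join consecutive nodes. The `Path` class below is a hypothetical in-memory model, not Neo4j's own path type, and measuring length in relationships is a convention chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    """Nodes joined by relationships: n nodes are linked by n - 1 relationships."""
    nodes: list
    relationships: list = field(default_factory=list)

    def __post_init__(self):
        if not self.nodes:
            raise ValueError("a path needs at least one node")
        if len(self.relationships) != len(self.nodes) - 1:
            raise ValueError("need exactly one relationship between consecutive nodes")

    @property
    def length(self):
        # Measured here in relationships, so a single node is a path of length 0.
        return len(self.relationships)

    def __str__(self):
        parts = [f"({self.nodes[0]})"]
        for rel, node in zip(self.relationships, self.nodes[1:]):
            parts.append(f"-[{rel}]->({node})")
        return "".join(parts)


example = Path(["a", "b", "c"], ["KNOWS", "LIKES"])
print(example, example.length)
```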

Traversals in Neo4j

• Traversing a graph means visiting its nodes, following relationships according to some rules.

• In most cases only a subgraph is visited, as you already know where in the graph the interesting nodes and relationships are found.

• Traversal API

• Depth first and Breadth first.
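
The two visiting orders can be compared on a tiny example. This is generic Python over a hypothetical adjacency map, not the Neo4j Traversal API; the `max_depth` rule shows how a traversal can stay inside a subgraph.

```python
from collections import deque

# Hypothetical graph as an adjacency map; list order fixes the visiting order.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


def breadth_first(start, max_depth):
    """Visit nodes level by level, stopping at max_depth hops from start."""
    visited, order = {start}, []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        if depth == max_depth:
            continue  # rule: do not follow relationships past this depth
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return order


def depth_first(start):
    """Follow one branch to its end before backing up to try the next."""
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        stack.extend(reversed(graph[node]))  # keep left-to-right neighbour order
    return order


print(breadth_first("A", max_depth=1))
print(depth_first("A"))
```

With `max_depth=1` the breadth-first walk never reaches `D`, so only part of the graph is visited.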

Starting and Stopping

Preparing the database

Wrap mutating operations in a transaction.
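
The idea behind this rule can be shown with a toy transaction in Python: changes made inside the block survive only if the block completes. The `transaction` helper and the snapshot-based rollback are assumptions for illustration and say nothing about how Neo4j implements transactions.

```python
import copy
from contextlib import contextmanager

# A dict stands in for the database in this sketch.
graph = {"nodes": {}}


@contextmanager
def transaction(db):
    """Keep changes only if the block finishes without an exception."""
    snapshot = copy.deepcopy(db)
    try:
        yield db
    except Exception:
        db.clear()
        db.update(snapshot)  # roll back to the state before the block
        raise


# A successful block: the new node is kept.
with transaction(graph) as tx:
    tx["nodes"]["n1"] = {"name": "first"}

# A failing block: the partial change is undone.
try:
    with transaction(graph) as tx:
        tx["nodes"]["n2"] = {"name": "second"}
        raise RuntimeError("something went wrong mid-update")
except RuntimeError:
    pass

print(sorted(graph["nodes"]))
```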

Page 45: NoSQL: Graph Databases. Databases Why NoSQL Databases?

Creating a small graph

Print the data

Remove the data
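
The code for the create, print and remove steps did not survive in this transcript. As a stand-in, the sketch below walks through the same three steps on an in-memory model in Python. The ids, the `KNOWS` type and the `message` values are invented, and none of this is the Neo4j API.

```python
nodes = {}          # node id -> properties
relationships = []  # (start id, type, end id, properties)

# Create: two nodes and one relationship, each carrying a "message" property.
nodes[1] = {"message": "Hello, "}
nodes[2] = {"message": "World!"}
relationships.append((1, "KNOWS", 2, {"message": "graph "}))

# Print: walk the single relationship and join the three messages.
start, _type, end, props = relationships[0]
greeting = nodes[start]["message"] + props["message"] + nodes[end]["message"]
print(greeting)

# Remove: delete the relationship first, then the nodes it connected.
relationships.clear()
for node_id in (start, end):
    del nodes[node_id]
```

Removing the relationship before its nodes mirrors the usual constraint that a relationship cannot be left pointing at a missing node.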

The Matrix Graph Database

Traversing the Graph
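
The slide's own graph and code are not in this transcript, so the following is only a guess at the kind of traversal it showed: starting from one person, follow `KNOWS` relationships outward and record how many hops away each friend is. The character names and relationship types are assumptions chosen to fit the Matrix theme.

```python
# Illustrative data only: names and relationship types are assumptions.
relationships = [
    ("Neo", "KNOWS", "Trinity"),
    ("Neo", "KNOWS", "Morpheus"),
    ("Morpheus", "KNOWS", "Trinity"),
    ("Morpheus", "KNOWS", "Cypher"),
    ("Cypher", "KNOWS", "Agent Smith"),
    ("Agent Smith", "CODED_BY", "The Architect"),
]


def friends_with_depth(start, rel_type="KNOWS"):
    """Follow outgoing relationships of one type; report hops to each person."""
    depths = {}
    frontier, depth = [start], 0
    while frontier:
        depth += 1
        next_frontier = []
        for person in frontier:
            # A flat list scan keeps the sketch short; a graph database would
            # read the relationships stored on the node instead.
            for source, kind, target in relationships:
                if (source == person and kind == rel_type
                        and target != start and target not in depths):
                    depths[target] = depth
                    next_frontier.append(target)
        frontier = next_frontier
    return depths


for name, hops in friends_with_depth("Neo").items():
    print(f"{name}: {hops} hop(s)")
```

Filtering on the relationship type is what keeps `The Architect` out of the result: that node is reachable only through a `CODED_BY` relationship.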