NoSQL, by Zenyk Matchyshyn, Staff Engineer, Lohika

Lviv EDGE 2 - NoSQL


DESCRIPTION

Presentation from the Lviv EDGE #2 User Group meeting by Zenyk Matchyshyn, Lohika Staff Engineer / Scalable Java Lab Lead


Page 1: Lviv EDGE 2 - NoSQL

NoSQL

By Zenyk Matchyshyn, Staff Engineer, Lohika


Page 2: Lviv EDGE 2 - NoSQL

Agenda

• History

• Architecture vs Technology

• Classification

• Pros and Cons of usage

• Trends

• Q/A


Page 3: Lviv EDGE 2 - NoSQL

HISTORY


Page 4: Lviv EDGE 2 - NoSQL


Page 5: Lviv EDGE 2 - NoSQL

History

• NoSQL Technologies are not new

• Many ideas originate from distributed computing, grid computing and parallel computing

• Main drivers:

• Scalability

• Parallelization

• Costs


Page 6: Lviv EDGE 2 - NoSQL

Google

• In the beginning… there was Google!

• Google shared scientific papers:

• “The Google File System”, October 2003

• “MapReduce: Simplified Data Processing on Large Clusters”, December 2004

• “Bigtable: A Distributed Storage System for Structured Data”, November 2006

• “The Chubby Lock Service for Loosely-Coupled Distributed Systems”, November 2006


Page 7: Lviv EDGE 2 - NoSQL

Amazon

• … and Amazon!

• “Dynamo: Amazon’s Highly Available Key-value Store”, October 2007


Page 8: Lviv EDGE 2 - NoSQL

New technologies!

• Creators of Lucene wanted to create a full search solution

• Ended up with Hadoop and Hadoop Distributed File System (HDFS)

• Success helped adoption and new solutions emerged


Page 9: Lviv EDGE 2 - NoSQL

ARCHITECTURE VS TECHNOLOGY


Page 10: Lviv EDGE 2 - NoSQL

Architecture vs Technology

• SQL is not bad, it’s just different

• You can use SQL DB in NoSQL way, e.g. MySQL as a key-value database

• You can do SQL queries on Hadoop data
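The "SQL DB in a NoSQL way" point can be sketched as a single two-column table accessed only by key. This is a minimal illustration, not something from the talk: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the class name `SqlKeyValueStore` and the table name `kv` are hypothetical.

```python
import sqlite3


class SqlKeyValueStore:
    """Key-value access on top of one relational table (illustrative only)."""

    def __init__(self, path=":memory:"):
        # SQLite stands in for MySQL here; the idea is the same:
        # one table, a primary-key column and an opaque value column.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key, value):
        # INSERT OR REPLACE is SQLite syntax; other databases spell upserts differently.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else default


store = SqlKeyValueStore()
store.put("user:1", '{"name": "Ann"}')
store.put("user:1", '{"name": "Anna"}')  # a second put overwrites the first
print(store.get("user:1"))
```

No joins, no secondary columns and no schema for the value: the relational engine is used purely for lookups by primary key.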


Page 11: Lviv EDGE 2 - NoSQL

Architecture

• The way you store data

• The way you query data

• Technology environment


Page 12: Lviv EDGE 2 - NoSQL

CLASSIFICATION


Page 13: Lviv EDGE 2 - NoSQL

Terms

• ACID – Atomicity, Consistency, Isolation, Durability

• CAP Theorem – Consistency, Availability, Partition tolerance

• Eventual consistency

• Hashing

• Schema
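The slide only names "Hashing" as a term. One common use in distributed stores is deciding which node owns a key. The sketch below shows a consistent hash ring as one possible reading of that term; the class name `HashRing`, the node names and the choice of MD5 with 64 virtual nodes are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Any stable hash works; MD5 is used here only because it is deterministic.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Maps keys to nodes so that adding a node moves only a fraction of keys."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring several times to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owners = {key: ring.node_for(key) for key in ("user:1", "user:2", "order:17")}
print(owners)
```

When a fourth node joins, a key either keeps its old owner or moves to the new node; it never moves between two old nodes.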


Page 14: Lviv EDGE 2 - NoSQL

Classification

• Column oriented stores

• Key/Value stores

• Key/Value stores with configurable consistency

• Document stores

• Graph stores


Page 15: Lviv EDGE 2 - NoSQL

Chart

[Chart: memcached, key/value, column oriented, document store and RDBMS plotted by Depth of Functionality against Scalability & Performance]


Page 16: Lviv EDGE 2 - NoSQL

Column oriented

• Based on Google Bigtable

• Column oriented is the reverse of row oriented

• Assumes data centers are spread across continents and connected over the standard Internet

• C and P from CAP Theorem

• Data is consistent and partitioned, but availability suffers
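The "reverse of row oriented" point can be illustrated by storing the same records both ways. This sketch shows only the row/column inversion; it does not model Bigtable's column families or its distributed storage. The sample records and the helper `to_columns` are hypothetical.

```python
# Row oriented: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Lviv", "visits": 10},
    {"id": 2, "city": "Kyiv", "visits": 25},
    {"id": 3, "city": "Lviv", "visits": 7},
]


def to_columns(records):
    """Regroup records so that all values of one field sit together."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column oriented: each field keeps the values of all records together.
columns = to_columns(rows)

# Aggregating one field touches a single list instead of every record.
total_visits = sum(columns["visits"])
print(columns["city"], total_visits)
```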


Page 17: Lviv EDGE 2 - NoSQL

HBase

• Spin-off from the Hadoop project - http://hbase.apache.org/

• Written in Java

• A lot of interfaces – Thrift, REST, JRuby, etc.

• SQL-like access through Hive - http://hive.apache.org/

• HBase ORM – Surus - https://github.com/mushkevych/surus

• Used by Facebook, Hulu, Yahoo!, Ning, etc.

Page 18: Lviv EDGE 2 - NoSQL

Hypertable

• Developed by Zvents, open sourced

• Written in C++

• Running on top of distributed file system

• Used by Baidu


Page 19: Lviv EDGE 2 - NoSQL

Key/Value

• Key/Value Store – Oracle Berkeley DB (Oracle NoSQL), Redis, Kyoto Cabinet

• Can store strings, arrays, hashes


Page 20: Lviv EDGE 2 - NoSQL

Oracle NoSQL

• Sign of things to come!

• http://www.oracle.com/technetwork/database/nosqldb/overview/index.html

• Written in Java

• Configurable consistency

• Berkeley DB as a backend

• No single point of failure

• Transactions


Page 21: Lviv EDGE 2 - NoSQL

Redis

• http://redis.io/

• Lots of bindings

• Written in C

• In-memory, with optional durability

• Also a document store


Page 22: Lviv EDGE 2 - NoSQL

Key/Value – eventual consistency

• Key/value stores that favor availability over consistency

• Inspired by Amazon Dynamo

• Dynamo assumes data centers are close to each other and connected by high-speed network links

• A and P from CAP Theorem

• Achieve eventual consistency through replication and verification

• Consistency is eventual
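"Replication and verification" can be sketched with versioned replicas and quorum reads. The model below is a simplified, single-process illustration of the general Dynamo-style idea, not code from Dynamo or any product named in the talk. The names `Replica` and `QuorumStore`, the integer version counter and the fixed choice of which replicas answer are all assumptions.

```python
class Replica:
    """One copy of the data; keeps the highest version seen per key."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, version, value):
        current_version, _ = self.data.get(key, (0, None))
        if version > current_version:
            self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))


class QuorumStore:
    """N replicas; a write reaches W of them, a read asks R of them."""

    def __init__(self, n=3, w=2, r=2):
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.clock = 0  # stand-in for real version stamps

    def put(self, key, value):
        self.clock += 1
        # Only the first W replicas receive the write; the others stay stale.
        for replica in self.replicas[: self.w]:
            replica.write(key, self.clock, value)

    def get(self, key):
        # Ask the last R replicas, which deliberately includes stale ones.
        contacted = self.replicas[-self.r:]
        version, value = max(
            (replica.read(key) for replica in contacted), key=lambda item: item[0]
        )
        # Verification step ("read repair"): push the newest version back.
        for replica in contacted:
            replica.write(key, version, value)
        return value


store = QuorumStore(n=3, w=2, r=2)
store.put("cart", "book")
stale_before = store.replicas[2].read("cart")  # third replica missed the write
value = store.get("cart")                      # R + W > N, so the read sees it
repaired = store.replicas[2].read("cart")      # and the stale replica is fixed
print(stale_before, value, repaired)
```

With `w=1, r=1` the same read can miss the write entirely, which is the trade-off behind configurable consistency: smaller quorums answer faster but may return stale data.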

Page 23: Lviv EDGE 2 - NoSQL

Cassandra

• http://cassandra.apache.org/

• Multidimensional map indexed by key

• No single point of failure

• Decentralized

• Tunable consistency

• Used by Facebook, Cisco, IBM, Rackspace


Page 24: Lviv EDGE 2 - NoSQL

Voldemort

• http://project-voldemort.com/

• Developed by LinkedIn

• Written in Java

• Developer oriented – many modules are pluggable

• Strictly key/value


Page 25: Lviv EDGE 2 - NoSQL

Document stores

• Document Databases

• Document oriented stores are semi structured

• Mostly JSON oriented

• Also called schema free rows

• Can query by field
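The "semi structured" and "query by field" points can be illustrated with plain JSON documents that do not share a schema. This is an in-memory sketch, not the API of any particular document store; the sample documents and the helper `find` are hypothetical.

```python
import json

# Documents in one collection need not have the same fields.
documents = [
    json.loads(text)
    for text in (
        '{"_id": 1, "type": "talk", "title": "NoSQL", "tags": ["db"]}',
        '{"_id": 2, "type": "talk", "title": "Hadoop"}',
        '{"_id": 3, "type": "speaker", "name": "Ann"}',
    )
]


def find(docs, **criteria):
    """Return documents whose fields equal all given values.

    A document that lacks a queried field simply does not match.
    """
    return [
        doc
        for doc in docs
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


talks = find(documents, type="talk")
print([doc["_id"] for doc in talks])
```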


Page 26: Lviv EDGE 2 - NoSQL

MongoDB

• http://www.mongodb.org/

• Schema-free, document-oriented

• Written in C++

• Lots of interfaces

• JSON documents

• Query language, supports indexing

• Map/Reduce
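The Map/Reduce bullet refers to a processing pattern rather than a specific call. The sketch below shows the pattern itself in a single process: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step folds each group. It is not MongoDB's interface; the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(doc):
    # Emit (word, 1) for every word in the document's text.
    for word in doc["text"].lower().split():
        yield word, 1


def reduce_phase(key, values):
    # Fold all values emitted for one key into a single result.
    return sum(values)


def map_reduce(docs, mapper, reducer):
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)  # the "shuffle": group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


docs = [{"text": "NoSQL is not SQL"}, {"text": "SQL is fine"}]
counts = map_reduce(docs, map_phase, reduce_phase)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across many machines, which is what makes the pattern attractive for large data sets.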


Page 27: Lviv EDGE 2 - NoSQL

CouchDB

• http://couchdb.apache.org/

• RESTful API

• JSON documents

• Written in Erlang

• Supports ACID

• Map/Reduce

• Eventual consistency


Page 28: Lviv EDGE 2 - NoSQL

Graph

• Provide ways to store graphs

• Provide traversing

• Graph oriented functionality
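Storing a graph and traversing it can be sketched with an adjacency list and a breadth-first search. This is a generic in-memory illustration, not the data model or query interface of any graph database; the sample graph and the function `shortest_path` are hypothetical.

```python
from collections import deque

# Adjacency list: each node maps to the nodes it points to.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first traversal; returns the first shortest path found, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path(graph, "alice", "dave")
print(path)
```

Questions such as "how is alice connected to dave" are awkward to express as relational joins of unknown depth, which is the kind of graph-oriented functionality these stores provide directly.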


Page 29: Lviv EDGE 2 - NoSQL

Neo4j

• http://neo4j.org/

• Written in Java

• Stores and navigates graphs

• Stable and proven

• Commercial and free licenses


Page 30: Lviv EDGE 2 - NoSQL

PROS AND CONS OF USAGE


Page 31: Lviv EDGE 2 - NoSQL

Pros and Cons

• Scalability

• Transactional Integrity and Consistency

• Data Modeling

• Query Support

• Access and Interface Availability


Page 32: Lviv EDGE 2 - NoSQL

Typical Usage

• Large amount of data

• Read/Write balanced?

• Read Heavy

• Write Heavy

• Scan

• Geospatial

• Map/Reduce

• Social data

Page 33: Lviv EDGE 2 - NoSQL

Is it for you?

• Technology is still developing

• Be ready to patch

• SQL is easier

• Not all startups will end up being Facebooks

• Some problems can be solved only with NoSQL


Page 34: Lviv EDGE 2 - NoSQL

TRENDS


Page 35: Lviv EDGE 2 - NoSQL

Trends

• Oracle released Oracle NoSQL!

• Adoption of Hadoop soars

• SQL-like access to NoSQL stores is taking form – UnQL - http://www.unqlspec.org/display/UnQL/Home

• You can participate!


Page 36: Lviv EDGE 2 - NoSQL

Opportunities

• Spring Data - http://www.springsource.org/spring-data

• Cloud Foundry PaaS - http://www.cloudfoundry.com/

• ORM/Simplification


Page 37: Lviv EDGE 2 - NoSQL

Q/A?