17-21 November 2010 ECPRD - WGICT - Bucharest 1
DROSS
Distributed & Resilient Open Source Software
Andrew Hardie
http://ashardie.com
ECPRD WGICT
17-21 November 2010
Chamber of Deputies, Bucharest
Topics
Distributed, not virtualized or 'cloud'
DRBD
Gluster
Heartbeat
Nginx
Trends:
• NoSQL
• Map / Reduce
• Cassandra, Hadoop & family
Other stuff 'out there'
Predictions…
DRBD
Block-level disk replicator (effectively, net RAID-1)
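As a rough illustration, a two-node DRBD resource is defined in a drbd.conf-style file like the sketch below; the hostnames, devices and addresses are invented, not a tested configuration:

```
# /etc/drbd.d/r0.res -- minimal two-node resource sketch
resource r0 {
  protocol C;               # synchronous (sync) replication
  on alpha {
    device    /dev/drbd0;
    disk      /dev/sdb1;    # dedicated partition required
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on beta {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}
```

Applications then write to /dev/drbd0 and DRBD mirrors the blocks to the peer over the network, which is why it behaves like RAID-1 across machines.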
DRBD – Good/bad points
Good for HA clusters (e.g. LAMP servers)
Ideal for block-level apps, e.g. MySQL
Sync/Async operation
Auto recovery from disk, net or node failure
In Linux kernels from 2.6.33 (Ubuntu 10.10 is 2.6.35)
Supports Infiniband, LVM, XEN, Dual primary config
Hard to extend beyond two systems; three is the maximum
Remote offsite really needs DRBD Proxy (commercial)
Requires dedicated disk/partition
Moderately difficult to configure
Documentation could be better
Gluster
Filesystem-level replicator
More like NAS than RAID
Claims to scale to petabytes
Nodes can be servers, clients or both
On the fly reconfig of disks & nodes
Scripting interface
'Cloud compliant' (isn't everything?)
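The points above can be sketched with the gluster command line; server names and brick paths below are invented examples, not a tested setup:

```shell
# Join a second node, then build a 2-way replicated volume
gluster peer probe server2
gluster volume create docs replica 2 \
    server1:/data/brick1 server2:/data/brick1
gluster volume start docs

# Any client (or either server) can then mount the volume
mount -t glusterfs server1:/docs /mnt/docs
```

Note that the bricks are ordinary directories on existing filesystems, which is why no dedicated partition is needed, and volumes can be grown by adding bricks on the fly.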
Gluster – Use case - Dublin
Real-time mirroring of Digital Audio
Gluster – Good/bad points
Moving to "turnkey system" (black box)
N-way replication easy
Easier than DRBD to configure
Dedicated partitions or disks not required
Supports Infiniband
Background self-healing (pull rather than push)
Aggregate and/or replicate volumes
POSIX support
Native support for NFS, CIFS, HTTP & FTP
No specific features for slow link replication
Similar documentation-vs-revenue-earning tension as with DRBD
Heartbeat
HA Cluster infrastructure (“cluster glue”)
Needs a Cluster Resource Manager (CRM), e.g. Pacemaker, to be useful
Part of the Linux-HA project
Provides:
hot-swap of synthetic IP address between nodes
(Synthetic IP is in addition to node's own IPs)
Node failure/restore detection
Start/stop of services to be managed, via init scripts
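The classic (pre-CRM) configuration lives in two small files, sketched below; interface, node names, the floating IP and the managed services are invented examples:

```
# /etc/ha.d/ha.cf -- minimal two-node sketch
autojoin none
bcast eth0          # heartbeat messages on this interface
keepalive 2         # seconds between heartbeats
deadtime 15         # declare peer dead after this many seconds
node alpha beta

# /etc/ha.d/haresources -- alpha normally owns the floating IP
# and the services, started/stopped via their init scripts
alpha 192.168.1.100 drbddisk::r0 \
    Filesystem::/dev/drbd0::/var/lib/mysql::ext3 mysql apache2
```

On failover, the surviving node takes over the synthetic IP, promotes the DRBD resource, mounts the filesystem and starts the services, in that order.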
Heartbeat/DRBD – use case
HA LAMP Server pair
Heartbeat – good/bad points
Lots of resource agents available e.g. Apache, Squid, Sphinx search, VMWare, DB2,
WebSphere, Oracle, JBOSS, Tomcat, Postfix, Informix, SAP, iSCSI, DRBD, …
Beyond simple 2-way hot-swap, config can get very complicated
Good for stateless (e.g. HTTP); not so good for file shares (e.g. Samba)
Documentation out of date in some areas, e.g. Ubuntu 'upstart' scripts (boot-time startup of services to be managed by Heartbeat has to be disabled)
NGINX
Fast, simple Russian HTTP server
Reverse proxy server
Mail proxy server
Fast static content serving
Very low memory footprint
Load balancing and fault tolerance
Name and IP based virtual servers
Embedded Perl
FLV streaming
Non-threaded, event-driven architecture
Modular architecture
Can front-end Apache (instead of mod_proxy)
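A sketch of that front-end arrangement in nginx.conf (a fragment, not a complete file; paths and ports are invented examples):

```nginx
http {
    upstream apache {
        server 127.0.0.1:8080;     # Apache backend
    }
    server {
        listen 80;
        location /static/ {
            root /var/www;         # nginx serves static content itself
        }
        location / {
            proxy_pass http://apache;   # everything else goes to Apache
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```

The upstream block can list several backends, which is where the load balancing and fault tolerance come from.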
Trends – NoSQL, etc…
NoSQL
Or, is it really NoACID (atomicity, consistency, isolation, durability)?
It's really the ACID that's hard to scale, especially in very large, very active data stores (e.g. social networks)
• Some NoSQLs now have SQL for query only
• Ways of solving ACID scalability being discussed
The problems:
• Huge numbers of simultaneous updates
• Large JOINs across very large tables (= big SQL query)
• Lots of updates & searches on small data elements in vast data sets
The alternative:
• Key/value stores
• De-normalized data
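A minimal Python sketch of that alternative; the store and key scheme are invented for illustration:

```python
# A toy key/value store: everything is looked up by key, and each
# value is a self-contained, de-normalized record -- no JOINs.
store = {}

# Relational design would split this across members, debates and
# speeches tables; here each speech carries its own copies.
store["speech:2010-11-17:0042"] = {
    "member": "A. Popescu",        # duplicated, not a foreign key
    "debate": "Budget 2011",       # duplicated again
    "text": "Mr Speaker, ...",
}

def get(key):
    """Single-key lookup is the only query primitive."""
    return store.get(key)

print(get("speech:2010-11-17:0042")["member"])
```

Reads are a single hash lookup, which is what makes this model easy to shard and scale; the price is paid on update, as the next slide shows.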
Consequences of De-normalizing
Order(s) of magnitude increase in storage requirements
Difficulty of updating numerous "key equivalents" in many places – can't be done synchronously
Breaking relationship links allows parallel processing, which helps with the bottleneck of storage read speed (storage capacity is growing much faster than transfer rates)
No JOINs or transactions
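The update problem in miniature, in Python (data invented): one logical fact lives in many records, so changing it means touching every copy, which at scale has to happen asynchronously:

```python
# De-normalized records: the member name is duplicated per speech.
speeches = [
    {"id": 1, "member": "A. Popescu", "text": "..."},
    {"id": 2, "member": "A. Popescu", "text": "..."},
    {"id": 3, "member": "B. Ionescu", "text": "..."},
]

def rename_member(old, new):
    """Fan-out update: every duplicated copy must be rewritten."""
    touched = 0
    for s in speeches:
        if s["member"] == old:
            s["member"] = new
            touched += 1
    return touched

print(rename_member("A. Popescu", "A. Popescu-Ionescu"))  # 2 records touched
```

In a normalized schema this would be one row in a members table; here the cost is proportional to the number of copies, which is the price of JOIN-free reads.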
Name/Value Models
Just name/value pairs, e.g. memcachedb, Dynamo
Name/value pairs plus associated data, e.g. CouchDB, MongoDB – think document stores with metadata
Name/value pairs with nesting, e.g. Cassandra
Cassandra
Distributed, fault-tolerant database, based on ideas in Dynamo (Amazon) & BigTable (Google)
Developed by Facebook, open-sourced in 2008
Now an Apache project
Key/value pairs, in column-oriented format
• Standard column: name, value, timestamp
• Super-column: name, map of columns, each with name, value, timestamp (think array of hashes)
• Grouped by column family, also either standard or super
• A column family contains 'rows', roughly like a DB table
• Column families then go in keyspaces
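Plain Python dicts make a reasonable mental model of that layout (names and data invented; real Cassandra is accessed via Thrift, not like this):

```python
# keyspace -> column family -> row key -> columns
# Each standard column is name -> (value, timestamp).
keyspace = {
    "Hansard": {                        # column family ("table")
        "2010-11-17": {                 # row key
            "sitting_start": ("09:00", 1290000000),
            "sitting_end":   ("18:30", 1290030000),
        },
    },
}

row = keyspace["Hansard"]["2010-11-17"]
value, ts = row["sitting_start"]        # columns carry timestamps
print(value)
```

The timestamps are what drive Cassandra's conflict resolution: on a concurrent write, the column with the newest timestamp wins.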
Cassandra - NoACID
Cassandra et al. (e.g. Voldemort from LinkedIn) trade consistency and atomicity for speed, distribution and availability
No single point of failure
"Eventually consistent" model
Tunable levels of consistency
Atomicity only guaranteed within a column family
Accessed using Thrift (also developed by Facebook)
Used by: Facebook, Digg
NoSQL for Parliaments?
Much parliamentary material is naturally unstructured and suited to the name/value model (think XML)
Remember the old discussions about how to map such parliamentary material into relational databases?
Think of every MP's contribution (speech) in chamber or committee as a key/value pair, i.e. a column
Think of every PQ & answer as a super-column of name/value pairs for question, answer, holding, supplementary, pursuant, referral …
Hansard becomes a super-column family!
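Continuing the dict-as-model sketch, a single PQ as a super-column (all names, text and timestamps invented for illustration):

```python
# A super-column: a named group of (value, timestamp) columns.
pq_12345 = {
    "question":      ("To ask the Minister ...", 1290001000),
    "holding_reply": ("I will reply as soon as possible.", 1290002000),
    "answer":        ("The information requested is ...", 1290005000),
}

# A super-column family "PQs", keyed by question number:
pqs = {"12345": pq_12345}

print(pqs["12345"]["answer"][0])
```

Supplementaries, pursuants and referrals would simply be further named columns inside the same super-column, added as they occur.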
Map / Reduce
Column (or record) oriented design & de-normalized data power the parallel "map reduce" model (think "sharding on speed")
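The model in miniature, using the canonical word-count example in Python (single-process; in a real system each phase runs in parallel across shards):

```python
from collections import defaultdict

docs = ["to be or not to be", "to err is human"]

def map_phase(doc):
    # map: emit a (key, value) pair per word
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # reduce: fold each group down to one result
    return key, sum(values)

mapped = [pair for doc in docs for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["to"])  # 3
```

Because no map or reduce call depends on another's output, the work divides cleanly across as many machines as there are data shards.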
Hadoop
Nothing to do with NoSQL
Hadoop is an infrastructure and now family of tools for managing distributed systems and immense datasets
How immense? Hundreds of GB and a 10-node cluster is 'entry level' in Hadoop terms
Developed by Yahoo for their cloud, now Apache project
Supports Map/Reduce by pre-dividing & distributing data
"Moves computation to the data instead of data to the computation"
HDFS file system particularly interesting – distributed, resilient (far more advanced than DRBD or Gluster), but not real time (more eventually consistent…)
Hive data warehouse front end – has SQL-like queries
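A flavour of that SQL-like layer; the table and column names below are invented, but the HiveQL shape is representative:

```sql
-- Looks like SQL, but Hive compiles it down to map/reduce jobs
SELECT member, COUNT(*) AS speeches
FROM hansard
GROUP BY member
ORDER BY speeches DESC
LIMIT 10;
```

The GROUP BY becomes the shuffle key and the COUNT becomes the reduce function, so analysts get familiar syntax without writing map/reduce code by hand.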
Who uses Hadoop?
AOL
IBM
Last.fm
E-Bay
Yahoo: 36,000 machines with > 100,000 cores running Hadoop (largest single cluster is only 4,000 nodes)
Largest known cluster is Facebook: 2,000 machines with 22,400 cores and 21 petabytes in a single HDFS store
Hadoop for Parliaments?
Hadoop may seem overkill for parliaments now…
But when you start your legacy-collection digitization and digital preservation projects, its model – managing large datasets that essentially do not change and don't need real-time commit – is a very good fit!
Other interesting Hadoop projects:
Zookeeper (distributed apps co-ordination)
Hive (data warehouse infrastructure)
Pig (high-level data flow language)
Mahout (scalable machine learning library)
Scribe (for aggregating streaming log data) [not strictly a Hadoop project, but can be integrated with it, using an interesting work-around for the non-real-time commit & NameNode single point of failure]
Other things 'out there'
Drizzle
A database "optimized for Cloud infrastructure and Web applications"
"Design for massive concurrency on modern multi-cpu architecture"
But it doesn't actually explain how to use it for these…
It's SQL and ACID
Mostly seems to be a reaction against what's happening at MySQL…
Has to be compiled from source – no distro packages available for it yet
CouchDB
Distributed, fault-tolerant, schema-free document-oriented database
RESTful JSON API (i.e. Web front end)
Incremental replication with bi-directional conflict detection
Written in Erlang (highly reliable language developed by Ericsson)
Supports 'map/reduce'-like querying and indexing
Interesting model, different from most other offerings
Also now an Apache project
Still too immature for anything beyond experimentation
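CouchDB's RESTful API can be exercised with nothing but curl; the database and document names below are invented, and a local CouchDB on its default port 5984 is assumed:

```shell
# Create a database, store a JSON document, read it back
curl -X PUT http://localhost:5984/hansard
curl -X PUT http://localhost:5984/hansard/speech-42 \
     -d '{"member": "A. Popescu", "text": "Mr Speaker, ..."}'
curl http://localhost:5984/hansard/speech-42
```

Every document, view and replication operation is an HTTP request on a URL, which is what "Web front end" means in practice.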
Also 'out there'
Voldemort
Another distributed key/value storage system
Used at LinkedIn
Doesn't seem to have much future
Cassandra is similar, better & more widely used
MonetDB
"Database system for high-performance applications in data mining, OLAP, GIS, XML Query, text and multimedia retrieval"
SQL and XQuery front ends
Also hard to see where it's going…
MongoDB
Tries to bridge the gap between RDBMS and map/reduce
JSON document storage (like CouchDB)
No JOINs; no multi-document transactions
Atomic operations supported only on single documents
Interesting, but may 'fall between two stools'
Predictions
Hadoop and Cassandra are the ones to watch
There will likely be some sort of re-convergence between NoSQL and query languages – you can't do everything with map/reduce (especially not ad hoc queries)
SQL may be destined to become like COBOL – still around and running things but not something to use for new projects
Distributed storage models (with or without map/reduce) have good future
Datasets will only get bigger – compliance, audit, digital preservation, the shift to visuals, etc
Information management models (“strategy”) and access speed will remain key problems
Questions
"What's it all about?"
http://ashardie.com