17-21 November 2010 ECPRD - WGICT - Bucharest 1
DROSS
Distributed & Resilient Open Source Software
Andrew Hardie
http://ashardie.com
ECPRD WGICT
17-21 November 2010
Chamber of Deputies, Bucharest
Topics
Distributed, not virtualized or 'cloud'
DRBD
Gluster
Heartbeat
Nginx
Trends:
• NoSQL
• Map / Reduce
• Cassandra, Hadoop & family
Other stuff 'out there'
Predictions…
DRBD
Block-level disk replicator (effectively, net RAID-1)
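As a rough illustration, a two-node DRBD resource is defined in a drbd.conf-style file like the sketch below; the hostnames, devices and addresses are invented, not a tested configuration:

```
# /etc/drbd.d/r0.res -- minimal two-node resource sketch
resource r0 {
  protocol C;               # synchronous (sync) replication
  on alpha {
    device    /dev/drbd0;
    disk      /dev/sdb1;    # dedicated partition required
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on beta {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}
```

Applications then write to /dev/drbd0 and DRBD mirrors the blocks to the peer over the network, which is why it behaves like RAID-1 across machines.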
DRBD – Good/bad points
Good for HA clusters (e.g. LAMP servers)
Ideal for block-level apps, e.g. MySQL
Sync/Async operation
Auto recovery from disk, net or node failure
In Linux kernels from 2.6.33 (Ubuntu 10.10 is 2.6.35)
Supports Infiniband, LVM, XEN, Dual primary config
Hard to extend beyond two systems; three is the maximum
Remote offsite really needs DRBD Proxy (commercial)
Requires dedicated disk/partition
Moderately difficult to configure
Documentation could be better
Gluster
Filesystem-level replicator
More like NAS than RAID
Claims to scale to petabytes
Nodes can be servers, clients or both
On the fly reconfig of disks & nodes
Scripting interface
'Cloud compliant' (isn't everything?)
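The points above can be sketched with the gluster command line; server names and brick paths below are invented examples, not a tested setup:

```shell
# Join a second node, then build a 2-way replicated volume
gluster peer probe server2
gluster volume create docs replica 2 \
    server1:/data/brick1 server2:/data/brick1
gluster volume start docs

# Any client (or either server) can then mount the volume
mount -t glusterfs server1:/docs /mnt/docs
```

Note that the bricks are ordinary directories on existing filesystems, which is why no dedicated partition is needed, and volumes can be grown by adding bricks on the fly.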
Gluster – Use case - Dublin
Real-time mirroring of Digital Audio
Gluster – Good/bad points
Moving to "turnkey system" (black box)
N-way replication easy
Easier than DRBD to configure
Dedicated partitions or disks not required
Supports Infiniband
Background self-healing (pull rather than push)
Aggregate and/or replicate volumes
POSIX support
Native support for NFS, CIFS, HTTP & FTP
No specific features for slow link replication
Similar documentation-vs-revenue-earning tension as with DRBD
Heartbeat
HA Cluster infrastructure (“cluster glue”)
Needs a Cluster Resource Manager (CRM), e.g. Pacemaker, to be useful
Part of the Linux-HA project
Provides:
hot-swap of synthetic IP address between nodes
(Synthetic IP is in addition to node's own IPs)
Node failure/restore detection
Start/stop of services to be managed, via init scripts
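The classic (pre-CRM) configuration lives in two small files, sketched below; interface, node names, the floating IP and the managed services are invented examples:

```
# /etc/ha.d/ha.cf -- minimal two-node sketch
autojoin none
bcast eth0          # heartbeat messages on this interface
keepalive 2         # seconds between heartbeats
deadtime 15         # declare peer dead after this many seconds
node alpha beta

# /etc/ha.d/haresources -- alpha normally owns the floating IP
# and the services, started/stopped via their init scripts
alpha 192.168.1.100 drbddisk::r0 \
    Filesystem::/dev/drbd0::/var/lib/mysql::ext3 mysql apache2
```

On failover, the surviving node takes over the synthetic IP, promotes the DRBD resource, mounts the filesystem and starts the services, in that order.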
Heartbeat/DRBD – use case
HA LAMP Server pair
Heartbeat – good/bad points
Lots of resource agents available e.g. Apache, Squid, Sphinx search, VMWare, DB2,
WebSphere, Oracle, JBOSS, Tomcat, Postfix, Informix, SAP, iSCSI, DRBD, …
Beyond simple 2-way hot-swap, config can get very complicated
Good for stateless (e.g. HTTP); not so good for file shares (e.g. Samba)
Documentation out of date in some areas, e.g. Ubuntu 'upstart' scripts (boot-time startup of services to be managed by Heartbeat has to be disabled)
NGINX
Fast, simple Russian HTTP server
Reverse proxy server
Mail proxy server
Fast static content serving
Very low memory footprint
Load balancing and fault tolerance
Name and IP based virtual servers
Embedded Perl
FLV streaming
Non-threaded, event-driven architecture
Modular architecture
Can front-end Apache (instead of mod_proxy)
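A sketch of that front-end arrangement in nginx.conf (a fragment, not a complete file; paths and ports are invented examples):

```nginx
http {
    upstream apache {
        server 127.0.0.1:8080;     # Apache backend
    }
    server {
        listen 80;
        location /static/ {
            root /var/www;         # nginx serves static content itself
        }
        location / {
            proxy_pass http://apache;   # everything else goes to Apache
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```

The upstream block can list several backends, which is where the load balancing and fault tolerance come from.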
Trends – NoSQL, etc…
NoSQL
Or, is it really NoACID (atomicity, consistency, isolation, durability)?
It's really the ACID that's hard to scale, especially in very large, very active data stores (e.g. social networks)
• Some NoSQLs now have SQL for query only
• Ways of solving ACID scalability being discussed
The problems:
• Huge numbers of simultaneous updates
• Large JOINs across very large tables (= big SQL query)
• Lots of updates & searches on small data elements in vast data sets
The alternative:
• Key/value stores
• De-normalized data
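A minimal Python sketch of that alternative; the store and key scheme are invented for illustration:

```python
# A toy key/value store: everything is looked up by key, and each
# value is a self-contained, de-normalized record -- no JOINs.
store = {}

# Relational design would split this across members, debates and
# speeches tables; here each speech carries its own copies.
store["speech:2010-11-17:0042"] = {
    "member": "A. Popescu",        # duplicated, not a foreign key
    "debate": "Budget 2011",       # duplicated again
    "text": "Mr Speaker, ...",
}

def get(key):
    """Single-key lookup is the only query primitive."""
    return store.get(key)

print(get("speech:2010-11-17:0042")["member"])
```

Reads are a single hash lookup, which is what makes this model easy to shard and scale; the price is paid on update, as the next slide shows.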
Consequences of De-normalizing
Order(s) of magnitude increase in storage requirements
Difficulty of updating numerous "key equivalents" in many places – can't be done synchronously
Breaking relationship links allows parallel processing, which helps with the bottleneck of storage read speed (storage capacity is growing much faster than transfer rates)
No JOINs or transactions
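The update problem in miniature, in Python (data invented): one logical fact lives in many records, so changing it means touching every copy, which at scale has to happen asynchronously:

```python
# De-normalized records: the member name is duplicated per speech.
speeches = [
    {"id": 1, "member": "A. Popescu", "text": "..."},
    {"id": 2, "member": "A. Popescu", "text": "..."},
    {"id": 3, "member": "B. Ionescu", "text": "..."},
]

def rename_member(old, new):
    """Fan-out update: every duplicated copy must be rewritten."""
    touched = 0
    for s in speeches:
        if s["member"] == old:
            s["member"] = new
            touched += 1
    return touched

print(rename_member("A. Popescu", "A. Popescu-Ionescu"))  # 2 records touched
```

In a normalized schema this would be one row in a members table; here the cost is proportional to the number of copies, which is the price of JOIN-free reads.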
Name/Value Models
Just name/value pairs, e.g. memcachedb, Dynamo
Name/value pairs plus associated data, e.g. CouchDB, MongoDB – think document stores with metadata
Name/value pairs with nesting, e.g. Cassandra
Cassandra
Distributed, fault-tolerant database, based on ideas in Dynamo (Amazon) & BigTable (Google)
Developed by Facebook, open-sourced in 2008
Now an Apache project
Key/value pairs, in column-oriented format
• Standard column: name, value, timestamp
• Super-column: name, map of columns, each with name, value, timestamp (think array of hashes)
• Grouped by column family, also either standard or super
• A column family contains 'rows', roughly like a DB table
• Column families then go in keyspaces
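Plain Python dicts make a reasonable mental model of that layout (names and data invented; real Cassandra is accessed via Thrift, not like this):

```python
# keyspace -> column family -> row key -> columns
# Each standard column is name -> (value, timestamp).
keyspace = {
    "Hansard": {                        # column family ("table")
        "2010-11-17": {                 # row key
            "sitting_start": ("09:00", 1290000000),
            "sitting_end":   ("18:30", 1290030000),
        },
    },
}

row = keyspace["Hansard"]["2010-11-17"]
value, ts = row["sitting_start"]        # columns carry timestamps
print(value)
```

The timestamps are what drive Cassandra's conflict resolution: on a concurrent write, the column with the newest timestamp wins.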
Cassandra - NoACID
Cassandra et al. (e.g. Voldemort from LinkedIn) trade consistency and atomicity for speed, distribution and availability
No single point of failure
"Eventually consistent" model
Tunable levels of consistency
Atomicity only guaranteed within a column family
Accessed using Thrift (also developed by Facebook)
Used by: Facebook, Digg
NoSQL for Parliaments?
Much parliamentary material is naturally unstructured and suited to the name/value model (think XML)
Remember the old discussions about how to map such parliamentary material into relational databases?
Think of every MP's contribution (speech) in chamber or committee as a key/value pair, i.e. a column
Think of every PQ & answer as a super-column of name/value pairs for question, answer, holding, supplementary, pursuant, referral …
Hansard becomes a super-column family!
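Continuing the dict-as-model sketch, a single PQ as a super-column (all names, text and timestamps invented for illustration):

```python
# A super-column: a named group of (value, timestamp) columns.
pq_12345 = {
    "question":      ("To ask the Minister ...", 1290001000),
    "holding_reply": ("I will reply as soon as possible.", 1290002000),
    "answer":        ("The information requested is ...", 1290005000),
}

# A super-column family "PQs", keyed by question number:
pqs = {"12345": pq_12345}

print(pqs["12345"]["answer"][0])
```

Supplementaries, pursuants and referrals would simply be further named columns inside the same super-column, added as they occur.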
Map / Reduce
Column (or record) oriented design & de-normalized data power the parallel "map reduce" model (think "sharding on speed")
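The model in miniature, using the canonical word-count example in Python (single-process; in a real system each phase runs in parallel across shards):

```python
from collections import defaultdict

docs = ["to be or not to be", "to err is human"]

def map_phase(doc):
    # map: emit a (key, value) pair per word
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # shuffle: group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # reduce: fold each group down to one result
    return key, sum(values)

mapped = [pair for doc in docs for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["to"])  # 3
```

Because no map or reduce call depends on another's output, the work divides cleanly across as many machines as there are data shards.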
Hadoop
Nothing to do with NoSQL
Hadoop is an infrastructure and now family of tools for managing distributed systems and immense datasets
How immense? Hundreds of GB and a 10-node cluster is 'entry level' in Hadoop terms
Developed by Yahoo for their cloud, now Apache project
Supports Map/Reduce by pre-dividing & distributing data
"Moves computation to the data instead of data to the computation"
HDFS file system particularly interesting – distributed, resilient (far more advanced than DRBD or Gluster), but not real time (more eventually consistent…)
Hive data warehouse front end – has SQL-like queries
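A flavour of that SQL-like layer; the table and column names below are invented, but the HiveQL shape is representative:

```sql
-- Looks like SQL, but Hive compiles it down to map/reduce jobs
SELECT member, COUNT(*) AS speeches
FROM hansard
GROUP BY member
ORDER BY speeches DESC
LIMIT 10;
```

The GROUP BY becomes the shuffle key and the COUNT becomes the reduce function, so analysts get familiar syntax without writing map/reduce code by hand.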
Who uses Hadoop?
AOL
IBM
Last.fm
E-Bay
Yahoo: 36,000 machines with > 100,000 cores running Hadoop (largest single cluster is only 4,000 nodes)
Largest known cluster is Facebook: 2,000 machines with 22,400 cores and 21 petabytes in a single HDFS store
Hadoop for Parliaments?
Hadoop may seem overkill for parliaments now…
But when you start your legacy-collection digitization and digital preservation projects, its model – managing large datasets that essentially do not change and don't need real-time commit – is a very good fit!
Other interesting Hadoop projects:
Zookeeper (distributed apps co-ordination)
Hive (data warehouse infrastructure)
Pig (high-level data flow language)
Mahout (scalable machine learning library)
Scribe (for aggregating streaming log data) [not strictly a Hadoop project, but can be integrated with it, using an interesting work-around for the non-real-time commit & NameNode single point of failure]
Other things 'out there'
Drizzle
A database "optimized for Cloud infrastructure and Web applications"
"Design for massive concurrency on modern multi-cpu architecture"
But it doesn't actually explain how to use it for these…
It's SQL and ACID
Mostly seems to be a reaction against what's happening at MySQL…
Has to be compiled from source – no distro packages available for it yet
CouchDB
Distributed, fault-tolerant, schema-free document-oriented database
RESTful JSON API (i.e. Web front end)
Incremental replication with bi-directional conflict detection
Written in Erlang (highly reliable language developed by Ericsson)
Supports 'map/reduce'-like querying and indexing
Interesting model, different from most other offerings
Also now an Apache project
Still too immature for anything beyond experimentation
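CouchDB's RESTful API can be exercised with nothing but curl; the database and document names below are invented, and a local CouchDB on its default port 5984 is assumed:

```shell
# Create a database, store a JSON document, read it back
curl -X PUT http://localhost:5984/hansard
curl -X PUT http://localhost:5984/hansard/speech-42 \
     -d '{"member": "A. Popescu", "text": "Mr Speaker, ..."}'
curl http://localhost:5984/hansard/speech-42
```

Every document, view and replication operation is an HTTP request on a URL, which is what "Web front end" means in practice.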
Also 'out there'
Voldemort
Another distributed key/value storage system
Used at LinkedIn
Doesn't seem to have much future
Cassandra is similar, better & more widely used
MonetDB
"Database system for high-performance applications in data mining, OLAP, GIS, XML Query, text and multimedia retrieval"
SQL and XQuery front ends
Also hard to see where it's going…
MongoDB
Tries to bridge the gap between RDBMS and map/reduce
JSON document storage (like CouchDB)
No JOINs; no multi-document transactions
Atomic operations supported only on single documents
Interesting, but may 'fall between two stools'
Predictions
Hadoop and Cassandra are the ones to watch
There will likely be some sort of re-convergence between NoSQL and query languages – you can't do everything with map/reduce (especially not ad hoc queries)
SQL may be destined to become like COBOL – still around and running things but not something to use for new projects
Distributed storage models (with or without map/reduce) have good future
Datasets will only get bigger – compliance, audit, digital preservation, the shift to visuals, etc
Information management models (“strategy”) and access speed will remain key problems
Questions
"What's it all about?"
http://ashardie.com