Hadoop User Group France
HBase slides by Nicolas Liochon
What
• Random reads & random writes
• Scans
• Strong consistency
• Durability
• Latency
Usual context
• HBase: the Apache implementation of Google's Bigtable
• API (over-simplified)
– byte[] get(byte[] row)
– void put(byte[] row, byte[] key, byte[] value)
– byte[][] scan(byte[] rowBegin, byte[] rowEnd)
• Bigtable at Google
– Gmail, Google Docs and others
– 2.5 exabytes
– 600M QPS
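The over-simplified API above can be sketched as a toy in-memory model (this is not the real HBase client API; `ToyTable` and its String keys are stand-ins for illustration). A sorted map mirrors the key property HBase relies on: rows stored in order of their key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of the over-simplified get/put/scan API.
// String keys stand in for byte[] to keep the sketch readable.
public class ToyTable {
    // TreeMap keeps rows sorted by key, like HBase keeps rows ordered.
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    public Map<String, String> get(String row) {
        return rows.get(row);
    }

    public void put(String row, String key, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(key, value);
    }

    // Scan returns all rows in [rowBegin, rowEnd), in key order.
    public List<Map<String, String>> scan(String rowBegin, String rowEnd) {
        return new ArrayList<>(rows.subMap(rowBegin, rowEnd).values());
    }

    public static void main(String[] args) {
        ToyTable t = new ToyTable();
        t.put("row2", "cf:a", "v2");
        t.put("row1", "cf:a", "v1");
        t.put("row3", "cf:a", "v3");
        System.out.println(t.get("row1").get("cf:a")); // v1
        System.out.println(t.scan("row1", "row3").size()); // 2 (end is exclusive)
    }
}
```

Because rows are sorted, `scan` is just a contiguous slice of the map, which is why range reads are cheap in this data model.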
Reads & Writes
• Random reads & writes: OLTP apps
– Per-row transactions
– Phoenix for the SQL fans
• Key-value store-like
Scans
• Data is ordered (between rows, and inside rows by columns)
• Stored contiguously on disk
• Typical use case: time series
– Get a subset of the time series for a given time interval
– Ordered by time series & time
• Access some data in a single hop
– Single machine / single disk read
• Big data duality
– Single machine: latency
– Parallel: throughput
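The time-series use case above rests on key design: if the row key is the series id followed by a zero-padded timestamp, all points of one series are contiguous and a time interval becomes a single range scan. A minimal sketch (the `TimeSeriesScan` class and key format are illustrative assumptions, not an HBase API):

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Time-series pattern: row key = "<seriesId>#<zero-padded timestamp>",
// so one series' points are contiguous and an interval is one range scan.
public class TimeSeriesScan {
    private final TreeMap<String, Double> table = new TreeMap<>();

    // Zero-padding keeps lexicographic order equal to numeric order.
    private static String rowKey(String series, long ts) {
        return String.format("%s#%013d", series, ts);
    }

    public void put(String series, long ts, double value) {
        table.put(rowKey(series, ts), value);
    }

    // One contiguous read: timestamps in [tsBegin, tsEnd) for one series.
    public SortedMap<String, Double> scan(String series, long tsBegin, long tsEnd) {
        return table.subMap(rowKey(series, tsBegin), rowKey(series, tsEnd));
    }

    public static void main(String[] args) {
        TimeSeriesScan ts = new TimeSeriesScan();
        ts.put("cpu", 1000, 0.3);
        ts.put("cpu", 2000, 0.5);
        ts.put("cpu", 3000, 0.9);
        ts.put("mem", 1500, 0.7); // different series: outside the scan range
        System.out.println(ts.scan("cpu", 1000, 3000).size()); // 2
    }
}
```

On a single machine this is one disk-contiguous read (latency); across machines, disjoint key ranges can be scanned in parallel (throughput), which is the duality above.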
Technical requirements
• Durability & others
Durability
• Buffer in the client
• Buffer in the server
• Sync in OS cache
• Sync on disk
Each step trades cost against reliability.
Consistency 1/2
• Business need
– Needs to be implemented in the other layers if not available
• Easier increments & test-and-set operations
– Useful for some features
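The two primitives that strong per-row consistency makes cheap can be sketched in-memory (the `RowOps` class is a hypothetical stand-in; it is not the HBase client API, though HBase exposes similar increment and check-and-put operations):

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of atomic increment and test-and-set: easy to offer when one
// server owns the row, hard to retrofit on an eventually consistent store.
public class RowOps {
    private final ConcurrentHashMap<String, Long> counters = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, String> cells = new ConcurrentHashMap<>();

    // Atomic increment: no client-side read-modify-write race.
    public long increment(String row, long delta) {
        return counters.merge(row, delta, Long::sum);
    }

    // Test-and-set: write newValue only if the cell still holds expected
    // (expected == null means "only if the cell is absent").
    public synchronized boolean checkAndSet(String row, String expected, String newValue) {
        if (!Objects.equals(cells.get(row), expected)) return false;
        cells.put(row, newValue);
        return true;
    }

    public static void main(String[] args) {
        RowOps ops = new RowOps();
        ops.increment("hits", 1);
        System.out.println(ops.increment("hits", 2)); // 3
        System.out.println(ops.checkAndSet("lock", null, "me")); // true
        System.out.println(ops.checkAndSet("lock", null, "you")); // false: already taken
    }
}
```

The test-and-set shape is what features like lightweight locking or conditional updates build on.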
Consistency 2/2
• Simplifies testing
« eventual consistency is strong consistency 99% of the time » (Ian Varley / Salesforce)
Availability
• CAP
– CP vs. AP: consistent under partition
– Under partition, you can't be consistent without letting either C1 or C2 go
– In a big data system, you're unlikely to have all the data available in both partitions anyway
[Diagram: servers S1–S4, with clients C1 and C2 on opposite sides of the partition]
Availability
• HBase
– The data is replicated 3 times
– If there is at least one replica within the partition, we're good
– The partitioned nodes are excluded
– Data ownership goes to the non-partitioned nodes
• The data remains available & consistent
• Latency hit
Latency
• HBase targets millisecond latency
• Under normal operations
• Under failure
– Around 1 minute
– Can be less with configuration effort
– And GC tuning…
• Latency is about percentiles
– Average != 50th percentile
– There is often an order of magnitude between "average" and "95th percentile"
– Post-99% = the "magical 1%"
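The gap between average and percentiles is easy to see on a heavy-tailed sample (the numbers below are illustrative, not measured HBase latencies):

```java
import java.util.Arrays;

// Why "average" hides tail latency: on a heavy-tailed sample the mean
// sits far above the median, and p95 is another order of magnitude up.
public class Percentiles {
    // Nearest-rank percentile on a sorted array.
    static double percentile(double[] sorted, double p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }

    public static void main(String[] args) {
        double[] latencies = new double[100];
        Arrays.fill(latencies, 0, 90, 1.0);    // 90 fast requests: 1 ms
        Arrays.fill(latencies, 90, 99, 100.0); // 9 slow requests: 100 ms
        latencies[99] = 2000.0;                // the "magical 1%": 2 s
        Arrays.sort(latencies);

        double mean = Arrays.stream(latencies).average().orElse(0);
        System.out.println("mean = " + mean + " ms"); // 29.9: ~30x the median
        System.out.println("p50  = " + percentile(latencies, 50) + " ms"); // 1.0
        System.out.println("p95  = " + percentile(latencies, 95) + " ms"); // 100.0
    }
}
```

A single 2-second outlier pulls the mean to ~30 ms while the median request still takes 1 ms, which is why SLAs are stated in percentiles, not averages.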
Normal operations
• Reads
– Cache!
– Google for "Nick Dimiduk"
• Writes
– Buffered in memory on all servers, then flushed
Failure
• Partitions
– Equivalent to a lot of single-node failures
– Hadoop's contract is also to have all data replicated 3 times
• Single node
– Data is always replicated
– Detect the failure (30s)
– Assign to another server (0s)
– Data replay (x * 10s): writes are available
– Client retry (0s)
Scoop: it could be much less
• Recovery is
– Failure detection
– Data recovery
• Failure detection is GC-driven
– Less than 10s is difficult
– But it can be delegated to the hardware
• Recovery can be minimized by keeping hot data in memory
– HDFS feature since ~2.x
– Then data recovery is ~1s
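The recovery budget above is just a sum of phases, which makes the tuning argument concrete (a back-of-the-envelope sketch; the exact numbers are illustrative, taken from the rough figures in these slides):

```java
// Back-of-the-envelope recovery budget: total downtime is the sum of
// the phases listed above. All numbers are illustrative.
public class RecoveryBudget {
    static double recoverySeconds(double detection, double assignment,
                                  double replay, double clientRetry) {
        return detection + assignment + replay + clientRetry;
    }

    public static void main(String[] args) {
        // Default-ish: GC-driven detection ~30s, tens of seconds of replay.
        System.out.println(recoverySeconds(30, 0, 30, 0)); // 60.0: ~1 minute
        // Tuned: faster detection, hot data in memory so replay is ~1s.
        System.out.println(recoverySeconds(5, 0, 1, 0)); // 6.0
    }
}
```

The sum shows where the leverage is: detection time and replay time dominate, and those are exactly the two knobs (hardware-assisted detection, hot data in memory) the slides call out.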
Latency: Stonebraker
• Bohrbugs. These are repeatable DBMS errors that cause the DBMS to crash. In other words, even when multiple data base replicas are available, the same transaction issued to the replicas will cause all of them to crash. No matter what, the world stops, and high availability is an impossible goal.
• Application errors. The application inadvertently updates (all copies) of the data base. The data base is now corrupted, and any sane DBA will stop the world and put the data base back into a consistent state. Again, high availability is impossible to achieve.
• Human error. A human types the database equivalent of RM * and causes a global outage. There is no possibility of continuing operation.
Applied same reasoning to latency
• In current deployments, GC and bugs are now more of an issue than recovery time
• Likely to be revisited in a year or so.
Conclusion?
HBase is about:
• Random access
• Transactions
• With durability
• And good performance
• Caching and GC are improving