Scaling HBase (NoSQL store) to handle massive loads at Pinterest by Jeremy Carroll

Pinterest Engineering

HBase Operations on EC2
Jeremy Carroll
Big Data Gurus

Overview

• Deployment Strategies for EC2
• Validating Design
• Production Support

Powered by HBase

Let's Deploy

• First Question Asked
• Rack Locality?
• Cloud Concepts

High Availability

Logical Separation

Cell Based

Logical Separation

Does This Work?

• Schema Design
• Hot Spots
• Load Testing
• Tools

Does This Work?

Compaction

OpenTSDB

Production

• Monitoring
• Alerting
• Health

Monitoring

Baselines

Visualization

Problems

Alerting

Baselines

17:10 <jeremy_carroll> jmhsieh: I think I found the root cuase. All my region servers reach the barrier, but it does not continue.
17:11 <jeremy_carroll> jmhsieh: All RS have this in their logs: DEBUG org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure 'backup1' coordinator notified of 'acquire', waiting on 'reached' or 'abort' from coordinator.
17:11 <jeremy_carroll> jmhsieh: Then the coordinator (Master) never sends anything. They just sit until the timeout.
17:12 <jeremy_carroll> jmhsieh: So basically 'reached' is never obtained. Then abort it set, and it fails....
17:24 <jeremy_carroll> jmhsieh: Found the bug. The hostnames dont match the master due to DNS resolution
17:25 <jeremy_carroll> jmhsieh: The barrier aquired is putting in the local hostname from the regionservers. In EC2 (Where reverse DNS does not work well), the master hands the internal name to the client.
17:26 <jeremy_carroll> jmhsieh: So it's waiting for something like 'ip-10-155-208-202.ec2.internal,60020,1367366580066' zNode to show up, but instead 'hbasemetaclustera-d1b0a484,60020,1367366580066,' is being inserted. Barrier is not reached
17:27 <jeremy_carroll> jmhsieh: Reason being in our environment the master does not have a reverse DNS entry. So we get stuff like this on RegionServer startup in our logs.
17:27 <jeremy_carroll> jmhsieh: 2013-05-01 00:03:00,614 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname to use. Was=hbasemetaclustera-d1b0a484, Now=ip-10-155-208-202.ec2.internal
17:54 <jeremy_carroll> jmhsieh: That was it. Verified. Now that Reverse DNS is working, snapshots are working. Now how to figure out how to get Reverse DNS working on Route53. I wished there was something like 'slave.host.name' inside of Hadoop for this. Looking at source code.

Snapshots & DNS
HBASE-8473
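The failure described in the IRC log can be sketched in a few lines of Python. This is a simplified model, not HBase's actual procedure code: `barrier_reached`, `missing_members`, and `names_agree` are hypothetical helpers written for illustration. The first two model a barrier that completes only when every member name the coordinator expects has registered; the third checks whether a host's local name survives a forward-then-reverse DNS lookup, which is the condition that was broken in the EC2 setup. The two member names are copied from the log (trailing comma dropped).

```python
import socket
from typing import Callable, Iterable, Set


def barrier_reached(expected: Iterable[str], joined: Iterable[str]) -> bool:
    """Return True only if every expected member name has joined."""
    return set(expected) <= set(joined)


def missing_members(expected: Iterable[str], joined: Iterable[str]) -> Set[str]:
    """Return the expected member names that never showed up."""
    return set(expected) - set(joined)


def names_agree(
    local_name: str,
    forward: Callable = socket.gethostbyname,
    reverse: Callable = socket.gethostbyaddr,
) -> bool:
    """Check that reverse DNS for the host's address returns its local name.

    The resolvers are injectable so the check can be exercised without a
    network; the defaults are the standard library lookups.
    """
    try:
        address = forward(local_name)
        reverse_name = reverse(address)[0]  # (hostname, aliases, addresses)
    except OSError:  # covers socket.gaierror and socket.herror
        return False
    return reverse_name == local_name


# Name the coordinator waits for versus the name the region server inserted.
expected = {"ip-10-155-208-202.ec2.internal,60020,1367366580066"}
joined = {"hbasemetaclustera-d1b0a484,60020,1367366580066"}

if not barrier_reached(expected, joined):
    print("barrier not reached, still waiting for:",
          sorted(missing_members(expected, joined)))
```

Running `names_agree(socket.gethostname())` on each node before enabling snapshots would be one way to catch this kind of mismatch early, assuming the cluster relies on DNS names rather than fixed addresses.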

Thanks!
