Overview of the secondary indexing implementation coming soon in Phoenix (https://github.com/forcedotcom/phoenix)
Secondary Indexing in Phoenix
Jesse Yates, HBase Committer, Software Engineer
HBase BoF – June 25, 2013
Outline
• Motivation
• History
• HBase Consistent Indexing
  – Index Management
  – Recovery Mechanism
• Conclusion
A quick note…
Why do we need them?
• Sorted by key
  – Great for access by that key
What if we want to access by another dimension? Table scan!
A short example
• Easy to search by name of food
• Hard to search on another dimension
Name     Type        Date Received  Manufacturer     Current Count
Apple    Macintosh   6/23/13        Good Farm Inc.   200
Turkey   Breast      6/23/13        Tasty Meat Co.   42
Chicken  Drumstick   6/18/13        Pretty Ok Food   3
Jam      Strawberry  6/18/10        Mash It Up Inc.  700
Date Received  Name     Type        Manufacturer     Current Count
6/18/13        Jam      Strawberry  Mash It Up Inc.  700
6/18/13        Chicken  Drumstick   Pretty Ok Food   3
6/23/13        Apple    Macintosh   Good Farm Inc.   200
6/23/13        Turkey   Breast      Tasty Meat Co.   42
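
The difference between the two tables can be sketched in a few lines of plain Java. This is only a toy model: sorted maps stand in for HBase tables, and the class name, the method names and the "<date>|<name>" index key layout are assumptions made for illustration, not anything taken from Phoenix or HBase.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: sorted maps stand in for HBase tables, which keep rows sorted by key.
final class DateIndexSketch {
    // Primary "table": row key is the food name, value is the date received.
    private final TreeMap<String, LocalDate> primary = new TreeMap<>();
    // Index "table": row key is "<ISO date>|<name>", so rows sort by date first.
    private final TreeMap<String, String> index = new TreeMap<>();

    // Writes both tables. Stale index rows from overwritten dates are not cleaned up here.
    void put(String name, LocalDate received) {
        primary.put(name, received);
        index.put(received + "|" + name, name);
    }

    // Without an index: every row is examined (a full table scan).
    List<String> scanByDate(LocalDate date) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, LocalDate> row : primary.entrySet()) {
            if (row.getValue().equals(date)) {
                names.add(row.getKey());
            }
        }
        return names;
    }

    // With the index: a range read over the keys that start with "<date>|".
    List<String> lookupByDate(LocalDate date) {
        String start = date + "|";
        String stop = date + "}"; // '}' is the character right after '|'
        return new ArrayList<>(index.subMap(start, stop).values());
    }

    public static void main(String[] args) {
        DateIndexSketch table = new DateIndexSketch();
        table.put("Apple", LocalDate.of(2013, 6, 23));
        table.put("Turkey", LocalDate.of(2013, 6, 23));
        table.put("Chicken", LocalDate.of(2013, 6, 18));
        System.out.println(table.lookupByDate(LocalDate.of(2013, 6, 23)));
    }
}
```

Both methods return the same rows; the index version only touches the keys in the requested date range, which is the whole point of keeping a second table sorted on the other dimension.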
HBase is “Special”…
• Partitioned Keys (“HRegion”)
• Scales because regions are independent
• Built-in data recovery mechanisms
Hasn’t someone tried this?
• Omid
• Percolator
• Culvert
• Lily
• TrendMicro
• Client-coordinated
We’ve gotten better…
• NGData
  – HBase-SEP
  – HBase-Indexer
• Intel
  – Lucene Full Text Indexing
Still missing some things
• In-HBase index storage
  – Just another table in HBase
• Simple consistency guarantees
  – If X fails, then Y
• Minimal overhead for covered indexes
  – Network roundtrips
Two Major Components
• Index Management
  – Builds index updates
  – Ensures index is ‘cleaned up’
• Recovery Mechanism
  – Ensures index updates are “ACID”
Index Management
• Lives within a RegionCoprocessorObserver
• Access to the local HRegion
• Specifies the mutations to apply to the index tables
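public interface IndexBuilder {
  // Gives the builder access to the region's coprocessor environment.
  public void setup(RegionCoprocessorEnvironment env);
  // Index mutations to apply for a Put on the primary table.
  public Map<Mutation, String> getIndexUpdate(Put put);
  // Index mutations to apply for a Delete on the primary table.
  public Map<Mutation, String> getIndexUpdate(Delete delete);
}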
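
To make the shape of an implementation more concrete, here is a minimal sketch of a builder that indexes a single column. It does not use the HBase client classes: SketchMutation, SketchPut and SketchDelete are hypothetical stand-ins so the example compiles on its own, setup() is left out, and the lookup function stands in for reading the current value from the local HRegion. Treating the String in the returned map as the target index table name is also an assumption.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins for HBase's Mutation, Put and Delete (not the real client classes).
abstract class SketchMutation {
    final String row;
    SketchMutation(String row) { this.row = row; }
}

final class SketchPut extends SketchMutation {
    final Map<String, String> columns = new HashMap<>();
    SketchPut(String row) { super(row); }
    SketchPut add(String column, String value) { columns.put(column, value); return this; }
}

final class SketchDelete extends SketchMutation {
    SketchDelete(String row) { super(row); }
}

// Same shape as the IndexBuilder interface, over the stand-in types and without setup().
interface SketchIndexBuilder {
    Map<SketchMutation, String> getIndexUpdate(SketchPut put);
    Map<SketchMutation, String> getIndexUpdate(SketchDelete delete);
}

// Indexes one column; the index row key is "<column value>|<primary row key>".
final class SingleColumnIndexBuilder implements SketchIndexBuilder {
    private final String column;
    private final String indexTable; // assumed meaning of the String in the returned map
    private final Function<String, String> currentValue; // stand-in for a local region read

    SingleColumnIndexBuilder(String column, String indexTable, Function<String, String> currentValue) {
        this.column = column;
        this.indexTable = indexTable;
        this.currentValue = currentValue;
    }

    private static String indexRow(String value, String primaryRow) {
        return value + "|" + primaryRow;
    }

    @Override
    public Map<SketchMutation, String> getIndexUpdate(SketchPut put) {
        Map<SketchMutation, String> updates = new LinkedHashMap<>();
        String newValue = put.columns.get(column);
        if (newValue == null) {
            return updates; // the indexed column is not touched by this Put
        }
        String oldValue = currentValue.apply(put.row);
        if (oldValue != null && !oldValue.equals(newValue)) {
            // Clean up the index row that pointed at the old value.
            updates.put(new SketchDelete(indexRow(oldValue, put.row)), indexTable);
        }
        updates.put(new SketchPut(indexRow(newValue, put.row)).add("primaryRow", put.row), indexTable);
        return updates;
    }

    @Override
    public Map<SketchMutation, String> getIndexUpdate(SketchDelete delete) {
        Map<SketchMutation, String> updates = new LinkedHashMap<>();
        String oldValue = currentValue.apply(delete.row);
        if (oldValue != null) {
            updates.put(new SketchDelete(indexRow(oldValue, delete.row)), indexTable);
        }
        return updates;
    }
}
```

The builder only decides what the index updates are; writing them to the index tables and making them durable is left to the surrounding machinery.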
Key Observation #1
“We shouldn’t need to provide stronger guarantees than HBase - that is just asking for a bad time.”*
- Jon Hsieh
* Paraphrased
HBase ACID
• Does NOT give you:
  – Cross-row consistency
  – Cross-table consistency
• Does give you:
  – Durable data on success
  – Visibility on success without partial rows
Key Observation #2
“Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.”
- Lars Hofhansl
Idempotent Index Updates
• Doesn’t need full transactions
• Replay as many times as needed
• Can tolerate a little lag
  – As long as we get the order right
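
A small sketch of why replay is harmless, assuming each index update carries an explicit timestamp (for example, the one of the primary edit). The class below is a toy in-memory model of timestamp-versioned cells, not HBase code, and all names in it are made up for illustration.

```java
import java.util.TreeMap;

// Toy index table: index row key -> (timestamp -> value), like versioned HBase cells.
final class IdempotentIndexSketch {
    private final TreeMap<String, TreeMap<Long, String>> rows = new TreeMap<>();

    // Applying the same (row, timestamp, value) again overwrites the cell with itself.
    void apply(String indexRow, long timestamp, String value) {
        rows.computeIfAbsent(indexRow, r -> new TreeMap<>()).put(timestamp, value);
    }

    // Readers see the version with the highest timestamp, whatever the arrival order.
    String read(String indexRow) {
        TreeMap<Long, String> versions = rows.get(indexRow);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    // Textual snapshot of the whole table, used to compare states.
    String state() {
        return rows.toString();
    }

    public static void main(String[] args) {
        IdempotentIndexSketch index = new IdempotentIndexSketch();
        index.apply("2013-06-23|Apple", 5L, "Apple");
        String afterFirst = index.state();
        index.apply("2013-06-23|Apple", 5L, "Apple"); // replayed update
        System.out.println(afterFirst.equals(index.state()));
    }
}
```

Because the timestamp travels with the update, a late replay of an older update cannot hide a newer one, which is one way to read "as long as we get the order right".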
Taking a little ACID…
Durable Indexing: Standard Write Path
(Diagram of the write path: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)
Durable Indexing
(Diagram: RegionCoprocessorHost, WAL, RegionCoprocessorHost, Indexer, IndexBuilder, WAL Updater, “Durable!”, Indexer, Index Tables)
Durable Indexing
(Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer, Index Tables)
Failure Situations
• Before writing the WAL ✔
  – Nothing is durable, nothing is visible
• After writing WAL, before index update ✔
  – WAL Replay updates the index table and the primary table
• Mid-index update ✔
  – WAL Replay finishes index update, primary table update
• After index updates, before primary ✔
  – WAL Replay restores primary state, idempotently applies index updates
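
The four failure situations can be walked through with a toy simulation. It assumes the write order WAL, then index tables, then MemStore, with one WAL entry carrying both the primary edit and its index edits. Every type and name here is hypothetical, and real concerns such as WAL truncation after a flush are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Toy simulation; none of these types are HBase classes.
final class WalReplaySketch {
    enum CrashPoint { NONE, BEFORE_WAL, AFTER_WAL, MID_INDEX_UPDATE, AFTER_INDEX_UPDATE }

    // One WAL entry carries the primary edit together with its index edits.
    static final class WalEntry {
        final String row;
        final String value;
        final List<String> indexRows;
        WalEntry(String row, String value, List<String> indexRows) {
            this.row = row;
            this.value = value;
            this.indexRows = indexRows;
        }
    }

    final List<WalEntry> wal = new ArrayList<>();               // durable
    final TreeMap<String, String> indexTable = new TreeMap<>(); // durable, lives in another table
    final TreeMap<String, String> memStore = new TreeMap<>();   // in memory, lost on a crash

    // Returns true if the write completed; false means the server "crashed" part-way.
    boolean write(String row, String value, List<String> indexRows, CrashPoint crash) {
        if (crash == CrashPoint.BEFORE_WAL) return loseMemory();
        wal.add(new WalEntry(row, value, indexRows));
        if (crash == CrashPoint.AFTER_WAL) return loseMemory();
        for (int i = 0; i < indexRows.size(); i++) {
            if (crash == CrashPoint.MID_INDEX_UPDATE && i == indexRows.size() / 2) return loseMemory();
            indexTable.put(indexRows.get(i), row);
        }
        if (crash == CrashPoint.AFTER_INDEX_UPDATE) return loseMemory();
        memStore.put(row, value);
        return true;
    }

    private boolean loseMemory() {
        memStore.clear();
        return false;
    }

    // Replays every WAL entry: index puts are repeated harmlessly, then the primary edit is restored.
    void replay() {
        for (WalEntry entry : wal) {
            for (String indexRow : entry.indexRows) {
                indexTable.put(indexRow, entry.row);
            }
            memStore.put(entry.row, entry.value);
        }
    }

    public static void main(String[] args) {
        List<String> indexRows = Arrays.asList("2013-06-23|Apple", "Good Farm Inc.|Apple");
        for (CrashPoint crash : CrashPoint.values()) {
            WalReplaySketch server = new WalReplaySketch();
            server.write("Apple", "200", indexRows, crash);
            server.replay();
            System.out.println(crash + " primary=" + server.memStore + " index=" + server.indexTable);
        }
    }
}
```

In this model a crash before the WAL write leaves both tables empty, and every later crash point converges to the same state after replay: the primary row plus both index rows.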
Special Note: Failed Index Updates
• Index is corrupted
  – Index table does not exist
  – Index table does not have the right schema
  – Etc.
• Fail-fast behavior
  – Kill the whole server
  – Forces WAL Replay to enforce correctness
  – Modular enough to support alternative schemes
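
One way such modularity could look is a small policy interface that the index writer calls when a write fails. This is a sketch under assumptions: the interface, both policies and the writer are hypothetical names invented for illustration, the Runnable stands in for whatever actually stops the region server, and the second policy is only an example of an alternative scheme, not something the slides describe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plug-in point: what to do when an index write fails.
interface FailurePolicySketch {
    void onIndexWriteFailure(String indexTable, Exception cause);
}

// Fail-fast: ask the server to abort so that WAL replay redoes the index updates.
final class AbortServerPolicySketch implements FailurePolicySketch {
    private final Runnable abortServer; // stand-in for stopping the region server

    AbortServerPolicySketch(Runnable abortServer) {
        this.abortServer = abortServer;
    }

    @Override
    public void onIndexWriteFailure(String indexTable, Exception cause) {
        abortServer.run();
    }
}

// An invented alternative: keep serving and remember which index tables need repair.
final class MarkBrokenPolicySketch implements FailurePolicySketch {
    final List<String> brokenTables = new ArrayList<>();

    @Override
    public void onIndexWriteFailure(String indexTable, Exception cause) {
        brokenTables.add(indexTable);
    }
}

// Writes one index row and hands any failure to the configured policy.
final class IndexWriterSketch {
    interface TableWriter {
        void write(String indexTable, String indexRow) throws Exception;
    }

    private final TableWriter writer;
    private final FailurePolicySketch policy;

    IndexWriterSketch(TableWriter writer, FailurePolicySketch policy) {
        this.writer = writer;
        this.policy = policy;
    }

    // Returns true when the index row was written, false when the policy had to step in.
    boolean write(String indexTable, String indexRow) {
        try {
            writer.write(indexTable, indexRow);
            return true;
        } catch (Exception e) {
            policy.onIndexWriteFailure(indexTable, e);
            return false;
        }
    }
}
```

Keeping the reaction to a failure behind an interface means the default can stay fail-fast while other deployments swap in a different scheme without touching the write path.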
Key Points
• Custom KeyValues to enable index durability in primary table WAL
• Custom WALEdit Codec for index update with WAL Replay
• Will see index updates before primary
  – Only a little bit of lag and never ‘wrong’
  – Matches HBase consistency
• Fail-fast behavior to enforce correctness
Upcoming Work
• Performance testing
• Standard covered index managers
• Index cleanup on compaction
Conclusion
• Fully transparent to client
• Easy to build custom index maintenance
• Meets current HBase consistency guarantees
• Supports HBase 0.94.9+
  – Coming to 0.96/0.98 soon!
hbase-index
https://github.com/forcedotcom/phoenix/tree/master/contrib/hbase-index
Detailed Blog Post
http://jyates.github.io/2013/06/11/hbase-consistent-secondary-indexing.html
Bonus!
• Usable as a standalone module
• Coming to Phoenix*
  – Built-in support
• Future: added to HBase core (?)
* https://github.com/forcedotcom/phoenix