HBase Snapshots
HBase User Group Meetup, 10/29/12
Jesse Yates
So you wanna…. • Prevent data loss • Recover to a point in time • Backup your data • Sandbox copy of data
a BIG Problem… • Petabytes of data • 100’s of servers • At a single point in time • Millions of writes per second
Built-in
• Export – MapReduce job against the HBase API – Output to a single sequence file
• Copy Table – MapReduce job against the HBase API – Output to another table
Yay • Simple • Heavily tested • Can do point-in-time
Boo • Slow • High impact on a running cluster
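Both built-in tools are MapReduce drivers run from the command line; a sketch of typical invocations (table names and paths are placeholders):

```shell
# Export: MapReduce job that dumps a table to sequence files in HDFS
hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backups/mytable

# Copy Table: MapReduce job that writes the rows into another table
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=mytable_copy mytable
```

Export also accepts optional version-count and start/end-timestamp arguments, which is what enables the point-in-time dumps mentioned above.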
(Less Obvious) Solution!
Replication • Export all changes by tailing the WAL Yay • Simple • Gets all edits • Minimal impact on a running cluster Boo • Must be on from the beginning • Can’t turn it off and catch up later • No built-in point-in-time • Still need an ETL process to get multiple copies
(Facebook) Solution!¹ Mozilla did something similar² 1. issues.apache.org/jira/browse/HBASE-5509 2. github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/hadoop/Backup.java
Facebook Backup • Copy existing HFiles, HLogs Yay • Goes through HDFS – Doesn’t impact the running cluster • Fast – distcp is 100% faster than M/R through HBase Boo • Not widely used • Requires hardlinks • Recovery requires WAL replay • Point-in-time needs a filter
Backup through the ages
• Through HBase: Export, Copy Table, Replication
• Through HDFS: HBASE-50, Facebook backup
Maybe this is harder than we thought…
We did some work…
Hardlink workarounds • HBASE-5547 – Move deleted hfiles to .archive directory • HBASE-6610 – FileLink: equivalent to Windows link files Enough to get started….
Difficulties • Coordinating many servers • Minimizing unavailability • Minimizing time to restore • Gotta be fast!
Snapshots (coming in HBase 0.96!)
• Fast – zero-copy of files
• Point-in-time semantics – Part of how it’s built
• Built-in recovery – Make a table from a snapshot
• SLA enforcement – Guaranteed max unavailability
We’ve got a couple of those…
Snapshot Types • Offline – Table is already disabled • Globally consistent – Consistent across all servers • Timestamp consistent – Point-in-time according to each server
Offline Snapshots • Table is already disabled • Requires minimal log replay – Especially if the table was cleanly disabled • Captures the state of the table when disabled • No need to worry about changing state Yay • Fast! • Simple!
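In the shell, an offline snapshot looks like the following (table and snapshot names are placeholders):

```shell
hbase shell
> disable 'mytable'
> snapshot 'mytable', 'mytable_offline_snap'
> enable 'mytable'
```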
But I can’t take my table offline!
Globally Consistent Snapshots • All regions block writes until everyone agrees to snapshot – Two-phase commit-ish • Time-bound to prevent infinite blocking – Unavailability SLA maintained per region • No flushing – it’s fast!
What could possibly go wrong?
Cross-Server Consistency Problems • General distributed coordination problems – Block writes while waiting for all regions – Limited by the slowest region – More servers = higher P(failure) • Stronger guarantees than HBase currently offers • Requires WAL replay to restore the table
I don’t need all that, what else do you have?
Timestamp Consistent Snapshots • All writes up to a TS are in the snapshot • Leverages existing flush functionality • Doesn’t block writes • No WAL replay on recovery
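From the shell, a timestamp-consistent snapshot is simply a snapshot of an enabled table; each region server flushes so that all writes up to its snapshot timestamp are captured (names are placeholders):

```shell
> snapshot 'mytable', 'mytable_snap_20121029'
```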
Put/Get/Delete/Mutate/etc. → MemStore
Timestamp in snapshot?
• Yes → Snapshot Store
• No → Future Store
I’ve got a snapshot, now what?
Recovery • Export snapshot – Send snapshot to another cluster • Clone snapshot – Create new table from snapshot • Restore table – Rollback table to specific state
Export Snapshot • Copy a full snapshot to another cluster – All required HFiles/HLogs – Lots of options • Fancy distcp – Fast! – Minimal impact on the running cluster
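A sketch of invoking the ExportSnapshot tool (the snapshot name, destination cluster address, and mapper count are placeholders):

```shell
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot mytable_snap \
  -copy-to hdfs://backup-cluster:8020/hbase \
  -mappers 16
```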
Clone Table • New table from snapshot • Create multiple tables from same snapshot • Exact replica at the point-in-time • Full Read/Write on new table
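Cloning from the shell (names hypothetical):

```shell
> clone_snapshot 'mytable_snap', 'mytable_clone'
```

The clone references the snapshot’s HFiles via file links (the HBASE-6610 mechanism above), so creating it is cheap; new writes to the clone go to its own files.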
Restore • Replace existing table with snapshot • Snapshots current table, just in case • Minimal overhead – Handles creating/deleting regions – Fixes META for you
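Restoring from the shell requires the table to be disabled first (names hypothetical):

```shell
> disable 'mytable'
> restore_snapshot 'mytable_snap'
> enable 'mytable'
```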
Whew, that’s a lot!
Even more awesome!
Goodies • Full support in shell • Distributed Coordination Framework • ‘Ragged Backup’ added along the way • Coming in next CDH • Backport to 0.94?
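The shell support includes managing existing snapshots; a sketch, assuming a snapshot named 'mytable_snap' exists:

```shell
> list_snapshots
> delete_snapshot 'mytable_snap'
```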
Special thanks! • Matteo Bertozzi – All the recovery code – Shell support • Jon Hsieh – Distributed Two-Phase Commit refactor • All our reviewers… – Stack, Ted Yu, Jon Hsieh, Matteo