© 2008 Palantir Technologies Inc. All rights reserved. Palantir Revisioning Database Bob McGrew Director of Engineering

Palantir's Revisioning Database

Page 1: Palantir's Revisioning Database

© 2008 Palantir Technologies Inc. All rights reserved.

Palantir Revisioning Database

Bob McGrew

Director of Engineering

Page 2: Palantir's Revisioning Database

Introduction

Imagine a community of thousands of analysts all reading and writing to a traditional database.

Problems for analysts:
– No history of additions and edits for data
– No ability to discover “what we knew when”
– No prior review of shared data
– No granular sharing

Page 3: Palantir's Revisioning Database

Revisioning Database features

• Online object history
• Separate spaces for analysis
• Granular sharing of changes

Page 5: Palantir's Revisioning Database

Alternative approach: audit logs

Write every change to audit log
– Separate data store (database or flat file)
– Separate tools (SQL or grep)

Not integrated with analysis
– Offline and inaccessible by design
– No security info
– No analyst-facing tools
– Infeasible to determine “what we knew when”

Audit logging is important, but no substitute for an online history

Page 6: Palantir's Revisioning Database

Integrated object history

Online history of every change

Ability to view “what we knew when”

Page 7: Palantir's Revisioning Database

Modeling data: The object model

Each kind of object component is stored in a separate database table

Page 8: Palantir's Revisioning Database

Modeling time: Application events

• Model time as a linear sequence of application events
• Provides atomicity for changes
• The database ID for an app event always increases with time
  – Total ordering for all app events
  – Wall clock time allows ties
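The ordering property can be illustrated with a minimal sketch. The class and field names (`AppEvent`, `AppEventLog`, `event_id`, `wall_clock`) are hypothetical and not taken from the slides; the sketch only assumes that IDs come from a strictly increasing counter, while clock readings may repeat.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class AppEvent:
    event_id: int      # strictly increasing, so it gives a total order
    wall_clock: float  # informational only; two events may share a reading


class AppEventLog:
    """Hands out app events whose IDs always increase (hypothetical helper)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.events = []

    def record(self, wall_clock):
        event = AppEvent(next(self._ids), wall_clock)
        self.events.append(event)
        return event


log = AppEventLog()
first = log.record(wall_clock=1200000000.0)
second = log.record(wall_clock=1200000000.0)  # same clock reading: a tie
```

Sorting by `wall_clock` cannot say which of the two events came first, while comparing `event_id` always can.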

Page 9: Palantir's Revisioning Database

Revisioning Database: the basic idea

Every edit to an object component inserts a new row into the relevant database table

Once inserted, rows are never updated

Page 10: Palantir's Revisioning Database

Down to the database

Revisioning fields:
– id: long-living component ID
– object_id: ties the component to its object
– app_event_id: totally ordered timestamp for row
– version: primary key for row, based on ID and app_event_id
– deleted: 0 if not deleted; non-zero if deleted

Metadata fields:
– time_created: time component was created
– created_by: user ID of component creator
– last_modified: time row was created

Value fields:
– <value fields>: everything else

Page 11: Palantir's Revisioning Database

Modifying an object

op      id   object_id  app_event_id  version  deleted  value
Create  10   10         1             1001     0        Type:Person
Create  101  10         1             1002     0        Name:Mike Fikri
Create  102  10         2             1003     0        Phone#:650-494-1574
Edit    101  10         3             1004     0        Name:Michael Fikri
Delete  102  10         4             1005     1        Phone#:650-494-1574
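As a rough illustration of the insert-only rule, the sketch below replays the five operations from the table. `Row` and `ComponentTable` are hypothetical names, and assigning `version` from a plain counter is an assumption; the slides only say that the version is based on the ID and app_event_id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a row cannot be changed once created
class Row:
    id: int
    object_id: int
    app_event_id: int
    version: int
    deleted: int
    value: str


class ComponentTable:
    """Append-only stand-in for one component table (hypothetical)."""

    def __init__(self, first_version=1001):
        self.rows = []
        self._next_version = first_version  # assumption: a plain counter

    def write(self, component_id, object_id, app_event_id, value, deleted=0):
        row = Row(component_id, object_id, app_event_id,
                  self._next_version, deleted, value)
        self._next_version += 1
        self.rows.append(row)  # insert only; nothing is updated in place
        return row


table = ComponentTable()
table.write(10, 10, 1, "Type:Person")                       # Create
table.write(101, 10, 1, "Name:Mike Fikri")                  # Create
table.write(102, 10, 2, "Phone#:650-494-1574")              # Create
table.write(101, 10, 3, "Name:Michael Fikri")               # Edit
table.write(102, 10, 4, "Phone#:650-494-1574", deleted=1)   # Delete
```

The edit and the delete each add one row; the earlier rows for components 101 and 102 stay in the table untouched.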

Page 12: Palantir's Revisioning Database

Loading an object

App Event #  Object #10    Property #101        Property #102
1            Type: Person  Name: Mike Fikri
2                                               Phone#: 650-494-1574
3                          Name: Michael Fikri
4                                               Phone#: 650-494-1574 (deleted)

Page 13: Palantir's Revisioning Database

Loading an object

App Event #  Object #10    Property #101        Property #102
1            Type: Person  Name: Mike Fikri
2                                               Phone#: 650-494-1574
3                          Name: Michael Fikri
4                                               Phone#: 650-494-1574 (deleted)
5

• Choose all rows that
  • Belong to object 10

Page 14: Palantir's Revisioning Database

Loading an object

Type: Person   Name: Michael Fikri   Phone#: 650-494-1574

• Choose all rows that
  • Belong to object 10
  • Have not been superseded by a later version

Page 15: Palantir's Revisioning Database

Loading an object

Type: Person   Name: Michael Fikri

• Choose all rows that
  • Belong to object 10
  • Have not been superseded by a later version
  • Are not deleted
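The three selection rules can be sketched in a few lines of Python. This is an in-memory illustration under assumed names (`Row`, `load_object`), not the actual query used by the system; the rows mirror the example table.

```python
from collections import namedtuple

Row = namedtuple("Row", "id object_id app_event_id deleted value")

ROWS = [
    Row(10, 10, 1, 0, "Type:Person"),
    Row(101, 10, 1, 0, "Name:Mike Fikri"),
    Row(102, 10, 2, 0, "Phone#:650-494-1574"),
    Row(101, 10, 3, 0, "Name:Michael Fikri"),
    Row(102, 10, 4, 1, "Phone#:650-494-1574"),
]


def load_object(rows, object_id):
    """Return the current component values of one object."""
    latest = {}
    for row in rows:
        if row.object_id != object_id:      # rule 1: belongs to the object
            continue
        seen = latest.get(row.id)
        if seen is None or row.app_event_id > seen.app_event_id:
            latest[row.id] = row            # rule 2: not superseded
    live = [r for r in latest.values() if r.deleted == 0]  # rule 3: not deleted
    return [r.value for r in sorted(live, key=lambda r: r.id)]


current = load_object(ROWS, 10)
```

For the example rows, `current` holds the type and the edited name; the phone number is dropped because its latest row is a delete.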

Page 16: Palantir's Revisioning Database

Loading an object from history

App Event #  Object #10    Property #101        Property #102
1            Type: Person  Name: Mike Fikri
2                                               Phone#: 650-494-1574
3                          Name: Michael Fikri
4                                               Phone#: 650-494-1574 (deleted)

• To load the version of Mike at app event 2, consider only rows written at or before app event 2 and choose all rows that
  • Belong to object 10
  • Have not been superseded by a later version written at or before app event 2
  • Are not deleted

Page 17: Palantir's Revisioning Database

Loading an object from history

App Event #  Object #10    Property #101        Property #102
1            Type: Person  Name: Mike Fikri
2                                               Phone#: 650-494-1574
3                          Name: Michael Fikri
4                                               Phone#: 650-494-1574 (deleted)

Type: Person   Name: Mike Fikri   Phone#: 650-494-1574

• To load the version of Mike at app event 2, consider only rows written at or before app event 2 and choose all rows that
  • Belong to object 10
  • Have not been superseded by a later version written at or before app event 2
  • Are not deleted
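One way to express the historical load as a query is sketched below with Python's built-in `sqlite3` module. The table name `component`, the single-table layout, and the exact SQL are assumptions made for illustration; the slides do not show the real schema or query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE component ("
    " id INTEGER, object_id INTEGER, app_event_id INTEGER,"
    " version INTEGER PRIMARY KEY, deleted INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO component VALUES (?, ?, ?, ?, ?, ?)",
    [
        (10, 10, 1, 1001, 0, "Type:Person"),
        (101, 10, 1, 1002, 0, "Name:Mike Fikri"),
        (102, 10, 2, 1003, 0, "Phone#:650-494-1574"),
        (101, 10, 3, 1004, 0, "Name:Michael Fikri"),
        (102, 10, 4, 1005, 1, "Phone#:650-494-1574"),
    ],
)

# Rows of the object written at or before :as_of that are not deleted and
# have no later row for the same component at or before :as_of.
AS_OF_QUERY = """
SELECT r.value
FROM component AS r
WHERE r.object_id = :object_id
  AND r.app_event_id <= :as_of
  AND r.deleted = 0
  AND NOT EXISTS (
        SELECT 1
        FROM component AS later
        WHERE later.id = r.id
          AND later.app_event_id > r.app_event_id
          AND later.app_event_id <= :as_of)
ORDER BY r.id
"""


def load_as_of(connection, object_id, as_of):
    cursor = connection.execute(
        AS_OF_QUERY, {"object_id": object_id, "as_of": as_of}
    )
    return [value for (value,) in cursor]


at_event_2 = load_as_of(conn, 10, 2)
at_event_4 = load_as_of(conn, 10, 4)
```

Loading at app event 2 returns the original name and the phone number; loading at app event 4 returns the edited name and no phone number.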

Page 19: Palantir's Revisioning Database

Speed of operations

Time efficient
– Edits
– Fetch current version of object
– Fetch past version of object
– Fetch all changes in a given time range

Space efficient
– No overhead if object never changed
– Only one row overhead per change

Page 20: Palantir's Revisioning Database

Revisioning Database features

• Online object history
• Separate spaces for analysis
• Granular sharing of changes

Page 21: Palantir's Revisioning Database

Separate spaces for analysis

• Explore competing hypotheses
• Edit information without affecting others
• Control over visibility of others’ edits
• Solution:
  – Let analysts “check out” a copy of the data
  – Similar to source control systems like SVN or CVS

Page 22: Palantir's Revisioning Database

A realm of one’s own

A realm is a complete view of all objects
– Each investigation
– Data Repository

Investigations inherit data from the Repository
– Load and view Repository data
– Search Repository data

Page 23: Palantir's Revisioning Database

Down to the database

Revisioning fields:
– id: long-living component ID
– object_id: ties the component to its object
– realm_id: the realm that this row is in
– app_event_id: totally ordered timestamp
– version: primary key, based on ID and app_event_id
– deleted: 0 if not deleted; non-zero if deleted

Page 25: Palantir's Revisioning Database

Making inheritance work

Copy-on-read approach
– Copy on first viewing
– Edit as normal
– Expensive for large objects or many objects

Copy-on-write
– Write a pointer to the right object state
– Edits supersede pointed-to rows
– Super-space efficient; no redundant info

Page 26: Palantir's Revisioning Database

The realm_object relation

• Pointer is relation between objects and realm
• realm_object row:
  – object_id: the object that we are locking
  – realm_id: the investigative realm
  – source_realm_id: the Data Repository
  – source_realm_app_event_id: the app event that the object is locked into
• Realm_objects are object components
• Realm_objects are revisioned!

Page 27: Palantir's Revisioning Database

Object modification

Object addition to realm
– Write one realm_object row
– source_realm_app_event_id is the current app event

Edit
– Write only changed rows to investigative realm
– Need not write full object
– Data Repository untouched

Page 28: Palantir's Revisioning Database

Loading an object

If there is no realm_object row,
– Load the object from the Data Repository at the latest app event

If there is a realm_object row,
– Load the object from the Data Repository at the app event specified
– Load all rows from the investigative realm at the current app event
– Investigative rows supersede Data Repository rows
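The loading rules above can be sketched as follows. The realm IDs (1 for the Data Repository, 100 for the investigation), the names `latest_rows` and `load_in_realm`, and the use of a single unrevisioned realm_object pointer are simplifying assumptions for illustration; the rows follow the Entity/Person example used in this section.

```python
from collections import namedtuple

Row = namedtuple("Row", "id object_id realm_id app_event_id deleted value")
RealmObject = namedtuple(
    "RealmObject",
    "object_id realm_id source_realm_id source_realm_app_event_id",
)

REPO, INVESTIGATION = 1, 100  # assumed realm IDs

ROWS = [
    Row(10, 10, REPO, 1, 0, "Type:Entity"),
    Row(101, 10, REPO, 1, 0, "Name:Mike Fikri"),
    Row(102, 10, REPO, 2, 0, "Phone#:650-494-1574"),
    Row(101, 10, REPO, 3, 0, "Name:Michael Fikri"),
    Row(102, 10, REPO, 4, 1, "Phone#:650-494-1574"),
    Row(10, 10, INVESTIGATION, 6, 0, "Type:Person"),
]
POINTERS = [RealmObject(10, INVESTIGATION, REPO, 5)]  # locked at app event 5


def latest_rows(rows, object_id, realm_id, as_of=None):
    """Latest row per component in one realm, optionally as of an app event."""
    latest = {}
    for row in rows:
        if row.object_id != object_id or row.realm_id != realm_id:
            continue
        if as_of is not None and row.app_event_id > as_of:
            continue
        seen = latest.get(row.id)
        if seen is None or row.app_event_id > seen.app_event_id:
            latest[row.id] = row
    return latest


def load_in_realm(rows, pointers, object_id, realm_id):
    pointer = next(
        (p for p in pointers
         if p.object_id == object_id and p.realm_id == realm_id),
        None,
    )
    if pointer is None:
        # No realm_object row: Data Repository at the latest app event.
        merged = latest_rows(rows, object_id, REPO)
    else:
        # Repository state at the locked app event ...
        merged = latest_rows(rows, object_id, pointer.source_realm_id,
                             pointer.source_realm_app_event_id)
        # ... then investigative rows supersede repository rows.
        merged.update(latest_rows(rows, object_id, realm_id))
    live = [r for r in merged.values() if r.deleted == 0]
    return [r.value for r in sorted(live, key=lambda r: r.id)]


in_investigation = load_in_realm(ROWS, POINTERS, 10, INVESTIGATION)
in_other_realm = load_in_realm(ROWS, POINTERS, 10, 200)  # no realm_object row
```

The investigation sees its own type edit on top of the repository's name, while a realm without a pointer sees the repository as it is.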

Page 29: Palantir's Revisioning Database

Loading with realm objects

Data Repo:

App Event #  Object #10    Property #101        Property #102
1            Type: Entity  Name: Mike Fikri
2                                               Phone#: 650-494-1574
3                          Name: Michael Fikri
4                                               Phone#: 650-494-1574 (deleted)
5

Data Repo:      Type: Entity   Name: Michael Fikri

Investigation:

App Event #  Object #10    Property #101        Property #102
6            Type: Person

Investigation:  Type: Person

Page 30: Palantir's Revisioning Database

Loading with realm objects

Realm          Object #10    Property #101        Property #102
Data Repo      Type: Entity  Name: Michael Fikri
Investigation  Type: Person

Result:        Type: Person  Name: Michael Fikri

Page 31: Palantir's Revisioning Database

Speed of operations

Time efficient
– Edits
– Fetch current version of object
– Fetch past version of object
– Fetch all changes in a given time range

Space efficient
– Almost all data shared between realms
– One row written per edit

Page 32: Palantir's Revisioning Database

Revisioning Database Features

• Online object history
• Separate spaces for analysis
• Granular sharing of changes

Page 33: Palantir's Revisioning Database

Granular data sharing

• Changes developed in one realm need to be shared with other analysts
• Data sharing must be per-object
• Update
  – Bringing changes from the Data Repository into an investigation
• Publish
  – Moving changes from an investigation into the Data Repository

Page 34: Palantir's Revisioning Database

Update

• Edit the realm_object
• Insert new realm_object row:
  – object_id: unchanged
  – realm_id: unchanged
  – source_realm_id: unchanged
  – source_realm_app_event_id: set to the new app event that the object is locked into
  – Set other revisioning fields appropriately
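Because realm_objects are revisioned like any other component, an update can be sketched as one more insert. `RealmObjectRow` and `update_object` are hypothetical names, the sample event numbers are made up for illustration, and the sketch assumes the object already has a realm_object row in the realm.

```python
from collections import namedtuple

RealmObjectRow = namedtuple(
    "RealmObjectRow",
    "object_id realm_id source_realm_id source_realm_app_event_id app_event_id",
)


def update_object(realm_object_rows, object_id, realm_id,
                  new_source_event, app_event_id):
    """Re-lock an object to a newer repository app event by inserting a row."""
    current = max(
        (r for r in realm_object_rows
         if r.object_id == object_id and r.realm_id == realm_id),
        key=lambda r: r.app_event_id,
    )
    updated = current._replace(
        source_realm_app_event_id=new_source_event,  # the only changed pointer field
        app_event_id=app_event_id,                   # revisioning field for the new row
    )
    realm_object_rows.append(updated)  # the old row is kept, never overwritten
    return updated


pointers = [RealmObjectRow(10, 100, 1, 5, 6)]
newest = update_object(pointers, object_id=10, realm_id=100,
                       new_source_event=9, app_event_id=10)
```

The earlier pointer row stays in place, so the investigation's own history still records which repository state it was locked to before the update.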

Page 35: Palantir's Revisioning Database

Publish

Two-phase process

Data Repository
– Copy rows from investigative realm
– Only need to copy rows that are still current

Investigative realm
– Need to allow Data Repository rows to be authoritative
– Insert rows with deleted = -1
– Rows with deleted = -1 do not supersede Data Repository rows
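The two phases can be sketched in memory as below. The realm IDs (1 and 100) and app event numbers (5 and 6) follow the worked publish example in this section; `current_rows` and `publish` are hypothetical names, and copying only rows that are current and not deleted is an assumption read off that example.

```python
from collections import namedtuple

Row = namedtuple("Row", "id object_id realm_id app_event_id deleted value")

REPO, INVESTIGATION = 1, 100


def current_rows(rows, object_id, realm_id):
    """Latest row per component of one object in one realm, ordered by id."""
    latest = {}
    for row in rows:
        if row.object_id == object_id and row.realm_id == realm_id:
            seen = latest.get(row.id)
            if seen is None or row.app_event_id > seen.app_event_id:
                latest[row.id] = row
    return [latest[key] for key in sorted(latest)]


def publish(rows, object_id, realm_id, repo_event, realm_event):
    live = [r for r in current_rows(rows, object_id, realm_id)
            if r.deleted == 0]
    # Phase 1: copy the still-current rows into the Data Repository.
    rows.extend(r._replace(realm_id=REPO, app_event_id=repo_event)
                for r in live)
    # Phase 2: mark them in the investigation with deleted = -1 so the
    # repository rows become authoritative.
    rows.extend(r._replace(app_event_id=realm_event, deleted=-1)
                for r in live)


rows = [
    Row(10, 10, INVESTIGATION, 1, 0, "Type:Person"),
    Row(101, 10, INVESTIGATION, 1, 0, "Name:Mike Fikri"),
    Row(102, 10, INVESTIGATION, 2, 0, "Phone#:650-494-1574"),
    Row(101, 10, INVESTIGATION, 3, 0, "Name:Michael Fikri"),
    Row(102, 10, INVESTIGATION, 4, 1, "Phone#:650-494-1574"),
]
publish(rows, object_id=10, realm_id=INVESTIGATION, repo_event=5, realm_event=6)
repo_rows = [r for r in rows if r.realm_id == REPO]
markers = [r for r in rows if r.deleted == -1]
```

Only two of the five investigative rows are copied, which is why publish can write fewer rows than the original edits did.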

Page 36: Palantir's Revisioning Database

Publish

id   object_id  realm_id  app_event_id  deleted  value
10   10         100       1             0        Type:Person
101  10         100       1             0        Name:Mike Fikri
102  10         100       2             0        Phone#:650-494-1574
101  10         100       3             0        Name:Michael Fikri
102  10         100       4             1        Phone#:650-494-1574

Page 37: Palantir's Revisioning Database

Publish: Copy rows to Repository

Investigative realm (realm_id 100):

id   object_id  realm_id  app_event_id  deleted  value
10   10         100       1             0        Type:Person
101  10         100       1             0        Name:Mike Fikri
102  10         100       2             0        Phone#:650-494-1574
101  10         100       3             0        Name:Michael Fikri
102  10         100       4             1        Phone#:650-494-1574

Data Repository (realm_id 1):

id   object_id  realm_id  app_event_id  deleted  value
10   10         1         5             0        Type:Person
101  10         1         5             0        Name:Michael Fikri

Page 38: Palantir's Revisioning Database

Publish: Delete rows in investigation

Investigative realm (realm_id 100):

id   object_id  realm_id  app_event_id  deleted  value
10   10         100       1             0        Type:Person
101  10         100       1             0        Name:Mike Fikri
102  10         100       2             0        Phone#:650-494-1574
101  10         100       3             0        Name:Michael Fikri
102  10         100       4             1        Phone#:650-494-1574
10   10         100       6             -1       Type:Person
101  10         100       6             -1       Name:Michael Fikri

Data Repository (realm_id 1):

id   object_id  realm_id  app_event_id  deleted  value
10   10         1         5             0        Type:Person
101  10         1         5             0        Name:Michael Fikri

Page 39: Palantir's Revisioning Database

Loading after publish

Investigation:

App Event #  Object #10    Property #101        Property #102
1            Type: Person  Name: Mike Fikri
2                                               Phone#: 650-494-1574
3                          Name: Michael Fikri
4                                               Phone#: 650-494-1574 (deleted)
6            Type: Person  Name: Michael Fikri                          (deleted = -1)

Investigation:  (no rows)

Data Repo:

App Event #  Object #10    Property #101        Property #102
5            Type: Person  Name: Michael Fikri

Data Repo:      Type: Person   Name: Michael Fikri

Page 40: Palantir's Revisioning Database

Loading after publish

Realm          Object #10    Property #101        Property #102
Data Repo      Type: Person  Name: Michael Fikri
Investigation

Result:        Type: Person  Name: Michael Fikri
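A small sketch of the merge after publish, assuming the current rows of each realm have already been selected. `merge_after_publish` is a hypothetical name, and treating a deleted = -1 row as "skip and fall through to the Data Repository" is one reading of the rule that such rows do not supersede repository rows.

```python
from collections import namedtuple

Row = namedtuple("Row", "id object_id realm_id app_event_id deleted value")

# Current (latest) rows per component, taken from the publish example.
repo_current = [
    Row(10, 10, 1, 5, 0, "Type:Person"),
    Row(101, 10, 1, 5, 0, "Name:Michael Fikri"),
]
investigation_current = [
    Row(10, 10, 100, 6, -1, "Type:Person"),
    Row(101, 10, 100, 6, -1, "Name:Michael Fikri"),
    Row(102, 10, 100, 4, 1, "Phone#:650-494-1574"),
]


def merge_after_publish(repo_rows, investigation_rows):
    merged = {row.id: row for row in repo_rows}
    for row in investigation_rows:
        if row.deleted == -1:
            continue  # published marker: the repository row stays authoritative
        merged[row.id] = row  # ordinary investigative rows still supersede
    return [merged[key].value for key in sorted(merged)
            if merged[key].deleted == 0]


loaded = merge_after_publish(repo_current, investigation_current)
```

The investigation contributes nothing of its own here, so the loaded object is exactly what the Data Repository holds.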

Page 41: Palantir's Revisioning Database

Speed of operations

Time efficient
– Updates
– Publish is bulk in-database insert

Space efficient
– Updates
– Publish writes no more rows than the original
– Often considerably fewer

Page 42: Palantir's Revisioning Database

Revisioning Database features

• Online object history
• Separate spaces for analysis
• Granular sharing of changes

Page 43: Palantir's Revisioning Database

Other issues not covered

• Branching history in investigations
• Loading links
• Conflict resolution
• Synchronizing external indices (search server)
