Cam List Ore Talk 2011-02-01

  • 8/4/2019 Cam List Ore Talk 2011-02-01

    Camlistore
    http://camlistore.org/

    Brad Fitzpatrick, [email protected]

    What is Camlistore?

    20% project, personal itch to scratch
      o sick of building CMS-like systems
      o livejournal, photos, brackup, scanning cabinet, my websites, ...
      o hacking since jun 2010, idle planning for ~year before that.

    storage system?
      o yeah, but not the whole story. "a way to store, sync, share, model and back up content" ...

    Religion (or lack thereof)

    "cloud" or local? both.
      o for storage
      o for content creation (sync adapters)

    programming language? any, all.
      o interfaces are what matters.

    identity / verifiableness

    you should own your content, have backup

    dorks are great, but this must be usable.
      o mental model: "Your stuff. Can't screw it up."

    oh, right, it's an acronym...

    Content-Addressable, Multi-Layer, Indexed, Storage

    Content-Addressable

    at the bottom-most layer, everything is addressed by its digest of bytes

    Terminology:
    "Blob" -- 0 or more bytes. No extra-metadata.
    "Blobref" -- handle to a blob, in the form digest algorithm, hyphen, hex digest, e.g. sha1-8a30407962eeb19b309b78ddf587aea18ab55232
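
A minimal sketch of how a blobref could be computed, assuming SHA-1 as the digest (as in the example above). The helper name `blobref` is made up for illustration and is not Camlistore's own API.

```python
import hashlib


def blobref(blob: bytes) -> str:
    """Name a blob by its content: digest algorithm, hyphen, hex digest."""
    return "sha1-" + hashlib.sha1(blob).hexdigest()


# A blob may be zero bytes long; the empty blob still has a well-defined name.
print(blobref(b""))  # sha1-da39a3ee5e6b4b0d3255bfef95601890afd80709

# The same bytes always produce the same name, wherever they are stored.
print(blobref(b"Hello, world!") == blobref(b"Hello, world!"))  # True
```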

    Content-Addressability properties

    trivial caching / syncing: you have it or you don't.
      o no "which version do you have?"

    content-deduplication
      o multiple users having same content,
      o filesystem backup snapshots

    incrementals are cheap etc...
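
To illustrate these properties, here is a rough sketch using plain dictionaries as stand-ins for two blob stores. The names `put` and `sync` are hypothetical helpers for this example only. Because a blob's name is its digest, syncing reduces to a set difference and duplicates collapse on their own.

```python
import hashlib


def blobref(blob: bytes) -> str:
    return "sha1-" + hashlib.sha1(blob).hexdigest()


def put(store: dict, blob: bytes) -> str:
    """Store a blob under its blobref; storing the same bytes twice is a no-op."""
    ref = blobref(blob)
    store.setdefault(ref, blob)
    return ref


def sync(src: dict, dst: dict) -> list:
    """Copy to dst every blob it lacks; return the blobrefs that were copied."""
    missing = sorted(src.keys() - dst.keys())
    for ref in missing:
        dst[ref] = src[ref]
    return missing


laptop, server = {}, {}
put(laptop, b"photo bytes")
put(laptop, b"photo bytes")      # duplicate content, still one blob
put(laptop, b"notes")
put(server, b"notes")            # the server already has this one
copied = sync(laptop, server)
print(len(laptop), len(copied))  # 2 1
```

There is no version negotiation anywhere in `sync`: a destination either holds a given blobref or it does not.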

    Multi-Layer

    Unix school of thought:
      o small, well-defined composable tools

    Camlistore has multiple layers / parts:
      o blob server: super dumb
      o schema: how one might represent data
      o search/indexer: make sense of dumbness, data
      o frontend: interact with world, sharing.

    Architecture

    Blob Server: how dumb it is...

    Private operations, to owner of data only:
      o get(blobref) -> blob
      o put(blobref, blob)
      o enumerate(..) -> [(blobref, size), ...]

    Public/non-owner operations:
      o none!

    GET /camli/sha1-xxxxxxxxx HTTP/1.0
    ....
    Hello, world!
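
The three owner operations can be sketched as an in-memory class. This is an illustration only, not the real server: the class name `BlobServer` is invented here, and the digest check inside `put` is an assumption about what a careful server might do rather than something stated on the slide.

```python
import hashlib


class BlobServer:
    """In-memory stand-in for the three private blob server operations."""

    def __init__(self):
        self._blobs = {}

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]

    def put(self, ref: str, blob: bytes) -> None:
        # Assumption: refuse a blob whose bytes do not match its claimed name.
        if ref != "sha1-" + hashlib.sha1(blob).hexdigest():
            raise ValueError("blobref does not match blob contents")
        self._blobs[ref] = blob

    def enumerate(self) -> list:
        # Only (blobref, size) pairs: the server knows nothing else about a blob.
        return sorted((ref, len(blob)) for ref, blob in self._blobs.items())


server = BlobServer()
hello = b"Hello, world!"
hello_ref = "sha1-" + hashlib.sha1(hello).hexdigest()
server.put(hello_ref, hello)
print(server.get(hello_ref))       # b'Hello, world!'
print(server.enumerate()[0][1])    # 13
```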

    Blob Server: dumbness continued

    so, just blobs. remember:

    no meta-data
    no "filenames"
    no "mime types"
    no "{create,mod,access} time"
    nothing

    seriously, no metadata!
      o we will fight you. (and have)

    Uh, what can you do with that?

    Not a terrible lot.

    But let's start with an easy example at this layer...

    Filesystem backups

    previous project: brackup
      o slice/dice/encrypt S3 backup, content-addressed, but only files C-A, not dirs.

    fossil/venti, git: recursive directories content-addressed
      o git: "tree objects"

    camlistore: "schema blobs"

    Schema Blobs

    so if all blobs are just dumb blobs of bytes with no metadata, how do you store metadata?

    as blobs themselves!

    how to recognize it? same way you sniff a JPEG. magic.

    start with a '{'? parse as JSON?

    in memory schema -> JSON object serialization with "camliVersion" key == "schema blob"

    Schema Blob

    Minimal "schema blob" is:

    {
      "camliVersion": 1,
      "camliType": "whatever"
    }

    Whitespace doesn't matter. Just must be valid JSON in its entirety. Use whatever JSON libraries you've got.
    That one is named sha1-19e851fe3eb3d1f3d9d1cefe9f92c6f3c7d754f6
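
The sniffing rule described above (starts with '{', parses as JSON, has a "camliVersion" key) might look like the following sketch. The function name `parse_schema_blob` is hypothetical, and requiring '{' as the very first byte is a strict reading of the slide.

```python
import json


def parse_schema_blob(blob: bytes):
    """Return the parsed JSON object if blob looks like a schema blob, else None."""
    if not blob.startswith(b"{"):
        return None                      # cheap magic-byte check, like sniffing a JPEG
    try:
        obj = json.loads(blob.decode("utf-8"))
    except ValueError:                   # covers bad UTF-8 and bad JSON alike
        return None
    if isinstance(obj, dict) and "camliVersion" in obj:
        return obj
    return None


minimal = b'{"camliVersion": 1, "camliType": "whatever"}'
print(parse_schema_blob(minimal)["camliType"])   # whatever
print(parse_schema_blob(b"\xff\xd8\xff\xe0"))    # None (not JSON at all)
print(parse_schema_blob(b'{"title": "hi"}'))     # None (JSON, but no camliVersion)
```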

    Schema Blob; type "file"

    {
      "camliVersion": 1,
      "camliType": "file",
      "fileName": "foo.dat",
      "unixPermission": "0644",
      ...,
      "size": 6000133,
      "contentParts": [
        {"blobRef": "sha1-...dead", "size": 111},
        {"blobRef": "sha1-...beef", "size": 5000000, "offset": 492},
        {"size": 1000000},
        {"blobRef": "digalg-blobref", "size": 22}
      ]
    }
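
In the example above the part sizes add up to the file size: 111 + 5000000 + 1000000 + 22 = 6000133. A reader could reassemble the file along these lines. This is a sketch under two assumptions not spelled out on the slide: a part without a "blobRef" is a run of zero bytes, and "offset" is where to start reading inside the referenced blob. The name `read_file` and the short placeholder blobrefs are invented for the example.

```python
def read_file(schema: dict, fetch) -> bytes:
    """Concatenate a file's contentParts; fetch maps a blobref to its bytes."""
    out = bytearray()
    for part in schema["contentParts"]:
        size = part["size"]
        if "blobRef" not in part:
            out += bytes(size)           # assumption: a hole of zero bytes
            continue
        offset = part.get("offset", 0)   # assumption: start position inside the blob
        chunk = fetch(part["blobRef"])[offset:offset + size]
        if len(chunk) != size:
            raise ValueError("blob too short for part " + part["blobRef"])
        out += chunk
    if len(out) != schema["size"]:
        raise ValueError("parts do not add up to the declared size")
    return bytes(out)


# Placeholder blobrefs, not real digests.
blobs = {"sha1-aaa": b"hello ", "sha1-bbb": b"xxworldxx"}
schema = {
    "camliVersion": 1,
    "camliType": "file",
    "size": 13,
    "contentParts": [
        {"blobRef": "sha1-aaa", "size": 6},
        {"blobRef": "sha1-bbb", "size": 5, "offset": 2},
        {"size": 2},
    ],
}
print(read_file(schema, blobs.__getitem__))  # b'hello world\x00\x00'
```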

    Schema Blob; type "directory"

    {
      "camliVersion": 1,
      "camliType": "directory",
      "fileName": "foodir",
      "unixPermission": "0755",
      ...,
      "entries": "sha1-c3764bc2138338d5e2936def18ff8cc9cda38455"
    }

    Schema Blob; type "static-set"

    {
      "camliVersion": 1,
      "camliType": "static-set",
      "members": [
        "sha1-xxxxxxxxxxxx",
        "sha1-xxxxxxxxxxxx",
        "sha1-xxxxxxxxxxxx",
        "sha1-xxxxxxxxxxxx",
        "sha1-xxxxxxxxxxxx",
        "sha1-xxxxxxxxxxxx"
      ]
    }

    Backup your filesystem...

    $ camput --file $HOME
    sha1-8659a52f726588dc44d38dfb22d84a4da2902fed

    (like git/hg/fossil, that identifier represents everything down.)
    Iterative backups are cheap, easy identifier to share, etc.
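
A toy sketch of why one identifier can represent a whole tree, and why a second backup is cheap. It is not how camput is implemented: the tree is a nested dict instead of a real filesystem, `put_tree` and the other helper names are made up, fields such as "unixPermission" are left out, and serializing with sorted keys is an assumption made so that equal objects produce equal bytes.

```python
import hashlib
import json


def put(store: dict, blob: bytes) -> str:
    ref = "sha1-" + hashlib.sha1(blob).hexdigest()
    store[ref] = blob
    return ref


def put_schema(store: dict, obj: dict) -> str:
    # Assumption: sorted keys, so the same object always yields the same bytes.
    return put(store, json.dumps(obj, sort_keys=True).encode("utf-8"))


def put_tree(store: dict, name: str, node) -> str:
    """Store a file (bytes) or directory (dict of children); return its blobref."""
    if isinstance(node, bytes):
        part = {"blobRef": put(store, node), "size": len(node)}
        return put_schema(store, {"camliVersion": 1, "camliType": "file",
                                  "fileName": name, "size": len(node),
                                  "contentParts": [part]})
    members = [put_tree(store, child, node[child]) for child in sorted(node)]
    entries = put_schema(store, {"camliVersion": 1, "camliType": "static-set",
                                 "members": members})
    return put_schema(store, {"camliVersion": 1, "camliType": "directory",
                              "fileName": name, "entries": entries})


store = {}
home = {"notes.txt": b"hello", "pics": {"a.jpg": b"\xff\xd8 fake jpeg"}}
root1 = put_tree(store, "home", home)
count1 = len(store)

home["notes.txt"] = b"hello again"        # change one file, back up again
root2 = put_tree(store, "home", home)
print(root1 != root2, len(store) - count1)  # True 4
```

The second run adds only four blobs: the new file bytes, its file schema, and the static-set and directory schema for the parent. Everything under "pics" is reused as-is.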

    But what about mutable data?

    immutable data is easy to represent & reference

    how to represent mutable data in an immutable, content-addressed world?

    how to share a reference to a mutable object when changing an object mutates its name?

    Objects & Permanodes

    Terminology...

    "object": a set of blobs representing a mutable object. you modify an object by adding a new mutation claim blob to the set.

    "signed schema blob" or "claim": a schema blob that you JSON-sign. (OpenPGP) aside: bootstrapping tools for this.

    "permanode": a blob that's just a signed schema blob of a random number that serves as the anchor and reference point for the blob. like a "permalink" on the web.
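
A rough sketch of the idea, with loud caveats: signing is left out entirely, and the claim layout used here ("permaNode", "seq", "attribute", "value") is hypothetical, since the slides do not show what a claim looks like. It only illustrates how a stable permanode name plus a growing set of immutable claims can model a mutable object.

```python
import hashlib
import json
import secrets


def put_schema(store: dict, obj: dict) -> str:
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    ref = "sha1-" + hashlib.sha1(blob).hexdigest()
    store[ref] = blob
    return ref


def new_permanode(store: dict) -> str:
    """An (unsigned, for this sketch) blob of randomness that anchors an object."""
    return put_schema(store, {"camliVersion": 1, "camliType": "permanode",
                              "random": secrets.token_hex(16)})


def add_claim(store: dict, permanode: str, seq: int, attribute: str, value: str) -> str:
    # Hypothetical claim layout; real claims would also be signed.
    return put_schema(store, {"camliVersion": 1, "camliType": "claim",
                              "permaNode": permanode, "seq": seq,
                              "attribute": attribute, "value": value})


def current_state(store: dict, permanode: str) -> dict:
    """Replay every claim about a permanode, oldest first; later claims win."""
    claims = []
    for blob in store.values():
        obj = json.loads(blob)
        if obj.get("camliType") == "claim" and obj.get("permaNode") == permanode:
            claims.append(obj)
    state = {}
    for claim in sorted(claims, key=lambda c: c["seq"]):
        state[claim["attribute"]] = claim["value"]
    return state


store = {}
doc = new_permanode(store)
add_claim(store, doc, 1, "title", "Draft")
add_claim(store, doc, 2, "title", "Final")
print(current_state(store, doc))  # {'title': 'Final'}
```

The object changed, yet `doc` did not: nothing was overwritten, two blobs were added, and anyone holding the permanode's blobref still points at the same object.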

    Permanode

    $ camput --permanode
    sha1-ea799271abfbf85d8e22e4577f15f704c8349026
    $ camget sha1-ea799271abfbf85d8e22e4577f15f704c8349026
    {"camliVersion": 1,
      "camliSigner": "sha1-c4da9d771661563a27704b91b67989e7ea1e50b8",
      "camliType": "permanode",
      "random": "oj)r}$Wa/[J|XQThNdhE"
    ,"camliSig":"iQEcBAABAgAGBQJNRxceAAoJEGjzeDN/6vt8ihIH/Aov7FRIq4dODAPWGDwqL1X9Ko2ZtSSO1lwHxCQVdCMquDtAdI3387fDlEG/ALoT/LhmtXQgYTt8QqDxVduEK1or6/jqo3RMQ8tTgZ+rW2cj9f3Q/dg7el0Ngoq03hyYXdo3whxCH2x0jajSt4RCcgdXN6XmLlOgD/LVQEJ303Du1OhCvKX1A40BIdwe1zxBc5zkLmoa8rClAlHdqwogxYFY4cwFm+jJM5YhSPemNrDe8W7KT6r0oA7SVfOan1NbIQUel65xwIZBD0ahCXBx6WXvfId6AdiahnbZiBup1fWSzxeeW7Y2/RQwv5IZ8UgfBqRHvnxcbNmScrzlp3V3ZoY==BfKn"}

    Search / Indexer...

    Search / Indexer

    subscribes to blobs in real-time
      o or enumerates / mapreduces world on init

    builds index of:
      o directed blob graph,
      o resolved attributes,
      o set memberships,
      o dates, tags, ...
      o whatever's needed

    eventually consistent
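
As one illustration of the "enumerate the world on init" path, the sketch below builds only the directed blob graph: for each schema blob, which other blobs it points at. It is a guess at the general shape, not the real indexer. The names `build_graph` and `referenced_blobrefs` are invented, and treating any string shaped like a SHA-1 blobref as an edge is a simplifying assumption.

```python
import json
import re

BLOBREF = re.compile(r"^sha1-[0-9a-f]{40}$")


def referenced_blobrefs(value):
    """Yield every blobref-shaped string found anywhere inside a JSON value."""
    if isinstance(value, str):
        if BLOBREF.match(value):
            yield value
    elif isinstance(value, list):
        for item in value:
            yield from referenced_blobrefs(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from referenced_blobrefs(item)


def build_graph(blobs: dict) -> dict:
    """Map each schema blob's blobref to the sorted blobrefs it refers to."""
    graph = {}
    for ref, blob in blobs.items():
        if not blob.startswith(b"{"):
            continue                     # plain data blob: no outgoing edges
        try:
            obj = json.loads(blob)
        except ValueError:
            continue
        if isinstance(obj, dict) and "camliVersion" in obj:
            graph[ref] = sorted(set(referenced_blobrefs(obj)))
    return graph


# Placeholder names, not real digests of the bytes they label.
set_ref = "sha1-" + "a" * 40
data_ref = "sha1-" + "b" * 40
blobs = {
    set_ref: json.dumps({"camliVersion": 1, "camliType": "static-set",
                         "members": [data_ref]}).encode("utf-8"),
    data_ref: b"raw bytes, no schema",
}
print(build_graph(blobs) == {set_ref: [data_ref]})  # True
```

An index like this can be thrown away and rebuilt from the blobs at any time, which fits the "eventually consistent" note above.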

    Privacy Model & Sharing

    all your blobs & searches are private
      o nothing public by default

    to share something (a blob, object, or search query e.g. "recent public photos of mine") you create a "share" claim
      o claim = "signed schema blob"

    demo: http://camlistore.org/docs/sharing

    Questions? http://camlistore.org/