REST::Neo4p - Talk @ DC Perl Mongers

Updated slides for the DC Perl Mongers meetup: REST::Neo4p, a Perl driver for the Neo4j graph database.

REST::Neo4p: A Perl Driver for Neo4j

Mark A. Jensen, SRA International, Inc.

https://github.com/majensen/rest-neo4p.git

• Perler since 2000
• CPAN contributor (MAJENSEN) since 2009
• BioPerl Core Developer
• Scientific Project Director, The Cancer Genome Atlas Data Coordinating Center
• @thinkinator, LinkedIn

Motivation

• TCGA: Biospecimen, Clinical, Genomic Data
– complex
– growing
– evolving technologies
– evolving policies
– need for precise accounting

• Customer suggested Neo4j
– I wanted to play with it, but in Perl

Feeping Creaturism

• 2012 - Some Perl experiments out there, nothing complete

• Got excited
• People started using it
• Sucked into the open-source attractor

Maintenance / Glory

Dog Food I

Dog Food II

Woof! Neo4p Classes

Design Goals
• "OGM": Perl 5 objects backed by the graph
• User should never have to deal with a REST endpoint*
*Unless she wants to.
• User should never have to deal with a Cypher query†
†Unless he wants to.
• Robust enough for production code
– System should approach complete coverage of the REST service
– System should be robust to REST API changes and server backward-compatible (or at least version-aware)
• Take advantage of the self-describing features of the API

REST::Neo4p core objects
• Are Node, Relationship, Index
– Index objects represent legacy (v1.0) indexes
– v2.0 “background” indexes handled in Schema
• Are blessed scalar refs: "inside-out object" pattern
– the scalar value is the item ID (or index name)
– for any object $obj, $$obj (the ID) is exactly what you need for constructing the API calls
• Are subclasses of Entity
– Entity does the object table handling, JSON-to-object conversion and HTTP agent calls
– Isolates most of the kludges necessary to handle the few API inconsistencies that exist(ed)
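To make the inside-out idea concrete, here is a minimal Python sketch, assuming an object table keyed by (class, ID) and a hypothetical base URL. It is not REST::Neo4p's code: each object stores only its ID, the table hands back the same object for the same database item, and the ID alone is enough to build the item's endpoint URL.

```python
import weakref

# Hypothetical server base URL, for illustration only.
BASE_URL = "http://localhost:7474/db/data"


class Entity:
    """Objects carry nothing but their ID; all other state lives elsewhere."""

    __slots__ = ("_id", "__weakref__")
    _object_table = weakref.WeakValueDictionary()  # (class name, id) -> object
    ACTION = None  # REST path segment, set by subclasses

    def __new__(cls, item_id):
        key = (cls.__name__, item_id)
        obj = cls._object_table.get(key)
        if obj is None:
            obj = super().__new__(cls)
            obj._id = item_id
            cls._object_table[key] = obj
        return obj

    @property
    def id(self):
        return self._id

    def url(self):
        # The ID is exactly what is needed to construct the API call.
        return f"{BASE_URL}/{self.ACTION}/{self._id}"


class Node(Entity):
    __slots__ = ()
    ACTION = "node"


class Relationship(Entity):
    __slots__ = ()
    ACTION = "relationship"
```

In Perl the same effect comes from blessing a reference to the ID scalar, so dereferencing the object (`$$obj`) yields the ID directly.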

Auto-accessors

• You can treat properties as object fields if desired

• Caveat: this may not make sense for your application. Not every node needs to have the same properties, but currently every object possesses every accessor created so far.
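A minimal Python sketch of auto-accessors that reproduces the caveat: accessors are installed on the class the first time a property name is seen, so every instance gets them. Property values live in a local dict rather than on a server, and `Node`, `get_property`, `set_property`, and `_ensure_accessor` are illustrative names here, not a statement of the module's API.

```python
class Node:
    """Toy node; property values are kept locally instead of on a server."""

    def __init__(self, **props):
        self._props = {}
        self.set_property(**props)

    def get_property(self, name):
        return self._props.get(name)

    def set_property(self, **props):
        for name, value in props.items():
            self._props[name] = value
            _ensure_accessor(type(self), name)


def _ensure_accessor(cls, name):
    """Install a read/write accessor on the *class* the first time a name is seen."""
    if hasattr(cls, name):
        return  # never shadow real methods or accessors that already exist

    def getter(self):
        return self.get_property(name)

    def setter(self, value):
        self.set_property(**{name: value})

    setattr(cls, name, property(getter, setter))
```

Because the accessor is attached to the class, a node that never had a `name` property still answers `node.name` (with `None` here), which is exactly the mismatch the caveat warns about.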

Batch Calls

• Certain situations (e.g., database loading) call for batching: do many things in one API call rather than in many single calls
• REST API provides this functionality
• How to make it "natural" in the context of working with objects?
– Use Perl prototyping sugar to create a "batch block"

Example:

Rather than call the server for every line, you can mix in REST::Neo4p::Batch, and then use a batch {} block:

Calls within the block are collected and deferred

You can execute more complex logic within the batch block, and keep the objects beyond it:

But miracles are not yet implemented:

Object here doesn't really exist yet…

How does that work?

• Agent module isolates all bona fide calls
– very few kludges to core object modules required
• batch() puts the agent into “batch mode” and executes the wrapped code
– agent stores incoming calls as JSON in a queue
• After the wrapped code is executed, batch() switches the agent back to normal mode and has it call the batch endpoint with the queue contents
• Batch processes the response and creates objects if requested
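The batch mechanism can be sketched in Python, assuming a pluggable `transport` function in place of the real user agent. Calls made inside the block are queued as jobs in the shape Neo4j's REST batch endpoint accepts (`method`, `to`, `body`, `id`); each deferred call returns only a `{n}` placeholder, mirroring the point that objects created inside the block don't really exist yet; leaving the block sends the whole queue in one request. `Agent`, `batch`, `flush`, and the fake transport are hypothetical names.

```python
from contextlib import contextmanager


class Agent:
    """Toy agent that either sends a call now or defers it to a batch queue.

    `transport(method, path, body)` is a hypothetical stand-in for the real
    HTTP user agent; it returns decoded JSON.
    """

    def __init__(self, transport):
        self.transport = transport
        self.batch_mode = False
        self.queue = []

    def call(self, method, path, body=None):
        if not self.batch_mode:
            return self.transport(method, path, body)
        job = {"method": method, "to": path, "id": len(self.queue)}
        if body is not None:
            job["body"] = body
        self.queue.append(job)
        # Nothing exists on the server yet; return a placeholder that later
        # jobs in the same batch may reference, e.g. "{0}/relationships".
        return "{%d}" % job["id"]

    def flush(self):
        jobs, self.queue = self.queue, []
        if not jobs:
            return []
        # One round trip for the whole queue (JSON-encoded by the transport).
        return self.transport("POST", "/batch", jobs)


@contextmanager
def batch(agent):
    """Run the wrapped block in batch mode, then send the queued jobs."""
    agent.batch_mode = True
    results = []
    try:
        yield results
    except BaseException:
        agent.queue.clear()  # drop deferred calls if the block fails
        raise
    finally:
        agent.batch_mode = False
    results.extend(agent.flush())


# Usage with a fake transport that records what would go over the wire.
sent = []


def fake_transport(method, path, body):
    sent.append((method, path, body))
    if path == "/batch":
        return [{"id": job["id"], "from": job["to"]} for job in body]
    return {}


agent = Agent(fake_transport)
with batch(agent) as results:
    ref = agent.call("POST", "/node", {"name": "a"})
    agent.call("POST", ref + "/relationships", {"to": "/node/7", "type": "KNOWS"})
```

The context manager plays the role that the Perl prototype sugar plays for `batch {}`: it wraps the block, and the agent is restored to normal mode whether or not the block succeeds.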

Batch Profiling
• Used Devel::NYTProf, nytprofhtml
– Flame graph:
Vertical: unique call stack (call on top is on CPU)
Horizontal: relative time spent in that call stack configuration
Color: makes it look like a flame

Batch Profiling

Batch Profiling

batch keep: 1.1 of 1.2 s

batch discard: 1.0 of 1.1 s

no batch: 13.0 of 13.9 s

Batch Profiling

(Flame graphs: "Batch/keep" vs. "No batch")

HTTP Agent

Agent
• Is transparent
– But you can always get at it with REST::Neo4p->agent
– Agent module alone is meant to be useful and independent
• Elicits and uses the API self-discovery feature on connect()
• Isolates all HTTP requests and responses
• Captures and distinguishes API and HTTP errors
– emits REST::Neo4p::Exceptions objects
• [Instance] Is a subclass of a "real" user agent:
– LWP::UserAgent
– Mojo::UserAgent, or
– HTTP::Thin
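One way to capture and distinguish API errors from HTTP errors, sketched in Python: a non-2xx response carrying a structured server error document becomes an API exception, and any other non-2xx response becomes an HTTP exception. The class names are hypothetical (the Perl module defines its own in REST::Neo4p::Exceptions), and the assumption that Neo4j 2.x error bodies carry `exception` and `message` keys should be checked against your server version.

```python
import json


class Neo4pError(Exception):
    """Base class for the sketch (hypothetical names)."""


class HttpError(Neo4pError):
    """Transport-level failure: a status code but no structured server error."""

    def __init__(self, status, reason):
        super().__init__(f"HTTP {status}: {reason}")
        self.status = status


class ApiError(Neo4pError):
    """The server answered with a structured Neo4j error document."""

    def __init__(self, status, exception, message):
        super().__init__(f"{exception}: {message}")
        self.status = status
        self.neo4j_exception = exception


def check_response(status, reason, body_text):
    """Return decoded JSON on success; otherwise raise the matching error type."""
    payload = None
    if body_text:
        try:
            payload = json.loads(body_text)
        except ValueError:  # e.g. an HTML error page from a proxy
            payload = None
    if 200 <= status < 300:
        return payload
    # Assumption: server-side errors arrive as JSON with an "exception" key.
    if isinstance(payload, dict) and "exception" in payload:
        raise ApiError(status, payload["exception"], payload.get("message", ""))
    raise HttpError(status, reason)
```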

Working within API Self-Description
• Get first level of actions
• Register actions
• Get ‘data’ level of actions
• Register more actions
• Kludge around missing actions
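The discovery sequence can be sketched in Python with canned responses standing in for a live server. `get` is a hypothetical fetch function; the sample JSON imitates the shape of a Neo4j 2.x service root and data root; and the missing action patched at the end (a plain `relationship` endpoint) is an assumed example of the kind of kludge meant here, not a list of the module's actual workarounds.

```python
from urllib.parse import urljoin


def discover(get, root_url):
    """Register actions from the service root, then from the 'data' level.

    `get(url)` is a hypothetical callable that returns decoded JSON.
    """

    def urls_in(doc):
        # Keep only string values that look like endpoints.
        return {k: v for k, v in doc.items()
                if isinstance(v, str) and v.startswith("http")}

    actions = urls_in(get(root_url))  # first level, e.g. "data"
    data_url = actions.get("data")
    if data_url:
        actions.update(urls_in(get(data_url)))  # 'data' level: node, cypher, ...
        # Kludge: add an endpoint the server does not advertise
        # (assumed example: a plain "relationship" URL).
        actions.setdefault("relationship", urljoin(data_url, "relationship"))
    return actions


# Canned responses, shaped like a Neo4j 2.x server (trimmed).
CANNED = {
    "http://localhost:7474/": {
        "management": "http://localhost:7474/db/manage/",
        "data": "http://localhost:7474/db/data/",
    },
    "http://localhost:7474/db/data/": {
        "node": "http://localhost:7474/db/data/node",
        "cypher": "http://localhost:7474/db/data/cypher",
        "batch": "http://localhost:7474/db/data/batch",
        "transaction": "http://localhost:7474/db/data/transaction",
        "neo4j_version": "2.0.1",
        "extensions": {},
    },
}

actions = discover(CANNED.__getitem__, "http://localhost:7474/")
```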

Working within API Self-Description

• Get the list of actions with
– $agent->available_actions
• And AUTOLOAD will provide (see the POD for args):
– $agent->get_<action>()
– $agent->put_<action>()
– $agent->post_<action>()
– $agent->delete_<action>()
• Other accessors, e.g. node(), return the appropriate URL for your server
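Python has no AUTOLOAD, but `__getattr__` gives the same "method appears on demand" behavior. This sketch assumes an action-to-URL table like the one discovery produces and a hypothetical `request(verb, url, body)` function: names of the form `<verb>_<action>` turn into HTTP calls, and a bare action name returns its URL, much as `node()` does. The argument handling is invented for illustration; the POD documents the real signatures.

```python
class AutoAgent:
    """Sketch of AUTOLOAD-style dispatch: <verb>_<action>(...) becomes an HTTP call."""

    VERBS = ("get", "put", "post", "delete")

    def __init__(self, actions, request):
        self._actions = dict(actions)  # action name -> URL (from discovery)
        self._request = request        # hypothetical request(verb, url, body)

    def available_actions(self):
        return sorted(self._actions)

    def __getattr__(self, name):
        # Like Perl's AUTOLOAD, this runs only when normal lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        verb, _, action = name.partition("_")
        if verb in self.VERBS and action in self._actions:
            def call(*path_parts, body=None):
                url = "/".join([self._actions[action], *map(str, path_parts)])
                return self._request(verb.upper(), url, body)
            return call
        if name in self._actions:
            return lambda: self._actions[name]  # plain accessor: URL only
        raise AttributeError(name)


# Usage with a request function that just records calls.
sent = []


def record(verb, url, body):
    sent.append((verb, url, body))
    return {"ok": True}


agent = AutoAgent({"node": "http://localhost:7474/db/data/node"}, record)
```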

Agent Profiling

lwp: 2.5 of 2.7 s

mojo: 3.3 of 3.6 s

thin: 2.4 of 2.6 s

App-level Constraints

Use Case

You start out with a set of well-categorized things that have some well-defined relationships. Each thing will be represented as a node; that's fine. But you want to guarantee (to your client, for example) that

1. You can classify every node you add or read unambiguously into a well-defined group; and
2. You never relate two nodes belonging to particular groups in a way that doesn't make sense according to your well-defined relationships.

Constrain/Constraint

• Now, v2.0 allows integrated Labels and unique constraints and prevents deletion of connected nodes, but…
• REST::Neo4p::Constrain: an add-in for constraining (or validating)
– property values
– connections (relationships) based on node properties
– relationship types
according to flexible specifications
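The use case reduces to two checks: classify a node into exactly one group from its properties, and allow a relationship type only between approved pairs of groups. A minimal Python sketch, with an invented specification format (`NODE_GROUPS`, `ALLOWED_RELATIONSHIPS`) and made-up group names; REST::Neo4p::Constrain has its own specification syntax.

```python
# Invented specification format, for illustration only.
NODE_GROUPS = {
    # group name -> property values a node must have to belong to it
    "patient": {"kind": "patient"},
    "sample": {"kind": "sample"},
}
ALLOWED_RELATIONSHIPS = {
    # relationship type -> allowed (from-group, to-group) pairs
    "has_sample": {("patient", "sample")},
}


def classify(props):
    """Return the node's single group, or None if it matches zero or several."""
    matches = [group for group, required in NODE_GROUPS.items()
               if all(props.get(k) == v for k, v in required.items())]
    return matches[0] if len(matches) == 1 else None


def relationship_ok(from_props, to_props, rel_type):
    """True only if both ends classify and the pair is allowed for this type."""
    pair = (classify(from_props), classify(to_props))
    if None in pair:
        return False
    return pair in ALLOWED_RELATIONSHIPS.get(rel_type, set())
```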

Constrain/Constraint

• Multiple modes:
– Automatic (throws exception if constraint violated)
– Manual (validation function returns false if constraint violated)
– Suspended (lift constraint processing when desired)
• Freeze/Thaw (in JSON) constraint specifications for reuse
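The three modes and freeze/thaw can be sketched in Python as follows. Everything here is hypothetical (`Constrain`, `PropertyConstraint`, `check_create`, the JSON layout); the point is the behavior: automatic mode raises on a violation, manual mode leaves it to the caller to test `validate()`, suspended mode skips enforcement, and the specification round-trips through JSON so it can be reused.

```python
import json


class ConstraintViolation(Exception):
    pass


class PropertyConstraint:
    """Hypothetical rule: nodes with kind=K must carry the listed properties."""

    def __init__(self, name, kind, required):
        self.name, self.kind, self.required = name, kind, list(required)

    def validate(self, props):
        if props.get("kind") != self.kind:
            return True  # rule does not apply to this node
        return all(p in props for p in self.required)

    def to_dict(self):
        return {"name": self.name, "kind": self.kind, "required": self.required}


class Constrain:
    MODES = ("auto", "manual", "suspended")

    def __init__(self, constraints=(), mode="auto"):
        self.constraints = list(constraints)
        self.set_mode(mode)

    def set_mode(self, mode):
        if mode not in self.MODES:
            raise ValueError(mode)
        self.mode = mode

    def validate(self, props):
        """Manual use: returns False if any constraint is violated."""
        return all(c.validate(props) for c in self.constraints)

    def check_create(self, props):
        """Hook a node-creation path would call before contacting the server.

        Only automatic mode enforces; manual and suspended modes let it pass.
        """
        if self.mode == "auto" and not self.validate(props):
            raise ConstraintViolation(props)
        return True

    def freeze(self):
        return json.dumps([c.to_dict() for c in self.constraints], sort_keys=True)

    @classmethod
    def thaw(cls, text, mode="auto"):
        return cls([PropertyConstraint(**d) for d in json.loads(text)], mode)
```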

Open the POD now, HAL.

Cypher Queries

• REST::Neo4p::Query takes a familiar, DBI-like approach
– Prepare, execute, fetch
– "rows" returned are arrays containing scalars, Node objects, and/or Relationship objects
– If a query returns a path, a Path object (a simple container) is returned
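How rows can come back as a mix of scalars and objects, sketched in Python against the legacy Cypher endpoint's JSON (`columns` plus `data`, with nodes and relationships identified by their `self` URL and paths by their `nodes` and `relationships` lists). The small `Node`, `Relationship`, and `Path` classes are stand-ins for the Perl objects, and `inflate`/`fetch_rows` are hypothetical helpers.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: int


@dataclass(frozen=True)
class Relationship:
    id: int


@dataclass
class Path:
    """Simple container, as on the slide."""
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


_ENTITY_URL = re.compile(r"/(node|relationship)/(\d+)$")


def _from_url(url):
    m = _ENTITY_URL.search(url)
    if m is None:
        raise ValueError(f"not an entity URL: {url}")
    ident = int(m.group(2))
    return Node(ident) if m.group(1) == "node" else Relationship(ident)


def inflate(cell):
    """Turn one JSON cell from a Cypher result into a scalar or object."""
    if isinstance(cell, dict):
        if "self" in cell:  # a node or relationship document
            return _from_url(cell["self"])
        if "nodes" in cell and "relationships" in cell:  # a path
            return Path([_from_url(u) for u in cell["nodes"]],
                        [_from_url(u) for u in cell["relationships"]])
    if isinstance(cell, list):
        return [inflate(c) for c in cell]
    return cell  # scalars (and plain maps) pass through unchanged


def fetch_rows(response):
    """Yield rows, DBI-style, from a legacy /cypher response body."""
    for row in response["data"]:
        yield [inflate(cell) for cell in row]


BASE = "http://localhost:7474/db/data"
response = {
    "columns": ["n", "r", "p", "count"],
    "data": [[
        {"self": f"{BASE}/node/3", "data": {"name": "a"}},
        {"self": f"{BASE}/relationship/7", "type": "KNOWS"},
        {"start": f"{BASE}/node/3", "end": f"{BASE}/node/4", "length": 1,
         "nodes": [f"{BASE}/node/3", f"{BASE}/node/4"],
         "relationships": [f"{BASE}/relationship/7"]},
        2,
    ]],
}
rows = list(fetch_rows(response))
```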

Cypher Queries

• Prepare and execute with parameter substitutions

Do This!

Not This!
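The "Do This!" / "Not This!" contrast, sketched in Python as the request body sent to the legacy Cypher endpoint (`query` plus `params`), using the `{name}` parameter syntax of Neo4j 2.x. With parameters the value travels as data and the query text stays fixed; with string pasting, hostile input can rewrite the query. The helper name `cypher_request` is hypothetical.

```python
import json


def cypher_request(query, **params):
    """Body for the legacy Cypher endpoint: query text plus separate params."""
    return json.dumps({"query": query, "params": params})


name = 'Bob" RETURN 1 //'  # hostile input

# Do this: the value is sent as data and the query text never changes,
# which also lets the server reuse a cached plan for that text.
good = cypher_request("MATCH (n:Person {name: {name}}) RETURN n", name=name)

# Not this: the value is pasted into the query text itself.
bad = cypher_request('MATCH (n:Person {name: "%s"}) RETURN n' % name)
```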

Cypher Queries

• Transactions are supported with a v2.0.1 or later server
– started with REST::Neo4p->begin_work()
– committed with REST::Neo4p->commit()
– canceled with REST::Neo4p->rollback()
(here, the class looks like the database handle in DBI, in fact…)
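A Python sketch of what `begin_work()`, `commit()`, and `rollback()` can map to on the Neo4j 2.x transactional HTTP endpoint: opening a transaction is a POST that returns a `commit` URL, statements are POSTed to the transaction URL, commit is a POST to `.../commit`, and rollback is a DELETE on the transaction URL. `TxHandle` and the `request` function are hypothetical, and the fake transport only mimics the shape of real replies.

```python
class TxHandle:
    """DBI-flavored transaction sketch over the 2.x transactional endpoint.

    `request(method, url, body)` is a hypothetical transport returning JSON.
    """

    def __init__(self, request, base="http://localhost:7474/db/data/transaction"):
        self.request = request
        self.base = base
        self.tx_url = None

    def begin_work(self):
        if self.tx_url:
            raise RuntimeError("transaction already open")
        reply = self.request("POST", self.base, {"statements": []})
        # The reply names the commit URL; the transaction URL is its parent.
        self.tx_url = reply["commit"].rsplit("/commit", 1)[0]

    def run(self, statement, **parameters):
        target = self.tx_url or self.base + "/commit"  # autocommit if no tx
        body = {"statements": [{"statement": statement, "parameters": parameters}]}
        return self.request("POST", target, body)

    def _require_tx(self):
        if not self.tx_url:
            raise RuntimeError("no open transaction")
        return self.tx_url

    def commit(self):
        url = self._require_tx()
        try:
            self.request("POST", url + "/commit", {"statements": []})
        finally:
            self.tx_url = None

    def rollback(self):
        url = self._require_tx()
        try:
            self.request("DELETE", url, None)
        finally:
            self.tx_url = None


# Usage with a fake transport that records (method, url) pairs.
log = []


def fake_request(method, url, body):
    log.append((method, url))
    if method == "POST" and url.endswith("/transaction"):
        return {"commit": url + "/9/commit", "results": [], "errors": []}
    return {"results": [], "errors": []}


db = TxHandle(fake_request)
db.begin_work()
db.run("CREATE (n {name: {name}})", name="x")
db.commit()
```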

DBI – DBD::Neo4p

• Yes, you can really do this:

Glory!

Maintenance.

Future Directions/Contribution Ideas

• Get it onto GitHub https://github.com/majensen/rest-neo4p.git

• Make batch response parsing more efficient
– e.g., don't stream if response is not huge
• Beautify and deodorize
• Completely touch-free testing
• Add traversal functionality
• Could Neo4p play together with DBIx::Class? (i.e., could it be a real OGM?)

Thanks!

https://github.com/majensen/rest-neo4p.git