Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Analyzing Blockchain and Bitcoin Transaction Data as Graph
Oracle Code | 2018-06-12 | Funkhaus Berlin
Karin Patenge | Development Manager Technology, Oracle Deutschland B.V. & Co. KG
Hans Viehmann | Manager Spatial and Graph Technologies, Oracle Corporation
Acknowledgement
• This presentation is based on the work of:
• Zhe (Alan) Wu
• Architect for Graph and Semantic Technologies @ Oracle Corporation
@kpatenge @alanzwu @SpatialHannes
Agenda
• Modeling of Bitcoin Transactions
• Questions of Interest
• Data Processing Workflow
• Summary
• Q&A
Setting the Scene: Analyze Bitcoin Transaction Data
Setting the Scene: Interesting Patterns in Bitcoin Transaction Data
Source: http://blockchain.info
What does a Bitcoin Transaction look like?
• A transaction has input(s) and output(s)
– An input comes from an output of a(nother) transaction
TX hash: 6f7cf9580f1c2dfb3c4d5d043cdbb128c640e3f20161245aa7372e9666168516
TX outputSum : 10000000000
-- TX Input from: ff3dc8b461305acc5900d31602f2dafebfc406e5b050b14a352294f0965e0bf6:0
-- TX Input from: 2db69558056d0132d9848851fd20329be9cd590fa5ae2b3c55f58931f42e27f7:0
-- TX Output value: 10000000000
-- TX Output scriPubAddr: 12higDjoCCNXSA95xZMWUdPvXNmkAduhWv
Note: amounts are given in satoshi; 1,000,000 satoshi is 0.01 BTC
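The unit conversion in the note can be checked with a few lines of code. The following is a minimal sketch in Python, assuming the amounts in the sample are satoshi (1 BTC = 100,000,000 satoshi); the dictionary layout and the function name `satoshi_to_btc` are hypothetical and only mirror the fields printed above.

```python
from decimal import Decimal

SATOSHI_PER_BTC = 100_000_000  # 1 BTC = 10^8 satoshi


def satoshi_to_btc(amount: int) -> Decimal:
    """Convert an integer satoshi amount to BTC without float rounding."""
    return Decimal(amount) / Decimal(SATOSHI_PER_BTC)


# Hypothetical in-memory form of the sample transaction shown above:
# each input is (previous TX hash, output index), each output is (address, value).
sample_tx = {
    "hash": "6f7cf9580f1c2dfb3c4d5d043cdbb128c640e3f20161245aa7372e9666168516",
    "inputs": [
        ("ff3dc8b461305acc5900d31602f2dafebfc406e5b050b14a352294f0965e0bf6", 0),
        ("2db69558056d0132d9848851fd20329be9cd590fa5ae2b3c55f58931f42e27f7", 0),
    ],
    "outputs": [("12higDjoCCNXSA95xZMWUdPvXNmkAduhWv", 10_000_000_000)],
}

output_sum = sum(value for _, value in sample_tx["outputs"])
print(satoshi_to_btc(1_000_000))
print(satoshi_to_btc(output_sum))
```

Under this assumption the sample output of 10000000000 corresponds to 100 BTC.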
[Figure: transactions TX1, TX3, TX8, TX9 and addresses X, Y, Z, K, L, with $ labels on the links between them.]
What does a Graph look like?
• A graph has vertices (entities), edges (relationships), and properties
– Also known as linked data
[Figure: transactions TX1, TX3, TX8, TX9 and addresses X, Y, Z, K, L shown as vertices, with $ labels on the edges.]
Modeling Bitcoin Transactions as a Graph
• Model 1
– Vertices: Transaction, Address
– Edges: Transaction references (TX -> TX, TX -> Addr)
• Model 2
– Vertices: Transaction, Address
– Edges: Transaction's indirect reference to Address (Addr -> TX -> Addr)
• Model 3
– Vertices: Address
– Edges: Address to Address payment (Addr -> Addr)
[Figure: three diagrams, one per model, of transactions TX1, TX3, TX8, TX9 and addresses AddrX, AddrY, AddrZ, AddrK, AddrL.]
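To make the difference between the three models concrete, here is a minimal sketch in Python that derives each edge set from the same toy transactions. The transaction data, the field names (`inputs`, `outputs`, `spends`) and the edge directions are assumptions chosen for illustration, not taken from the slides.

```python
# Toy transactions (hypothetical): input addresses, output addresses,
# and the earlier transactions whose outputs are being spent.
transactions = {
    "TX1": {"inputs": ["AddrX", "AddrY"], "outputs": ["AddrK"], "spends": []},
    "TX3": {"inputs": ["AddrK"], "outputs": ["AddrL", "AddrZ"], "spends": ["TX1"]},
}


def model1_edges(txs):
    """Model 1: TX -> TX references plus TX -> Addr for each output."""
    edges = []
    for tx_id, tx in txs.items():
        edges += [(tx_id, prev) for prev in tx["spends"]]
        edges += [(tx_id, addr) for addr in tx["outputs"]]
    return edges


def model2_edges(txs):
    """Model 2: Addr -> TX for each input, TX -> Addr for each output."""
    edges = []
    for tx_id, tx in txs.items():
        edges += [(addr, tx_id) for addr in tx["inputs"]]
        edges += [(tx_id, addr) for addr in tx["outputs"]]
    return edges


def model3_edges(txs):
    """Model 3: Addr -> Addr, one edge per (input, output) pair of a transaction."""
    return [(i, o) for tx in txs.values() for i in tx["inputs"] for o in tx["outputs"]]


print(model3_edges(transactions))
```

Model 3 drops the transaction vertices entirely, which keeps the graph small but means each edge has to carry the transaction details as properties.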
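Bitcoin Transactions as a Graph: Money Flow
• Graph Model 3
– What is Addr X's contribution to Addr K?
– Given an input address i and an output address o of a transaction, the contribution of i to o is:
$$\mathrm{Contribution}(i, o) = \frac{\mathrm{Amount}_i}{\sum_{j} \mathrm{Amount}_j} \cdot \mathrm{Amount}_o$$
where the sum runs over all inputs j of the transaction.
[Figure: transactions TX1, TX3, TX8, TX9 and addresses X, Y, Z, K, L, with $ amounts on the edges.]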
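The contribution rule can be tried out on a small example. This is a minimal sketch in Python, assuming the contribution of an input to an output is the input's share of the total input amount multiplied by the output amount; the function name, the addresses and the amounts are hypothetical, and transaction fees are ignored.

```python
from fractions import Fraction


def contributions(inputs, outputs):
    """Return {(input_addr, output_addr): amount} for one transaction.

    inputs and outputs map address -> amount (in satoshi).
    Assumed rule: share of input i in the total input, times the output amount.
    """
    total_in = sum(inputs.values())
    return {
        (i, o): Fraction(amount_i, total_in) * amount_o
        for i, amount_i in inputs.items()
        for o, amount_o in outputs.items()
    }


# Hypothetical transaction: X and Y pay in, K and L are paid out.
edges = contributions({"AddrX": 30, "AddrY": 10}, {"AddrK": 20, "AddrL": 20})
print(edges[("AddrX", "AddrK")])  # 15
```

Each resulting pair is one Addr -> Addr edge in Model 3, with the computed value as its amount property.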
Bitcoin Transactions as a Graph: Workflow
Retrieving & Parsing Data
Data Preparation
Functions of a Graph Database:
Graph Generation & Loading
Graph Querying & Analysis
Graph Visualization
Modeling Data as Graphs
The more connected the data is, the better a Graph fits
Oracle NoSQL DB with Big Data Spatial and Graph
Graphic source: http://www.ateam-oracle.com/intro-to-graphs-at-oracle/
What is a Property Graph?
• A set of nodes (aka vertices)
– each vertex has a unique identifier
– each vertex has a set of in/out edges
– each vertex has a collection of key-value properties
• A set of edges
– each edge has a unique identifier
– each edge has a head/tail vertex
– each edge has a label denoting type of relationship between two vertices
– each edge has a collection of key-value properties
• Blueprints Java APIs
• Implementations
– Oracle (Spatial and Graph, Big Data Spatial and Graph), Neo4j, DataStax (Titan), InfiniteGraph, Dex, Sail, MongoDB, …
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
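The property graph model described on this slide maps naturally onto two small record types. The following is a minimal sketch in Python, not the Blueprints API; the class names, field names and the `add_edge` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    id: int                                          # unique identifier
    properties: dict = field(default_factory=dict)   # key-value properties
    out_edges: list = field(default_factory=list, repr=False)
    in_edges: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    id: int          # unique identifier
    tail: Vertex     # source vertex
    head: Vertex     # destination vertex
    label: str       # type of relationship
    properties: dict = field(default_factory=dict)


def add_edge(edge_id, tail, head, label, **properties):
    """Create an edge and register it with both of its end vertices."""
    edge = Edge(edge_id, tail, head, label, properties)
    tail.out_edges.append(edge)
    head.in_edges.append(edge)
    return edge


x = Vertex(1, {"bt_addr": "AddrX"})
k = Vertex(2, {"bt_addr": "AddrK"})
e = add_edge(1, x, k, "contrib", amount=5.0e9)
print(e.label, e.tail.properties["bt_addr"], "->", e.head.properties["bt_addr"])
```

Keeping the in/out edge lists on the vertex makes neighbourhood traversal a direct lookup, which is the access pattern graph queries rely on.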
Property Graph Support
[Figure: architecture diagram with the following components]
• REST/Web Service/Notebooks
• Java, Groovy, Python, …
• Graph Analytics: Parallel In-Memory Graph Analytics (PGX) / Graph Querying (PGQL)
• Graph Data Access Layer (DAL): Blueprints & Lucene/SolrCloud, RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON)
• Java APIs
• Java APIs/JDBC/SQL/PLSQL
• Property Graph formats: GraphML, GML, GraphSON, Flat Files
• Scalable and Persistent Storage Management: Oracle NoSQL Database, Oracle RDBMS, Apache HBase
• Apache Spark
Demo Environment
• Available for free: Oracle Big Data Lite VM 4.11 running in Oracle VirtualBox
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
– Oracle NoSQL Database (kvlite: unclustered -> 1 node, no replication)
– Big Data Spatial and Graph (BDSG) 2.4
• Property Graph Analytics Engine (PGX), Property Graph Query Language (PGQL)
• Gremlin, Apache Groovy (Shell)
• Zeppelin Notebook with PGX Interpreter
– Property Graph Format
• Oracle Flat Files
– Cytoscape 3.6.0
• Big Data Spatial and Graph 2.4 support installed
Oracle Flat File Format: Vertices
Bitcoin transaction data sample
[oracle@bigdatalite data]$ head -n 5 btc.opv
1,bt_addr,1,1111111111111111111114oLvT2,,
2,bt_addr,1,11126yHiXjavR3oNVwV2GRNso2ah4MnZtm,,
3,bt_addr,1,11128BtJwtyW4q9eRe3zts6BB4jg4uKLv8,,
4,bt_addr,1,111HnjYiCubyhPjtmZ7jEQjYcYBpKZHvJ,,
5,bt_addr,1,111KHWctzJ8tsTbittCDVzmTHVjxQR2g4,,
[oracle@bigdatalite data]$
Definition
Field # | Name | Description
1 | vertex_ID | An integer that uniquely identifies the vertex
2 | key_name | The name of the key in the key-value pair
3 | value_type | 1=String, 2=Integer, 3=Float, ...
4 | value | The encoded, non-null value of key_name when it is neither numeric nor a timestamp (date)
5 | value | The encoded, non-null value of key_name when it is numeric
6 | value | The encoded, non-null value of key_name when it is a timestamp (date)
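Reading the six-field vertex records is straightforward. The following is a minimal sketch in Python, assuming plain comma-separated records and handling only value types 1 to 3; it skips the value encoding the format definition mentions, and the function names are hypothetical.

```python
import csv
import io


def decode(value_type, text, number, date):
    """Pick the populated value field according to value_type (1, 2 and 3 only)."""
    if value_type == 1:
        return text
    if value_type == 2:
        return int(number)
    if value_type == 3:
        return float(number)
    return text or number or date  # other types stay raw strings in this sketch


def parse_opv(lines):
    """Parse vertex records into {vertex_ID: {key_name: value}}."""
    vertices = {}
    for vid, key, vtype, text, number, date in csv.reader(lines):
        vertices.setdefault(int(vid), {})[key] = decode(int(vtype), text, number, date)
    return vertices


sample = io.StringIO(
    "1,bt_addr,1,1111111111111111111114oLvT2,,\n"
    "2,bt_addr,1,11126yHiXjavR3oNVwV2GRNso2ah4MnZtm,,\n"
)
vertices = parse_opv(sample)
print(vertices[1])
```

A vertex with several properties appears as several records sharing the same vertex_ID, which is why the parser accumulates keys per ID.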
Source: http://blockchain.info
Oracle Flat File Format: Edges
Bitcoin transaction data sample
[oracle@bigdatalite data]$ head -n 5 btc.ope
1,317335,91594,contrib,trans_hash,1,4391b11d991e7c9ad4f9a1a5a7ea9ed7f234643b0c883f49511e1394a5ab8ff5,,
1,317335,91594,contrib,amount,3,,5.0E9,
2,357443,91594,contrib,trans_hash,1,4391b11d991e7c9ad4f9a1a5a7ea9ed7f234643b0c883f49511e1394a5ab8ff5,,
2,357443,91594,contrib,amount,3,,5.0E9,
3,352850,91594,contrib,trans_hash,1,4391b11d991e7c9ad4f9a1a5a7ea9ed7f234643b0c883f49511e1394a5ab8ff5,,
3,352850,91594,contrib,amount,3,,5.0E9,
4,308829,91594,contrib,trans_hash,1,4391b11d991e7c9ad4f9a1a5a7ea9ed7f234643b0c883f49511e1394a5ab8ff5,,
4,308829,91594,contrib,amount,3,,5.0E9,
5,314511,11714,contrib,trans_hash,1,2e8250e9f3f8043cdad60f747982275fee2a1836ebb48b2f620d03371be8e3f6,,
5,314511,11714,contrib,amount,3,,5.0E9,
[oracle@bigdatalite data]$
Definition
Field # | Name | Description
1 | edge_ID | An integer that uniquely identifies the edge
2 | source_vertex_ID | The vertex_ID of the outgoing tail of the edge
3 | dest_vertex_ID | The vertex_ID of the incoming head of the edge
4 | edge_label | The encoded label of the edge, which describes the relationship between the two vertices
5 | key_name | The encoded name of the key in a key-value pair
6 | value_type | 1=String, 2=Integer, 3=Float, ...
7 | value | The encoded, non-null value of key_name when it is neither numeric nor a timestamp (date)
8 | value | The encoded, non-null value of key_name when it is numeric
9 | value | The encoded, non-null value of key_name when it is a timestamp (date)
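In the edge file each property of an edge sits on its own record, so a reader has to group records by edge_ID. The following is a minimal sketch in Python, assuming plain comma-separated records and treating value type 3 as a floating-point number; the function name and the dictionary layout are hypothetical, and value encoding is not handled.

```python
import csv
import io


def parse_ope(lines):
    """Group edge records by edge_ID into one entry per edge."""
    edges = {}
    for eid, src, dst, label, key, vtype, text, number, date in csv.reader(lines):
        edge = edges.setdefault(
            int(eid),
            {"source": int(src), "dest": int(dst), "label": label, "properties": {}},
        )
        # value_type 3 is read from the numeric field; other types stay raw text here.
        if vtype == "3":
            edge["properties"][key] = float(number)
        else:
            edge["properties"][key] = text or number or date
    return edges


sample = io.StringIO(
    "1,317335,91594,contrib,trans_hash,1,"
    "4391b11d991e7c9ad4f9a1a5a7ea9ed7f234643b0c883f49511e1394a5ab8ff5,,\n"
    "1,317335,91594,contrib,amount,3,,5.0E9,\n"
)
edges = parse_ope(sample)
print(edges[1]["label"], edges[1]["properties"]["amount"])
```

The two sample records collapse into a single contrib edge from vertex 317335 to vertex 91594 carrying both the trans_hash and the amount property.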
Graph Generation and Loading using Vertices & Edges files
// Start Groovy Shell connecting to Oracle NoSQL DB
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
./gremlin-opg-nosql.sh

server = new ArrayList();
server.add("bigdatalite.localdomain:5000");

// Create a graph config with graph name "btc"
// Name of key-value store is "kvstore"
// Make sure to add all vertex/edge properties needed
cfg = GraphConfigBuilder.forPropertyGraphNosql() \
  .setName("btc") \
  .setStoreName("kvstore") \
  .setHosts(server) \
  .addVertexProperty("bt_addr", PropertyType.STRING, "NA") \
  .addEdgeProperty("amount", PropertyType.FLOAT, 1.0f) \
  .hasEdgeLabel(true) \
  .setLoadEdgeLabel(true) \
  .setMaxNumConnections(2) \
  .build();

// Create an instance of the graph
opg = OraclePropertyGraph.getInstance(cfg);
opg.getKVStoreConfig();

// Prepare for data load
opg.setClearTableDOP(2);
opg.clearRepository();

// Create an instance for the graph data loader
opgdl = OraclePropertyGraphDataLoader.getInstance();

// Flat files with vertices & edges of Bitcoin txs
vfile = "/home/oracle/Documents/BTC/data/btc.opv";
efile = "/home/oracle/Documents/BTC/data/btc.ope";

// Load data into the graph
opgdl.loadData(opg, vfile, efile, 2);

// Do some checks
// Count vertices and edges
opg.countVertices();
opg.countEdges();

// Get vertices and edges
opg.getVertices();
opg.getEdges();
...

// Shut down instance and close shell
opg.shutdown();
:q
Graph Querying and Analysis
PGX – Graph Analytics Engine
• Toolkit for In-Memory, Parallel Graph Analysis containing
– PGX shell
– Analyst API with a large collection of built-in Graph algorithms
– and more
• Developed by Oracle Labs
– http://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytix/overview/index.html
– https://event.cwi.nl/grades/2018/07-VanRest.pdf
– https://docs.oracle.com/cd/E56133_01/latest/tutorials/index.html
PGQL – Property Graph Query Language
• SQL-like Graph Pattern Matching
– WHERE clause: set of comma-separated constraints
• Developed by Oracle Labs
– http://pgql-lang.org/
• Proposed for standardization
Analyze Bitcoin Transaction Data using PGX
• Start PGX server
/opt/oracle/oracle-spatial-graph/property_graph/pgx/bin/start-server
• Start / Return to Groovy Shell
// Create in-memory analyst session
session = Pgx.createSession("session_ID_1");
analyst = session.createAnalyst();

// Read the graph from Oracle NoSQL DB into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig());

// Working with In-Memory Analyst
// Execute PageRank
rank = analyst.pagerank(pgxGraph, 0.0001, 0.85, 100);
// Get top 10 vertices
rank.getTopKValues(10);

// Betweenness Centrality
bc = analyst.vertexBetweennessCentrality(pgxGraph);
// Get top 10 vertices
bc.getTopKValues(10);

...
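To show what a PageRank call of this shape computes, here is a minimal power-iteration sketch in Python. It is not the PGX implementation; it assumes the three numeric arguments stand for an error tolerance, a damping factor and a maximum number of iterations, and the edge list and function names are hypothetical.

```python
def pagerank(edges, tolerance=0.0001, damping=0.85, max_iterations=100):
    """Power-iteration PageRank over a list of (source, dest) edges."""
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        error = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if error < tolerance:
            break
    return rank


def top_k(rank, k):
    """Return the k highest-ranked (vertex, score) pairs."""
    return sorted(rank.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical payments: A, B and C pay K, and K pays L.
edges = [("A", "K"), ("B", "K"), ("C", "K"), ("K", "L")]
ranks = pagerank(edges)
print(top_k(ranks, 2))
```

In this toy graph L ranks above K even though K has more incoming edges, because K passes all of its accumulated rank on to L.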
Analyze Bitcoin Transaction Data using PGX
Using Zeppelin Notebook with PGX Interpreter
Querying Property Graph Data using PGQL
• Topology constraints
▪ (n)-[e]->(m)
▪ (n)-[e1]->(m1), (n)-[e2]->(m2)
▪ (n1)-[e1]->(n2)-[e2]->(n3)-[e3]->(n4)
▪ (n1)-[e1]->(n2)<-[e2]-(n3)
• Label matching
▪ (x:Person) -[e:likes]-> (y:Person)
▪ (:Person) -[:likes]-> (:Person)
▪ (x:Student|Professor) -[e:likes|knows]-> (y:Student|Professor)
• Value constraints
▪ (x) -> (y), x.name = 'John', y.age > 25
• In-Line constraints
▪ (n WITH name = 'John' OR name = 'James', type = 'Person') -[e WITH type = 'workAt', workHours < 40]-> ()
• …
Syntax form | Examples
Basic form | (n)-[e]->(m)
Omit variable name of the source vertex | ()-[e]->(m)
Omit variable name of the destination vertex | (n)-[e]->()
Omit variable names in both vertices | ()-[e]->()
Omit variable name in edge | (n)-->(m)
Omit variable name in edge (alternative, one dash) | (n)->(m)
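What the topology patterns select can be illustrated without a query engine. The following is a minimal sketch in Python over a hypothetical labelled edge list; it is not how PGQL is evaluated, and requiring the two edges in the last pattern to be different edges is an assumption about matching semantics.

```python
# Hypothetical labelled edge list: (source, label, destination).
edges = [
    ("n1", "likes", "n2"),
    ("n2", "knows", "n3"),
    ("n4", "likes", "n2"),
]


def match_edge(edges, label=None):
    """(n)-[e]->(m), or (n)-[e:label]->(m) when a label is given."""
    return [(n, m) for n, lab, m in edges if label is None or lab == label]


def match_two_hops(edges):
    """(n1)-[e1]->(n2)-[e2]->(n3): two edges chained through a shared vertex."""
    return [(n1, n2, n3) for n1, _, n2 in edges for m, _, n3 in edges if m == n2]


def match_common_target(edges):
    """(n1)-[e1]->(n2)<-[e2]-(n3): two different edges pointing at the same vertex."""
    return [
        (n1, n2, n3)
        for i, (n1, _, n2) in enumerate(edges)
        for j, (n3, _, m) in enumerate(edges)
        if m == n2 and i != j
    ]


print(match_edge(edges, "likes"))
print(match_two_hops(edges))
print(match_common_target(edges))
```

Each function returns one tuple of vertex bindings per match, which corresponds to one row of a query result.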
Query Bitcoin Transaction Data using PGQL
// Some PGQL queries

// Explore relationships in the graph
pgxResultSet = pgxGraph.queryPgql("SELECT e.label(), count(*) WHERE (n) -[e]-> (m) GROUP BY e.label() ORDER BY count(*) DESC");
pgxResultSet.print();

// Find the most collaborative Bitcoin addresses (top 10)
pgxResultSet = pgxGraph.queryPgql("SELECT n, count(*) WHERE (n) -[e:contrib]-> (m) GROUP BY n ORDER BY count(*) DESC LIMIT 10");
pgxResultSet.print(3);

// Find the least collaborative Bitcoin addresses
pgxResultSet = pgxGraph.queryPgql("SELECT n, count(*) WHERE (n) -[e:contrib]-> (m) GROUP BY n ORDER BY count(*) ASC");
pgxResultSet.print(3);

// InDegree count
pgxResultSet = pgxGraph.queryPgql("SELECT y.id(), y.bt_addr, x.inDegree() WHERE (x) -> (y), x.inDegree() > 1000 ORDER BY x.inDegree() DESC");
pgxResultSet.print(3);
...
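The grouping queries above amount to counting edges per label, per source vertex, or per destination vertex. The following is a minimal sketch in Python of the same aggregations over a hypothetical edge list; it only illustrates the logic and does not use PGX or PGQL.

```python
from collections import Counter

# Hypothetical (source, label, destination) edges standing in for the "contrib" graph.
edges = [
    (317335, "contrib", 91594),
    (357443, "contrib", 91594),
    (317335, "contrib", 11714),
    (314511, "contrib", 11714),
]

# Counterpart of: SELECT e.label(), count(*) ... GROUP BY e.label()
by_label = Counter(label for _, label, _ in edges)

# Counterpart of: SELECT n, count(*) WHERE (n) -[e:contrib]-> (m) GROUP BY n
#                 ORDER BY count(*) DESC LIMIT 10
out_counts = Counter(src for src, label, _ in edges if label == "contrib")
most_active = out_counts.most_common(10)

# In-degree: number of edges arriving at each vertex.
in_degree = Counter(dst for _, _, dst in edges)

print(by_label)
print(most_active)
print(in_degree)
```

On the full data set the same counts are what the ORDER BY clauses sort on, so the top rows are the addresses that pay into the most outputs.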
https://blogs.oracle.com/bigdataspatialgraph/how-many-ways-to-run-property-graph-query-language-pgql-in-bdsg-i
Query Bitcoin Transaction Data using PGQL
Using Zeppelin Notebook with PGX Interpreter
Visualize Bitcoin Transaction Data using Cytoscape
Pattern Analysis 01
Pattern Analysis 02: Addresses with incoming TXs only
Pattern Analysis 03: Degree of Centrality
Summary
Graph capabilities in Oracle Big Data Spatial and Graph
• Graph databases are powerful tools, complementing relational databases
– Especially strong for analysis of graph topology and connectedness
• Graph analytics offer new insight
– Especially into relationships, dependencies and behavioural patterns
• Oracle Property Graph technology offers
– Comprehensive analytics through various APIs, integration with relational database
– Scalable, parallel in-memory processing
– Secure and scalable graph storage using Oracle NoSQL, HBase or Oracle Database
• Available both on premises and in the Cloud
Property Graph running in the Oracle Cloud
Additional Information: PGX - Built-in Package
Rich set of built-in parallel graph algorithms
… and parallel graph mutation operations
Resources on Oracle's Property Graph Support
• Getting Started – Creating a Property Graph on Oracle Database by Arthur Dayton (Vlamis Software Solutions): https://blogs.oracle.com/oraclespatial/getting-started-creating-a-property-graph-on-oracle-database
• Improve your Meetup Experience using Graph Analytics by Karin Patenge (Oracle): https://de.slideshare.net/kpatenge
• Big Data Spatial and Graph In-Memory Analyst Java API: https://docs.oracle.com/bigdata/bda411/PGXJV/toc.htm
• Oracle Big Data Spatial and Graph on Oracle.com: www.oracle.com/database/big-data-spatial-and-graph
• OTN product page (white papers, software downloads, documentation, tutorials): www.oracle.com/technetwork/database/database-technologies/bigdata-spatialandgraph
• Oracle Big Data Lite Virtual Machine - a free sandbox to get started: www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
• Hands On Lab for Big Data Spatial: tinyurl.com/BDSG-HOL
• Blog – Examples, Tips & Tricks: blogs.oracle.com/bigdataspatialgraph