The Power of Machine Learning and Graphs


March 2017, Webinar

Jans Aasman

ja@franz.com

allegrograph.com

Contents

• Cognitive computing overview

• Why we put the results of analytics and machine learning back in the graph

• Some examples first

• The main loop for data science with graphs

• Two AG open source packages for data science

• AllegRo: an R interface to AllegroGraph

• Python-Agraph: a Python interface to AllegroGraph (installable with Anaconda)

• Future work

10 years ago

Structured Data

7 years ago

Structured Data, Unstructured Data

NLP, Key/Value stores, NoSQL, Big Data, Hadoop, IoT

4 to 5 years ago

Structured Data, Unstructured Data and IoT

Knowledge: Domain knowledge, Linked Open Data, Vocabularies, Taxonomies/Ontologies

New #1: Learning. Feed output of data science back into data infrastructure

Structured Data

Unstructured Data and IoT

Knowledge: Domain knowledge, Linked Open Data, Vocabularies, Taxonomies/Ontologies

Probabilistic Inferences

New #2: Everything in one (distributed) Semantic Graph

Structured Data

Unstructured Data and IoT

Knowledge: Domain knowledge, Linked Open Data, Vocabularies, Taxonomies/Ontologies

Probabilistic Inferences

AKA: Cognitive Computing

Structured Data

Unstructured Data and IoT

Knowledge: Domain knowledge, Linked Open Data, Vocabularies, Taxonomies/Ontologies

Probabilistic Inferences

Current state of analytics

The output of data science usually ends up in reports and publications, but:

• No formal trace of where the data came from

• No formal link to the actual methods you used, or who did it, or when you did it

• Cannot be compared to earlier results

• Cannot be used as building blocks for further research

• In general: the output is not queryable and discoverable

Enriching the graph with analytics

True Machine Learning

• results become data

• build layers of analytics

• Formal provenance for all results: links to the actual data and methods you used, who did it, when you did it, and even why you did it.

• Important for compliance and auditability

• Important for explaining why you took certain actions

• Historical analysis

• Results become queryable and discoverable

• The analytics fit into the total infrastructure of structured data, unstructured data and knowledge.
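
The provenance bullets above can be made concrete with a small sketch. It is only an illustration under stated assumptions: an in-memory set of triples stands in for AllegroGraph, the `ex:` identifiers and the helpers `record_result` and `provenance_of` are hypothetical, and the use of W3C PROV-O terms (`prov:wasGeneratedBy`, `prov:used`, `prov:wasAssociatedWith`, `prov:endedAtTime`) is an assumption, since the slides do not name a vocabulary.

```python
from datetime import datetime, timezone

# Toy stand-in for a graph store: a set of (subject, predicate, object) triples.
graph = set()


def record_result(graph, result_id, value, method, dataset, analyst, when):
    """Store an analytic result plus its provenance as triples.

    The ex: names are hypothetical; the prov: terms come from W3C PROV-O.
    """
    run_id = result_id + "/run"
    graph.update({
        (result_id, "rdf:type", "ex:AnalyticResult"),
        (result_id, "ex:value", value),
        (result_id, "prov:wasGeneratedBy", run_id),
        (run_id, "rdf:type", "prov:Activity"),
        (run_id, "ex:method", method),                    # which method was used
        (run_id, "prov:used", dataset),                   # where the data came from
        (run_id, "prov:wasAssociatedWith", analyst),      # who did it
        (run_id, "prov:endedAtTime", when.isoformat()),   # when it was done
    })
    return run_id


def provenance_of(graph, result_id):
    """Follow prov:wasGeneratedBy and return the facts known about that run."""
    runs = [o for s, p, o in graph if s == result_id and p == "prov:wasGeneratedBy"]
    return {p: o for s, p, o in graph if s in runs}


record_result(
    graph,
    result_id="ex:result/1",
    value=2.5,  # placeholder value, not a real finding
    method="ex:OddsRatio",
    dataset="ex:dataset/patients",
    analyst="ex:analyst/jane",
    when=datetime(2017, 3, 1, tzinfo=timezone.utc),
)
print(provenance_of(graph, "ex:result/1"))
```

Because the result and its provenance live in the same structure as the data, a later query can ask both "what was the value?" and "which data set and method produced it?".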

Odds ratio
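
The slide names the odds ratio without defining it. As a reminder, for a 2x2 table with counts a (exposed, outcome), b (exposed, no outcome), c (unexposed, outcome) and d (unexposed, no outcome), the odds ratio is (a * d) / (b * c). The sketch below is a minimal illustration; the function name and the counts are made up and do not come from the webinar.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio of a 2x2 contingency table: (a * d) / (b * c)."""
    a, b, c, d = exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Made-up counts, for illustration only.
print(odds_ratio(20, 80, 10, 90))  # 2.25
```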

Association rules
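
An association rule "A implies B" is usually judged by its support, confidence and lift. The following sketch computes the three measures for one rule over a list of transactions. The function name `rule_metrics` and the baskets are invented for illustration; the slides do not say which measures or which tool were used.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("both sides of the rule must occur in the transactions")
    support = n_both / n              # share of transactions containing A and B
    confidence = n_both / n_a         # share of A-transactions that also contain B
    lift = confidence / (n_c / n)     # confidence relative to B's base rate
    return support, confidence, lift


# Made-up baskets, for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]
print(rule_metrics(baskets, {"bread"}, {"butter"}))
```

A lift above 1 means the two item sets occur together more often than independence would predict.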

K-means clustering

In the e-commerce world: find similar objects based on more than 10 criteria, including description, product codes, pictures, etc.
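
One way to group similar objects is the k-means clustering named above. The sketch below is a plain version of Lloyd's algorithm in pure Python. It is an assumption-laden toy: the two-dimensional points are made up, a real product catalogue would first need its criteria turned into numeric feature vectors, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


# Made-up feature vectors, for illustration only.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignment = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 0, 1, 1]
```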

The main loop for data science with graphs

AllegroGraph → SPARQL → dataframe → R, Python, Spark → results → AllegroGraph
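
The loop on this slide (query the graph, analyse the tabular result, write the outcome back) can be sketched without any server. Everything here is a stand-in and not the AllegroGraph API: a Python list of triples plays the graph, `select_rows` plays the SPARQL query that yields a dataframe, and the `ex:` names and prices are made up.

```python
from statistics import mean

# Toy stand-in for AllegroGraph: a list of (subject, predicate, object) triples.
graph = [
    ("ex:p1", "ex:category", "ex:shoes"), ("ex:p1", "ex:price", 40.0),
    ("ex:p2", "ex:category", "ex:shoes"), ("ex:p2", "ex:price", 60.0),
    ("ex:p3", "ex:category", "ex:hats"), ("ex:p3", "ex:price", 20.0),
]


def select_rows(graph, predicates):
    """Step 1 (stand-in for a SPARQL SELECT): one row per subject."""
    rows = {}
    for s, p, o in graph:
        if p in predicates:
            rows.setdefault(s, {"subject": s})[p] = o
    # Keep only subjects that have a value for every requested predicate.
    return [r for r in rows.values() if all(p in r for p in predicates)]


def average_price_per_category(rows):
    """Step 2 (stand-in for R, Python or Spark): analyse the tabular data."""
    groups = {}
    for row in rows:
        groups.setdefault(row["ex:category"], []).append(row["ex:price"])
    return {category: mean(prices) for category, prices in groups.items()}


def write_back(graph, results):
    """Step 3: store the results as new triples so they can be queried later."""
    for category, value in results.items():
        graph.append((category, "ex:averagePrice", value))


rows = select_rows(graph, ["ex:category", "ex:price"])
write_back(graph, average_price_per_category(rows))
print([t for t in graph if t[1] == "ex:averagePrice"])
```

After the last step the averages are ordinary triples, so the next pass through the loop can select them like any other data.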

AllegRo: work with AG directly from R


• The entire AllegroGraph API directly available from R

• Create/open databases, add/delete/query, SPARQL 1.1

• Create data frames directly from getStatement or SPARQL queries

• Will work with free AllegroGraph trial version

Quick tutorial demo

Agraph-python (available on GitHub, but please use Anaconda to install)


Quick tutorial demo Python & Anaconda

create an environment in Anaconda2

• conda create --name testenv -c franzinc agraph-python numpy pandas matplotlib

activate environment

• source activate testenv

if you want to install a particular version of agraph-python

• conda install -c franzinc agraph-python=6.2.0

or, if an older version is already installed:

• conda update -c franzinc agraph-python

IRIS example

• Classify 3 types of Irises: Setosa, Virginica, Versicolour

• based on petal length, petal width, sepal length, sepal width.
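
To make the classification task concrete, here is a minimal sketch using a nearest-centroid rule. This is a deliberately simple technique chosen for illustration; the slides do not say which classifier the demo used. The nine training rows are the first three rows of each class as they appear in the widely distributed copy of Fisher's Iris data set, and the `classify` helper is hypothetical.

```python
import math

# (sepal length, sepal width, petal length, petal width) in cm.
SAMPLES = {
    "setosa": [(5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2)],
    "versicolor": [(7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5)],
    "virginica": [(6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1)],
}

# One centroid per species: the column-wise mean of its training rows.
CENTROIDS = {
    species: tuple(sum(col) / len(rows) for col in zip(*rows))
    for species, rows in SAMPLES.items()
}


def classify(measurement):
    """Return the species whose centroid is closest in Euclidean distance."""
    return min(CENTROIDS, key=lambda s: math.dist(measurement, CENTROIDS[s]))


print(classify((5.0, 3.6, 1.4, 0.2)))  # setosa
```

With only three rows per class this is far too little data for a serious model; it only shows how the four measurements map to a label.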

Future Webcasts

• Formal ontologies to represent analytic output

• Using Knime as a data science framework

• Distributed AllegroGraph & SPARK

SDL Super-Learner

Conclusion

• Adding data science results back to the graph is a valuable new paradigm

• We make it really straightforward to do data science with AllegroGraph

• Try it
