37
Use of Knowledge Graphs and Relational Machine Learning Frederic Clarke, Director MINDS Australian Bureau of Statistics

Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Use of Knowledge Graphs and

Relational Machine Learning

Frederic Clarke, Director MINDS

Australian Bureau of Statistics

Page 2: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Machine Intelligence and Novel Data Sources

Multi-disciplinary team – maths, CS, KM

▪ Focus on hard problems, learn by doing

▪ Demonstrate solutions through prototypes

Investigate, experiment, evaluate and inform

▪ Methods, technologies and models

▪ New data sources and applications

▪ ABS strategic capabilities and priorities

▪ Environmental trends, opportunities, threats

2 SemStats 2019

Page 3: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

The strategic context

A motivating example

ABS use of KL and RL

GLIDE and current work

Agenda

3 SemStats 2019

Page 4: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Strategic context

4 SemStats 2019

Page 5: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Inform Australia’s important decisions

▪ Producing new and relevant statistical insights

▪ Enabling effective and safe use of data

▪ Building national information capability for the future

On public policy, services and investment

ABS Mission

5 SemStats 2019

Page 6: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Driven by powerful global megatrends

▪ Intelligent Machines

▪ Digital Connectedness

▪ Data-driven World

Impact on government is profound

▪ Overturns extant business models

▪ Challenges traditional decision-making processes

Disruptive change

6 SemStats 2019

Lee Sedol and AlphaGo Zero

Page 7: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Most economic, social and environmental systems are complex

▪ Many interacting entities of different types

▪ Dynamic and non-linear relationships

▪ Emergent system structure and behaviour

Connectedness is an essential condition for complexity

Complex systems underlie most ‘wicked’ problems

Complex systems

7 SemStats 2019

Page 8: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Diverse new sources of human-generated

and machine-generated data

Can be used for statistical purposes

▪ Improve quality, time-to-market and cost-

effectiveness of mainstream statistical

products

▪ Create new statistical products that fill

information gaps

Issues: heterogeneity, bias and dimensionality

ABS operates in an increasingly congested

and contested environment

Big data

8 SemStats 2019

Personal

Devices

Imagery

Systems

Product

Scanners

Smart

Meters

Environmental

Monitors

Telematics

Devices

Web

Applications

Social Media

Services

Accounting

Systems

Logistics

Systems

Administrative

Collections

Surveys and

Censuses

Page 9: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Motivating example

9 SemStats 2019

Page 10: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Connectedness of policy concerns

10 SemStats 2019

Workforce ParticipationWhat types of workers are finding

employment in the new economy?

Export PerformanceWhat are the characteristics

of successful exporters?

Critical NetworksWhere are the bottlenecks for freight

movement in the national road system?

Population MovementAre job flows changing the geographic

distribution of the workforce?

Firm ProductivityWhat is the effect of skilled

migration on firm productivity?

Job CreationIn what industries, occupations

and locations are jobs being created?

Public ServicesDo existing public services meet

the needs of a changing population?

Research & InnovationWhat directions in STEM are

driving key growth industries?

Social WelfareWhat factors are associated with

long term welfare dependence?

Job Skills Is higher education meeting the

changing demand for skills?

Page 11: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Next generation analytical capability

11 SemStats 2019

Built on system-centric information models

▪ Composable, interpretable and semantically precise

▪ Join up interrelated concept and data spaces

▪ Connect individuals and groups in multiple ways

Dynamically integrates heterogeneous multisource data

▪ Structured and unstructured

▪ Cross-sectional and longitudinal temporal linkages

Page 12: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Next generation analytical capability

12 SemStats 2019

Enables multiple analytical perspectives and objectives

▪ Exploration (pattern sensing) – finding statistical features and correlations

▪ Explanation (model building) – testing hypotheses about the observed data

▪ Extrapolation (system simulation) – projecting beyond known cases

Page 13: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

ABS use of KG and RL

13 SemStats 2019

Page 14: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

System is depicted as a graph of entities and relationships

▪ Entity – individual thing or group of things

▪ Relationship – association between entities

▪ Entities interact through relationships of analytical interest

Use W3C Semantic Web formalism for knowledge graphs

▪ Graph composition (standard: RDF)

▪ Semantic modelling (standard: OWL)

▪ Knowledge discovery (standard: SPARQL)

Knowledge graphs

14 SemStats 2019

Page 15: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Knowledge graphs – simple example

SemStats 201915

Page 16: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Systems are partitioned the into context-specific analytical domains

▪ Example: Trade, Employment, Production, Education, Welfare, etc

Each domain has set of associated entity types and relationships

▪ Basic entity types can exist in multiple domains

▪ Relationships are usually context-specific and so bound to one domain

Domains are connected through common entity types

Analytical domains

16 PSDS 2018

Page 17: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Analytical domains – simple example

17 AIfG 2019

exports-itememployee-of

Person

Family Place

Business

Industry

Job

Product

Occupation

lives-in located-inpart-of industry

occupation holds-job provides-job exports-to

Manufacturing

Estonia

F17702281

Engineer

spouse-of

part-of

STFN

P76339989

“121 562 165”

person-name

“Jane

Rockford”

person-name

P77112270

“James

Rockford”

B9579303

business-name

SABN

“4663184981”

“Eazi

Packaging”

Melbourne

J30090088

“Production

Manager”

C6677102

“Cardboard

Box”

description

job-title

“EaziPak 1000”

product-name

TradeEmployment

Page 18: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Units and observations

18 SemStats 2019

Observable characteristics of an entity can change over time

▪ Example: person name, address, marital status, and employment details

Entities are associated with their respective observations in data

▪ Real world thing is represented by a unit entity (spine entity)

▪ Specific observations of a unit are represented by observation entities

▪ Observations are associated equivalence relationships

▪ Temporal segmentation is demarcated by event entities

Page 19: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Example: observations of the same person in different data sets

▪ Record-3 is much later that Record-1 and Record-2

▪ Significant events: Change of Residence, Marriage, Graduation

Associating observations – example

19 SemStats 2019

15-07-2015

17-05-2005

10-02-2005

Observation date

Page 20: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Associating observations – example

Deterministic association

▪ Current: identifier match using

common unique key

▪ Future: fact match using deductive

rules (FOL)

▪ Multiple class inheritance

20 SemStats 2019

Linked

(Deterministic)

“Jane

Rockford”

P76339989

person-name

“121 562 165”

has-STFN has-STFN

“Jane

Smyth”

person-name

P34090227

“121 562 165”match

Data Set

A

Data Set

B

U8477499

has-observation

Unit Entity

successor-of

Page 21: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Associating observations – example

21 SemStats 2019

Probabilistic association

▪ Current: similarity match on entity

characteristics using Sunter-Fellegi

model

▪ future: similarity match on entity

characteristics evaluation or

relational characteristics using

machine learning

▪ Multiple class inheritance

Linked

(Probabilistic)

“Jane

Smith”

P17122393

person-name

Attribute A1

has-attribute has-attribute

“Jane

Smyth”

person-name

P34090227

Attribute A1

match

Data Set

C

Data Set

B

U22571232

has-observation

Unit Entity

similar-to

Page 22: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

weight

Representing weights/probabilities

Accommodates SF pairwise linking

▪ Candidate pairs with associated weights

▪ An observation can be liked to more

than one CP

▪ Much better represented in a property

graph construct

▪ Can we combine the two paradigms?

22 SemStats 2019

Linked

(Probabilistic)

“Jane

Smith”

P17122393

person-name

“Jane

Smyth”

person-name

P34090227Data Set

C

Data Set

B

U22571232

has-observation

Unit Entity

CP6747209 Candidate Pair0.86

in-candidate-pair in-candidate-pair

Page 23: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Associating observations – example

23 SemStats 2019

Linked

(Probabilistic)

“Jane

Smith”

P17122393

person-name

“Jane

Smyth”

person-name

P34090227

U22571232

has-observation

similar-to

Linked

(Deterministic)

“Jane

Rockford”

P76339989

person-name

successor-of

U8477499

has-observation

successor-of

same-as

Page 24: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Combining data

24 SemStats 2019

Extract entities and relationships from source data

▪ Automatically by content analysis (fact extraction) tools

Create the knowledge graph in SW form

▪ Asserted facts from source data

Associate observation entities in the graph

▪ Inferred facts from reasoning processes

Page 25: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Based on pattern of connections among entities

▪ Extend scope of probabilistic linking beyond IID assumption

Associate entities across time and in disparate data sources

▪ Needed when there are no reliable common identifiers

▪ Example: persons by family and household relationships

Detect events that involve changes in the structure of groups

▪ Example: business reorganisation, closure, takeover

Relational learning

25 SemStats 2019

Page 26: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Detecting change over time

26 SemStats 2019

Page 27: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Graph kernel learning

▪ Represent the structural form of a graph for use in kernel learning algorithms

▪ Kernels: Weisfeiler-Lehman (WL), Intersection Tree Path (ITP)

Tensor factorisation

▪ Manipulates 3-order adjacency tensor of the knowledge graph

▪ Estimate probability distribution over possible states of graph

▪ Algorithms: RESCAL, Complex Embedding, HOLE, TransE

RL approaches

27 AIfG 2019

Weisfeiler-Lehman (WL) kernel (de Vries, 2013) (Weisfeiler & Lehman, 1968) and the Intersection Tree Path (ITP) kernel (Lösch, et al., 2012)

Page 28: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Overview of GLIDE

28 SemStats 2019

Page 29: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

GLIDE

29 ABS Internal Briefing

Graph-Linked Information Discovery Environment

Capability vision for enabling informed decision making

▪ About policy, services, investment and (possibly) regulation

Based on insights derived from different types of analysis

▪ Exploratory analysis, hypothesis testing, system simulation

Using a dynamic evidence base from diverse data sources

▪ Surveys, admin collections, sensors, transactions, web content, …

Page 30: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

What GLIDE will provide

30 ABS Internal Briefing

Graph-Linked Information Discovery Environment

One platform for data analysis and linking

▪ Browser interface – rich, interactive, navigable, context-sensitive visualisation

▪ Program interface (R, Python) – statistical and econometric modelling

▪ Spatial, temporal and compositional perspectives of problem

▪ Deterministic and probabilistic linkage methods – common key, Sunter-Fellegi

▪ Extraction of entities and facts from heterogeneous data

▪ Reusable, plug-and-play models and components

Page 31: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

What GLIDE will provide

31 ABS Internal Briefing

Graph-Linked Information Discovery Environment

A set of extensible, interoperable ‘data pools’

▪ Multiple structured or unstructured data sets

▪ Longitudinally linked data (given social license)

▪ Different entity and relationship types in the form of a knowledge graph

Page 32: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Pathway initiatives

32 SemStats 2019

Workforce ParticipationWhat types of workers are finding

employment in the new economy?

Export PerformanceWhat are the characteristics

of successful exporters?

Critical NetworksWhere are the bottlenecks for freight

movement in the national road system?

Population MovementAre job flows changing the geographic

distribution of the workforce?

Firm ProductivityWhat is the effect of skilled

migration on firm productivity?

Job CreationIn what industries, occupations

and locations are jobs being created?

Public ServicesDo existing public services meet

the needs of a changing population?

Research & InnovationWhat directions in STEM are

driving key growth industries?

Social WelfareWhat factors are associated with

long term welfare dependence?

Job Skills Is higher education meeting the

changing demand for skills?

2017 experiment – proved concept

Page 33: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Pathway initiatives – freight movement

33 SemStats 2019

Workforce ParticipationWhat types of workers are finding

employment in the new economy?

Export PerformanceWhat are the characteristics

of successful exporters?

Critical NetworksWhere are the bottlenecks for freight

movement in the national road system?

Population MovementAre job flows changing the geographic

distribution of the workforce?

Firm ProductivityWhat is the effect of skilled

migration on firm productivity?

Job CreationIn what industries, occupations

and locations are jobs being created?

Public ServicesDo existing public services meet

the needs of a changing population?

Research & InnovationWhat directions in STEM are

driving key growth industries?

Social WelfareWhat factors are associated with

long term welfare dependence?

Job Skills Is higher education meeting the

changing demand for skills?

Page 34: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Pathway initiatives – freight movement

34 SemStats 2019

Workforce ParticipationWhat types of workers are finding

employment in the new economy?

Export PerformanceWhat are the characteristics

of successful exporters?

Critical NetworksWhere are the bottlenecks for freight

movement in the national road system?

Population MovementAre job flows changing the geographic

distribution of the workforce?

Firm ProductivityWhat is the effect of skilled

migration on firm productivity?

Job CreationIn what industries, occupations

and locations are jobs being created?

Public ServicesDo existing public services meet

the needs of a changing population?

Research & InnovationWhat directions in STEM are

driving key growth industries?

Social WelfareWhat factors are associated with

long term welfare dependence?

Job Skills Is higher education meeting the

changing demand for skills?

What is the average speed of vehicles?

Where are most vehicles travelling?

Page 35: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

35DDfG 2017

Freight movement – drilling down

Drill down for GPS points

How does congestion vary through the day?

Identify congested roads

Drill down to find congested road segments

Page 36: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Pathway initiatives – successful exporters

36 SemStats 2019

Workforce ParticipationWhat types of workers are finding

employment in the new economy?

Export PerformanceWhat are the characteristics

of successful exporters?

Critical NetworksWhere are the bottlenecks for freight

movement in the national road system?

Population MovementAre job flows changing the geographic

distribution of the workforce?

Firm ProductivityWhat is the effect of skilled

migration on firm productivity?

Job CreationIn what industries, occupations

and locations are jobs being created?

Public ServicesDo existing public services meet

the needs of a changing population?

Research & InnovationWhat directions in STEM are

driving key growth industries?

Social WelfareWhat factors are associated with

long term welfare dependence?

Job Skills Is higher education meeting the

changing demand for skills?

BLADE data

▪ Business Characteristics Survey (BCS)

▪ Business Activity Statements (BAS)

Merchandise export records

▪ Held by Department of Home Affairs (DHA)

▪ Lodged by exporters and agents via (CCF)

ABS business unit model (LE, EG, TAU)

Classifications (AHECC, SITC, ANZSIC, SISCA, AGSO, …)

Full extract BAS and BCS data

• 7 million businesses per year

• Complex relationships

Monthly export transactions data

• 800,000+ records per month

Page 37: Use of Knowledge Graphs and Relational Machine Learningsemstats.org/2019/slides/clarke-relational-machine-learning.pdf · Relational Machine Learning Frederic Clarke, Director MINDS

Questions?