
Automatic Financial Data Integration with FIBO




Page 1: Automatic Financial Data Integration with FIBO

Data integration with FIBO and Legal Entity Identifiers

A whole product proposal

Ontology2: Paul Houle, +1 (607) 539 6254

KM Solutions: William Freeman, +1 (774) 301 1301

Page 2: Automatic Financial Data Integration with FIBO

Problem: businesses produce and exchange documents and messages in a wide variety of formats, with different descriptors for entities and their properties

Text, PDF, DOC, XML, JSON, CSV, XLS, FIBO, SWIFT, ISO 20022, FIX, etc.

Keeping track of this in interactive systems is hard

Archival documents stored in “data lakes”

But 80% of the effort in any “data mining” program is in data cleaning

Clean knowledge and data packaged for low-latency queries and quality-controlled decisions

Page 3: Automatic Financial Data Integration with FIBO

Solution: toolsets and methods now exist to drastically accelerate this process

Scalable-first Architecture

Business-friendly inference based on First-Order Logic

:BaseKB

Numerous sources of lexical, taxonomical, ontological and operational information

Natural language documentation

Statistical quality control of decision-making systems; tests of specific requirements

Reproducible packaging of software and data for almost any cloud or virtual environment

SUGGESTED UPPER MERGED ONTOLOGY

IEEE

Page 4: Automatic Financial Data Integration with FIBO

RDF, SQL, CSV

PRODUCTION RULES

FIRST-ORDER LOGIC

ISO COMMON LOGIC

OWL

MODAL AND HIGHER-ORDER LOGIC

TAXONOMIES, DICTIONARIES: define and relate terms and entities

CONCEPTUAL ONTOLOGIES AND THESAURI: properties of and specialized relationships between entities

OPERATIONAL ONTOLOGIES AND THEORIES: targeted quality-controlled inference for specific domains

AUTOMATIC GENERATION OF USER INTERFACES: browsing interfaces, mixed-initiative and asynchronous interaction

Page 5: Automatic Financial Data Integration with FIBO

Canonical data model

Cash-flow modeling and prediction

Swap Data Repositories

Swap data is published in similar but not identical formats…

Different field names:

“DISSEMINATION ID” vs “Dissemination Id”

“EXECUTION TIMESTAMP” vs “EXEC TIMESTAMP”

Vocabularies used can be different, e.g. “TRUE” or “FALSE” vs “Y” and “N” or “0” and “1”

Columns are grouped: “PRICE NOTATION TYPE” and “PRICE NOTATION” or “{FIELD}_{NUMBER}_{SUBFIELD}”
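
The vocabulary differences above can be reconciled with a small lookup. The following is a minimal sketch, not the proposal's implementation: the token sets are taken from the examples on this slide, and the function name `normalize_boolean` is hypothetical.

```python
# Token sets taken from the slide's examples; a real feed may need more.
TRUE_TOKENS = {"TRUE", "Y", "1"}
FALSE_TOKENS = {"FALSE", "N", "0"}


def normalize_boolean(raw):
    """Map a repository-specific flag to a bool, or None if unrecognized."""
    token = str(raw).strip().upper()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    # Unknown values are surfaced rather than guessed.
    return None
```

Returning `None` for unrecognized tokens keeps unexpected vocabulary visible to a reviewer instead of silently coercing it.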

Page 6: Automatic Financial Data Integration with FIBO

Column Profiling and Analysis

Empirical Rules

“Squash case in identifiers”

“Underscores can be equivalent to spaces”

“EXECUTION” can be shortened to “EXEC”
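
The three empirical rules can be sketched as one normalization function. This is an illustration under assumptions: the choice of upper case with underscores as the canonical form, the `ABBREVIATIONS` table, and the name `canonical_field_name` are all hypothetical.

```python
import re

# Only the abbreviation named on the slide; extend as rules are discovered.
ABBREVIATIONS = {"EXECUTION": "EXEC"}


def canonical_field_name(name):
    """Squash case, treat spaces and underscores alike, apply abbreviations."""
    words = re.split(r"[\s_]+", name.strip().upper())
    return "_".join(ABBREVIATIONS.get(word, word) for word in words if word)
```

Under these rules “DISSEMINATION ID” and “Dissemination Id” map to the same key, as do “EXECUTION TIMESTAMP” and “EXEC TIMESTAMP”.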

Resolution against dictionaries

FIBO / ISO 20022 Controlled Terms

Wikipedia / Freebase / :BaseKB

WordNet

Grammatical Patterns:

PRICE_NOTATION_1

PRICE_NOTATION_2

PRICE_NOTATION_CURRENCY_1

PRICE_NOTATION_CURRENCY_2
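
Column names that follow a grammatical pattern like the ones above can be grouped by their numeric index. The sketch below assumes the index is always a trailing `_<number>` suffix, which matches these four examples but may not hold for every repository; `group_indexed_columns` is a hypothetical name.

```python
import re
from collections import defaultdict

# Assumption: the repeat index is the final underscore-separated token.
TRAILING_INDEX = re.compile(r"^(?P<stem>.+)_(?P<index>\d+)$")


def group_indexed_columns(columns):
    """Group columns such as PRICE_NOTATION_1 by index, keyed by their stem."""
    groups = defaultdict(dict)
    for column in columns:
        match = TRAILING_INDEX.match(column)
        if match:
            groups[int(match.group("index"))][match.group("stem")] = column
    return dict(groups)
```

Each resulting group collects the columns that describe one repeated item, for example the notation and the currency of the first price.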

Defined Data Formats

Resolution against XSD, OWL, UML or print documentation

Column Statistics, Hypothesis Generation and Testing

Identify plausible interpretations of fields as numeric, date, time, name, address, Boolean, keys and controlled vocabulary terms

Type Constraints

Field has “end date” in name + Field contains formatted dates → Probable “end date” field
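
The type constraint above combines evidence from the field name with evidence from the values. A minimal sketch follows; the accepted date formats, the 95% threshold, and the function names are assumptions made for illustration.

```python
from datetime import datetime

# Assumed formats; a real profiler would learn these from the data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def looks_like_date(value):
    """Return True if the string parses under any assumed date format."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            continue
    return False


def is_probable_end_date(field_name, values, threshold=0.95):
    """Both conditions must hold: name mentions 'end date', values are dates."""
    name = field_name.upper().replace("_", " ")
    non_empty = [v for v in values if v and v.strip()]
    if "END DATE" not in name or not non_empty:
        return False
    share = sum(looks_like_date(v) for v in non_empty) / len(non_empty)
    return share >= threshold
```

Requiring both signals means a column named “END DATE” that holds free text is not accepted, and neither is a date column with an unrelated name.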

Ontology Fragment Matching

Inferred relationships tested for structural match against known ontologies

Page 7: Automatic Financial Data Integration with FIBO

Test data

Hypothesis Generation

Constraint Solving

Production Rules ++ Engine

Reference Ontologies

Data Conversion Rulebox

Production Rules Engine

Canonical Data Model

Clean Data

Theory Library

Data Conversion Rulebox

Machine Readable Description of API

ACTUS Model Selection and Parameter Determination

Possible Histories

Predictions about cash flow over time

Deployed to Library

Page 8: Automatic Financial Data Integration with FIBO

Name and concept resolution as a service

“The Walt Disney Company”, “ウォルト・ディズニー・カンパニー”, “شرکت والت دیزنی”, “华特迪士尼公司”

“Los Angeles, CA”, “Los Ángeles”, “Los-Anĝeleso”, “Лос Анжелес”

“The Parent Trap”, “À nous quatre”, “Nie wierzcie bliźniaczkom”, “Лос Анжелес”, “天生一对”

Core database is independent of language

Context-sensitive name resolver drives user interface

and text analysis

m.09c7w0

english

spanish

short

long

“USA”

“United States of America”

“Estados Unidos”

JFK

“We landed at _”

Name

Context
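
A language-independent core with a context-sensitive resolver might be sketched as follows. Only the identifier `m.09c7w0` and the labels come from the slide; the `example:` identifiers, the cue words, and both function names are hypothetical, and word overlap stands in for whatever scoring the real resolver uses.

```python
# Labels hang off a language-neutral identifier (from the slide).
LABELS = {
    "m.09c7w0": {
        "en": ["USA", "United States of America"],
        "es": ["Estados Unidos"],
    },
}

# Hypothetical candidates and cue words for an ambiguous name.
CANDIDATES = {
    "JFK": [
        {"id": "example:jfk-airport", "cues": {"landed", "flew", "terminal"}},
        {"id": "example:jfk-person", "cues": {"said", "president", "elected"}},
    ],
}


def build_name_index(labels):
    """Invert the label table so any name in any language finds its concept."""
    index = {}
    for concept_id, by_language in labels.items():
        for names in by_language.values():
            for name in names:
                index[name.casefold()] = concept_id
    return index


def resolve_in_context(name, context):
    """Pick the candidate whose cue words overlap the context the most."""
    words = set(context.casefold().split())
    scored = [(len(c["cues"] & words), c) for c in CANDIDATES.get(name, [])]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best["id"] if best_score > 0 else None
```

With these tables, “Estados Unidos” and “USA” resolve to the same concept, and “JFK” in “We landed at _” resolves to the airport candidate.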

Page 9: Automatic Financial Data Integration with FIBO

Tabular & XML data

Names

Spatial relationships

Company-place associations

Company-company associations

Post Code  City           Region            Country
94116      SAN FRANCISCO  CALAFORNIA        US
19866      WILMINGTON     DE                US
1044       BRUSSELS       BRUSSELS-CAPITOL  BE
6027       INNSBRUCK      AT-07             AT

Resolution…

• Of registration authorities that issue numbers (e.g. “Company House”)

• Of companies to EDGAR CIK identifiers

• Of companies against your customer list

• Of companies against :BaseKB

Decomposition into independently scalable/tunable microservices

Page 10: Automatic Financial Data Integration with FIBO

Test data

Judgments Invariants

Requirements

System

Output data

Comparison

Humans check output for quality

Humans change fact base and rule base

situations

decisions

Background knowledge

Machine learning

The team can fix problems at the root of errors in a decision-processing pipeline, with version control and a paper trail, or choose to bypass them

Page 11: Automatic Financial Data Integration with FIBO

98.5% of addresses matched to region in global business database

resolved / not resolved

Systematic improvement by solving increasingly less common problems

Statistical Quality Control for Subjective Decisions

Proven Methodology of Automated Unit and Integration Testing

Test for wanted and unwanted behaviors

“Good Enough” or improved metrics

System

Customer

Testing accelerated by parallel and reproducible methods in Hybrid Cloud
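
One way to put a figure such as a 98.5% match rate under statistical quality control is to report a confidence interval for the rate measured on a human-judged sample. The sketch below uses the Wilson score interval; this technique is an assumption of the example, not something the slides name, and the sample size in the usage note is hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / trials + z * z / (4 * trials * trials)
    ) / denom
    return center - half_width, center + half_width
```

For example, `wilson_interval(985, 1000)` gives the interval for a hypothetical sample of 1,000 judged addresses of which 985 were resolved; a release can then be gated on the lower bound rather than on the point estimate.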

Page 12: Automatic Financial Data Integration with FIBO

High Throughput / Scalable / Parallel

Efficient data structures and serialization driven by meta-model

Simple Binary Encoding

Specialized indexes for low-latency and expressive queries to support user interfaces and decision-making

C L E O

The Need For Speed