HyperMembrane Structures for Open Source Cognitive Computing


Japanese Agency for Science and Technology, Tokyo, Japan

3 March 2015

Jack Park

© 2015 TopicQuests Foundation, CC BY-SA 4.0

The Present Situation

Upon this gifted age, in its dark hour,
Rains from the sky a meteoric shower
Of facts . . . they lie unquestioned, uncombined.
Wisdom enough to leech us of our ill
Is daily spun; but there exists no loom
To weave it into fabric

Edna St. Vincent Millay, 1939


Topics To Cover

• Discovery, learning, problem solving

• Topic Maps

• OpenSherlock

• HyperMembranes

• Open Source

• Key reasons for building open source cognitive systems


Cognitive Computing: My View

• Cognitive Computing is:

– Far less about what a computer knows

– Far more about how computers can augment human cognitive capabilities

– Based on the augmentation work of J.C.R. Licklider and Douglas Engelbart

Pictured: J.C.R. Licklider and Douglas Engelbart (images: Wikipedia)

A Domain-specific Problem Statement

• An Example:

– Do these two sentences say the same thing?

• CO2 is a causal factor in climate change.

• Climate change is caused by carbon dioxide.

• Problem Statement

– Software agents need elegant methods for reading, representing, organizing, and modeling information resources to support discovery and answering questions.


A Framing Thought

• From [1]

– The understanding of global brain organization and its large-scale integration remains a challenge for modern neurosciences.

• To

– The understanding of global conversations about topics that matter and their large-scale federation remains a challenge for modern information technology.

[1] Petri G, Expert P, Turkheimer F, Carhart-Harris R, Nutt D, Hellyer PJ, Vaccarino F. (2014) Homological scaffolds of brain functional networks. J. R. Soc. Interface 11: 20140873.


Our Goals

• Improve Human-Tool Capabilities

• Augment existing analytic methods

– Increase opportunities for discovery

– Improve already sophisticated methods

• Build Looms

– Read documents

– Map and model topics read

– Weave information fabrics

Pictured: Douglas Engelbart


Discovery

• Is it really possible for people to see everything?

– Part of discovery is connecting dots not yet connected.

– “Cognitive Agents” can help increase chances of serendipity.

“Discovery consists of seeing what everybody has seen and thinking what nobody has thought.”

–Albert Szent-Györgyi


Related Work

• Commercial

– IBM Watson

– Wolfram Alpha

– Viv

– Saffron 10

– Clueda

– Siri

– Google Now

– Cortana

– …

• Open Source

– OAQA

– DeepDive

– OpenCog

– OpenNARS

– Watsonsim

– YodaQA

– AKSW OpenQA

– AKSW QA

– AquaLog

– OpenSherlock

– OpenIRIS (CALO)

– …

• Research

– Project Aristo

– Project Halo

– FREyA

– CASIA

– NLP-Reduce

– EIS Sina

– WDAqua ITM

– Intui2

– …


Biologically Inspired Design

• Humans are blessed with:

– Memory to keep concepts organized and connected

– Internal mechanisms which map sensor data into memory for processing and storage

– The abilities of complex, adaptive, anticipatory systems


Memory: Introducing Topic Maps

• A Topic Map is like a library without all the books*

– A Topic Map is indexical

• Like a card catalog

– Each topic has its own representation

• Improving on a card catalog, a topic can be identified many different ways

• Captures metadata and optionally content

– A Topic Map is relational

• Like a good road map

– Topics are connected by associations (relations)

– Topics point to their occurrences in the territory

– A Topic Map is organized

• Multiple records on the same topic are co-located (stored as one topic) in the map

*a map is not its territory
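
The three properties above (indexical, relational, organized) can be sketched in a few lines of Python. This is only an illustration: the names `Topic`, `TopicMap`, `add`, `find`, and `associate` are assumptions made for this sketch, not OpenSherlock's API and not the Topic Maps standard data model.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One subject, known by any number of names, pointing at its occurrences."""
    topic_id: str
    names: set = field(default_factory=set)
    occurrences: set = field(default_factory=set)  # pointers into the territory


class TopicMap:
    def __init__(self):
        self.topics = {}           # topic_id -> Topic
        self.name_index = {}       # lower-cased name -> topic_id (indexical)
        self.associations = set()  # (topic_id, relation, topic_id) (relational)

    def add(self, topic_id, names, occurrence=None):
        """Add a record; a record sharing a known name is merged (organized)."""
        keys = [name.lower() for name in names]
        known = [self.name_index[key] for key in keys if key in self.name_index]
        target_id = known[0] if known else topic_id
        topic = self.topics.setdefault(target_id, Topic(target_id))
        topic.names.update(names)
        if occurrence is not None:
            topic.occurrences.add(occurrence)
        for key in keys:
            self.name_index[key] = target_id
        return topic

    def find(self, name):
        """Look a topic up by any of its names; returns None if unknown."""
        return self.topics.get(self.name_index.get(name.lower()))

    def associate(self, subject_name, relation, object_name):
        subject, obj = self.find(subject_name), self.find(object_name)
        if subject is None or obj is None:
            raise KeyError("both topics must exist before they are associated")
        self.associations.add((subject.topic_id, relation, obj.topic_id))


tm = TopicMap()
tm.add("CO2", {"CO2", "carbon dioxide"}, occurrence="doc-1")
tm.add("t2", {"Carbon Dioxide"}, occurrence="doc-2")  # same subject: merged
tm.add("ClimateChange", {"climate change"})
tm.associate("carbon dioxide", "cause", "climate change")
```

The second `add` call does not create a new topic: because one of its names is already indexed, its occurrence is co-located with the existing CO2 topic.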


TopicMap Structure

• Topics as Actors

• Topics as Relations

• Topics as Types

• Topics as Biographies

Processing Mechanisms

• Typically, software processes take the form of variants of NLP (natural language processing)

– Parsers

– Cluster analysis

– Entity recognition

– Relation detection

– Role recognition

– Probabilistic methods


A Key Question in My Research

• Can a Topic Map learn (construct itself) by “reading” literature?

– Relevant issues:

• Bootstrapping

• Machine reading

– NLP

– Linguistics

– Statistics

– Analogy & Metaphor

– …

• Knowledge representation

• Model building

– Anticipation

• Weaving information fabrics

• Literature-based discovery

• Deep Question Answering


A Simple Example

• Read this sentence:

– Gene expression is caused by insoluble hormones binding to a plasma membrane hormone receptor

• Topic Map recognizes:

– Gene expression → GeneExpression

– insoluble hormones → InsolubleHormone

– plasma membrane hormone receptor → PlasmaMembraneReceptor

• Software agents transform:

– is caused by → Cause

– binding to → Binds

• Final semantic structure:

– { {InsolubleHormone, Binds, PlasmaMembraneReceptor}, Cause, GeneExpression }


Introducing OpenSherlock

• OpenSherlock is:

– A Topic Map for information resource identity and organization

– A HyperMembrane information fabric structure

– A society of agents system which can

• Read documents

• Process information resources

– Maintain the topic map

– Maintain the HyperMembrane

– Build and maintain models

– Perform discovery tasks

– Answer questions

– Agents are coordinated by:

• A blackboard system

• A dynamic task-based agenda

• Event propagation and handling
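
To make the three coordination mechanisms concrete, here is a minimal Python sketch of a blackboard with a priority agenda and event propagation. Everything in it is an assumption for illustration: the `Blackboard` class, the two toy agents, and the entry kinds "document", "sentence", and "indexed" are hypothetical and are not taken from the OpenSherlock code.

```python
import heapq
from collections import defaultdict


class Blackboard:
    """Shared workspace: posting an entry fires an event to subscribed agents."""

    def __init__(self):
        self.entries = defaultdict(list)    # kind -> items posted so far
        self.listeners = defaultdict(list)  # kind -> agent callbacks
        self.agenda = []                    # heap of (priority, sequence, task)
        self._sequence = 0                  # tie-breaker keeps tasks in FIFO order

    def subscribe(self, kind, callback):
        self.listeners[kind].append(callback)

    def post(self, kind, item):
        self.entries[kind].append(item)
        for callback in self.listeners[kind]:  # event propagation
            callback(self, item)

    def schedule(self, priority, task):
        heapq.heappush(self.agenda, (priority, self._sequence, task))
        self._sequence += 1

    def run(self):
        # The agenda is dynamic: running a task may schedule further tasks.
        while self.agenda:
            _, _, task = heapq.heappop(self.agenda)
            task()


def reader_agent(board, document):
    """Reacts to a new document by scheduling one task per sentence."""
    for sentence in document.split(". "):
        board.schedule(1, lambda s=sentence: board.post("sentence", s.strip(". ")))


def indexer_agent(board, sentence):
    """Reacts to a new sentence by scheduling a lower-priority indexing task."""
    board.schedule(2, lambda: board.post("indexed", sentence.lower()))


board = Blackboard()
board.subscribe("document", reader_agent)
board.subscribe("sentence", indexer_agent)
board.post(
    "document",
    "CO2 is a cause of climate change. Climate change is caused by carbon dioxide.",
)
board.run()
```

Agents never call each other directly. Each one reacts to what appears on the blackboard and places work on the shared agenda, which is what lets new agents be added without changing existing ones.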


Observations 1

• A Topic Map is central to the key question, and therefore to a thesis entailed by this research

– It serves as a kind of memory for social processes

– It provides a robust platform for subject identity

– It can also serve as a repository for domain-specific vocabularies (ontologies, taxonomies, naming conventions,…)


Observations 2

• A Topic Map is necessary but not sufficient to support discovery, learning, or problem solving

– It really only provides a powerful indexical structure related to the key artifacts in any universe of discourse:

• Actors

• Their relations

• Their states

• Rules, laws, theories,…

• To model those key artifacts, other representation strategies are required

– Conceptual Graphs

– Qualitative Process Theory

– Belief Networks

– …

A Research Question

• What processes are available which, if performed while harvesting (reading) documents, can reduce the amount of processing required later during question answering?

– The question entails

• Synthesis of ontology

• Co-reference resolution

• Re-representation during question lifting

• …


A Working Hypothesis

• Process

– Build and maintain a content-addressable memory of questions, claims, arguments, and evidence fields.

• We call that a HyperMembrane

– Note:

• Every text object passed into the system is processed by the same algorithms

– Sentences harvested from text

– Questions and responses posed by humans


Key Concept: HyperMembrane

• HyperMembrane is a key concept in the working hypothesis that OpenSherlock seeks to explore and demonstrate

– A growing graph as a collection of woven and intersecting fabrics

• constructed from normalized tuples (n-tuples) which are designed to reduce the amount of NLP required to read documents

• such that intersections of fabrics occur where named entities in the graph of n-tuples are the same

– Inspired by Ted Nelson’s ZigZag Architecture


Machine Reading in OpenSherlock

• Goals:

– Grow the topic map

• Topic Map then serves to support fabrication of higher-order knowledge structures

– Conceptual Graphs

– Belief Networks

– QP Theory Models

– HyperMembrane

– …

• Process Loop:

– For a given document

• For every paragraph in that document

– For every sentence in each paragraph

» Read the sentence
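
The process loop above can be sketched as nested iteration. This is a rough illustration under stated assumptions: `read_document` and its `read_sentence` callback are hypothetical names, paragraphs are assumed to be separated by blank lines, and the regular-expression sentence splitter is deliberately naive (it would wrongly split at abbreviations such as "J.C.R.").

```python
import re


def read_document(text, read_sentence):
    """Walk document -> paragraphs -> sentences, handing each sentence to a reader."""
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Split after sentence-ending punctuation that is followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                read_sentence(sentence)


seen = []
read_document(
    "CO2 is a cause of climate change. Climate change is caused by carbon dioxide.\n"
    "\n"
    "Gene expression is caused by insoluble hormones binding to a plasma "
    "membrane hormone receptor.",
    seen.append,
)
```

Here `seen.append` stands in for the sentence reader; in a fuller system that callback is where the sentence would be handed to the reading steps.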


Sentence Reading

• First Step:

– Process sentence into word grams*

• Second Step:

– Where possible

• Transform word grams into n-tuples**

• n-tuples form the HyperMembrane

* A container of words, from 1 to 8 words per container

** A container of symbols based on words in word grams


Process Sentence into WordGrams

• Approach

– Break sentence into word grams*

• WordGram objects are shared across sentences

– Count of sentence identifiers associated with each object serves as basis for probabilistic models

– Either

• TopicMap recognizes terms

– Or

• Sentence is parsed by Link-Grammar Parser**

• TopicMap learns from parse results

*http://en.wikipedia.org/wiki/W-shingling

**http://www.link.cs.cmu.edu/link/
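
A minimal Python sketch of the approach above: shingle each sentence into word grams of 1 to 8 words, share each gram across sentences, and use the count of sentence identifiers per gram as a simple frequency. The names `word_grams` and `WordGramIndex` are assumptions for illustration, and the fraction returned by `frequency` is only one possible basis for a probabilistic model, not necessarily the one OpenSherlock uses.

```python
from collections import defaultdict


def word_grams(sentence, max_words=8):
    """Return every run of 1..max_words consecutive words (w-shingling)."""
    words = sentence.rstrip(".?!").split()
    grams = []
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            grams.append(" ".join(words[start:start + size]))
    return grams


class WordGramIndex:
    """Word grams shared across sentences, each with the ids of its sentences."""

    def __init__(self):
        self.sentences_by_gram = defaultdict(set)  # gram -> sentence identifiers
        self.sentence_count = 0

    def add_sentence(self, sentence_id, sentence):
        self.sentence_count += 1
        for gram in word_grams(sentence.lower()):
            self.sentences_by_gram[gram].add(sentence_id)

    def frequency(self, gram):
        """Fraction of indexed sentences that contain this gram."""
        if self.sentence_count == 0:
            return 0.0
        hits = self.sentences_by_gram.get(gram.lower(), set())
        return len(hits) / self.sentence_count


index = WordGramIndex()
index.add_sentence("s1", "CO2 is a cause of climate change.")
index.add_sentence("s2", "Climate change is caused by carbon dioxide.")
```

After these two sentences, "climate change" is associated with both sentence identifiers while "carbon dioxide" is associated with only one.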

Transform WordGrams to N-Tuples

• Normalized tuple (N-Tuple)

– A structure where the subject, predicate, and object are normalized

• Nouns and verbs transformed

– CO2, Carbon Dioxide, … → CO2

– causes, is caused by, … → cause

• Two sentence example

– CO2 is a cause of climate change.

– Climate change is caused by carbon dioxide.

– Result:

» { CO2, cause, climate change }

– Normalization processes include general and domain specific lenses

• Rule-based interpreters which detect structures

– Taxonomy

– Causality

– Biomedical

– Geophysical

– …

• Process models

– Built and maintained while reading

– Predict while reading

– Anticipatory Reading
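
The two-sentence example can be reproduced with a very small rule-based causality lens. This is a sketch under loud assumptions: the `TERMS` table and the regular-expression rules in `CAUSALITY_LENS` are invented for this example, and a real lens would draw its vocabulary from the topic map rather than from a hard-coded dictionary.

```python
import re

# Hypothetical synonym table standing in for topic map lookups.
TERMS = {
    "co2": "CO2",
    "carbon dioxide": "CO2",
    "climate change": "climate change",
}

# Each rule: (pattern, normalized predicate, whether group "a" is the subject).
# Passive forms name the effect first, so "a" becomes the object instead.
CAUSALITY_LENS = [
    (r"^(?P<a>.+?) is caused by (?P<b>.+)$", "cause", False),
    (r"^(?P<a>.+?) is a cause of (?P<b>.+)$", "cause", True),
    (r"^(?P<a>.+?) is a causal factor in (?P<b>.+)$", "cause", True),
    (r"^(?P<a>.+?) causes (?P<b>.+)$", "cause", True),
]


def normalize_term(phrase):
    """Map a surface form to its normalized term, or keep it if unknown."""
    cleaned = phrase.strip()
    return TERMS.get(cleaned.lower(), cleaned)


def to_ntuple(sentence, lens=CAUSALITY_LENS):
    """Return (subject, predicate, object) if a rule matches, else None."""
    text = sentence.strip().rstrip(".")
    for pattern, predicate, a_is_subject in lens:
        match = re.match(pattern, text, flags=re.IGNORECASE)
        if match:
            a, b = normalize_term(match["a"]), normalize_term(match["b"])
            return (a, predicate, b) if a_is_subject else (b, predicate, a)
    return None


first = to_ntuple("CO2 is a cause of climate change.")
second = to_ntuple("Climate change is caused by carbon dioxide.")
```

Both sentences reduce to the same tuple, which is the point of normalization: equivalent statements become a single record.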


About N-Tuples

• An N-Tuple is a structured record of

– Topics in the topic map

– Those topics are harvested from text

• An N-Tuple takes the form:

– { Subject, Predicate, Object }

– Where

• Subject and/or Object can be one of:

– A topic from the topic map

– Another N-Tuple

• An N-Tuple is identified by the identities of the terms it contains

– When thinking in terms of terms (words) read from documents, the identities (numeric representations) of those terms form the identity of the N-Tuple object.

• N-Tuples are content addressable

• Disambiguation of subjects is a topic mapping process

– Learning means continuous refinement of subject identity

– Ambiguities can also be solved through human intervention
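
One way to picture content addressing is a store in which the key of an N-Tuple is derived purely from the identities of its parts, so the same content always lands at the same address. The sketch below is an assumption-laden illustration: `NTupleStore`, the "t"/"n" key prefixes, and the use of SHA-1 are choices made for this example, not a description of how OpenSherlock computes identities.

```python
import hashlib


class NTupleStore:
    """Content-addressable store: a tuple's key is derived from its parts' ids."""

    def __init__(self):
        self.term_ids = {}  # term -> numeric identity
        self.tuples = {}    # tuple key -> (subject key, predicate key, object key)

    def term_key(self, term):
        """Assign each distinct term a stable numeric identity, prefixed with 't'."""
        number = self.term_ids.setdefault(term, len(self.term_ids) + 1)
        return f"t{number}"

    def _key_of(self, part):
        # A part is either a term or another N-Tuple (a nested 3-tuple).
        if isinstance(part, tuple):
            return self.put(*part)
        return self.term_key(part)

    def put(self, subject, predicate, obj):
        """Store a tuple and return its key; identical content gives an identical key."""
        parts = (self._key_of(subject), self._key_of(predicate), self._key_of(obj))
        key = "n" + hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()
        self.tuples.setdefault(key, parts)
        return key


store = NTupleStore()
inner = store.put("InsolubleHormone", "Binds", "PlasmaMembraneReceptor")
again = store.put("InsolubleHormone", "Binds", "PlasmaMembraneReceptor")
outer = store.put(
    ("InsolubleHormone", "Binds", "PlasmaMembraneReceptor"),
    "Cause",
    "GeneExpression",
)
```

Storing the same tuple twice yields one record, and the nested tuple inside `outer` resolves to the key already held by `inner`.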


N-Tuples as HyperMembrane

Tuples

{A, Bind, B}

{{A, Bind, B}, Cause, X}

{X, Bind, D}

{{X, Bind, D}, Cause, Y}

Figure: the tuples above drawn as a graph, with nodes A, B, X, D, and Y connected by Bind and Cause links.
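
The four tuples listed on this slide are enough to show how fabrics intersect at a shared named entity. In the Python sketch below, the helper names `entities` and `downstream` are hypothetical, and treating each top-level Cause tuple as one "fabric" is an interpretation made for this example.

```python
TUPLES = [
    ("A", "Bind", "B"),
    (("A", "Bind", "B"), "Cause", "X"),
    ("X", "Bind", "D"),
    (("X", "Bind", "D"), "Cause", "Y"),
]


def entities(ntuple):
    """Yield every named entity in a tuple, descending into nested tuples."""
    subject, _predicate, obj = ntuple
    for part in (subject, obj):
        if isinstance(part, tuple):
            yield from entities(part)
        else:
            yield part


def downstream(tuples, start):
    """Follow Cause tuples from an entity to everything it contributes to."""
    reached, frontier = [], [start]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in tuples:
            if predicate != "Cause":
                continue
            sources = set(entities(subject)) if isinstance(subject, tuple) else {subject}
            if current in sources and obj not in reached:
                reached.append(obj)
                frontier.append(obj)
    return reached


fabric_1, fabric_2 = TUPLES[1], TUPLES[3]
shared = set(entities(fabric_1)) & set(entities(fabric_2))
chain = downstream(TUPLES, "A")
```

The two causal fabrics share only X, and following the Cause links from A passes through that intersection to reach Y, a connection that neither tuple states on its own.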

Current State of OpenSherlock

Figure: architecture diagram. Component labels: ElasticSearch; Titan or Blazegraph; Ontology Importer; Ontologies; PubMed Reader; PubMed Abstracts; HyperMembrane Engine; TellAsk; UMLS Importer; UMLS.


Observations 3

• HyperMembrane is a reminding system

– HyperMembrane is a record of federated human conversation

• Harvested from books, papers, and recorded conversation

• Includes statistical properties of recorded utterances

– HyperMembrane records:

• That which is common

• That which is novel

– Possibly wrong

– Possibly game changing


TellAsk Interface

Figure: TellAsk interface screenshot, with these callouts:

• Conversation Tree: user can click a node to select as parent for any user response

• Response Type Selectors: selection required before response

• User types here

• Linear conversation flow

• Entry Forms Selector List

• Map starts a new conversation with entered topic


The Open Source Stack

• Persistence

– ElasticSearch

– Considering Titan

– Considering Blazegraph (Bigdata™ RDF Store)

• Libraries

– Many from Apache Foundation and others

– LinkGrammarParser (Java version)

– XML PullParser

– Simple JSON Parser

• Tools

– Eclipse

Summary


Current State of Development

• Aim to answer simple questions about causality

– Current focus on biomedical domain

– Current focus on two lenses

• Taxonomy

• Causality

– No Conceptual Graphs

– No Process Models

– No Probabilistic Models


Future Work

• Aim to complete an anticipatory system

– Process models for anticipation

– Conceptual graphs

– Probabilistic models

– More lenses

• Pluggable lenses

• Adaptive lenses

– More domains


Why Do This?

• Augment human capabilities in problem solving

• Participate in Open Science


Augmenting Social Sensemaking

Figure labels: 1, 2, 3; Creating Ideas; Refining Connections; Connecting Ideas; Cancer patient.

Participate in Open Science


Key Context for Open Science

• A planet-wide, collaborative quest for Global Thrivability*.

– Issues include

• Sociological events

– Health, epidemics, wars, …

• Geophysical events

– Climate change, earthquakes, volcanoes, …

• Astrophysical events

– Asteroids, our Sun, …

* Let’s call the quest: EarthMoonshot


Completed Representation

Figure: completed representation drawn as a graph. Node labels: antioxidants kill free radicals; macrophages use free radicals to kill bacteria; Bacterial Infection; Antioxidants; Compromised Host. Link labels: Contraindicates; Because; Appropriate For.

Let us co-create Cognitive Agents for Discovery

OpenSherlock documents at: http://debategraph.org/OpenSherlock

Code emerging at: https://github.com/opensherlock/

Slides online at: http://slideshare.net/jackpark/

Acknowledgments: Bob Gleichauf David Alexander Price Arun Majumdar Robert S. Stephenson Mark Szpakowski Martin Radley Sherry Jones Alexander Wenzowski Ted Kahn Patrick Durusau
