HyperMembrane Structures for Open Source Cognitive Computing
Japanese Agency for Science and Technology, Tokyo, Japan, 3 March 2015
Jack Park
© 2015 TopicQuests Foundation, CC BY-SA 4.0
The Present Situation
Upon this gifted age, in its dark hour,
Rains from the sky a meteoric shower
Of facts . . . they lie unquestioned, uncombined.
Wisdom enough to leech us of our ill
Is daily spun; but there exists no loom
To weave it into fabric
Edna St. Vincent Millay, 1939
Topics To Cover
• Discovery, learning, problem solving
• Topic Maps
• OpenSherlock
• HyperMembranes
• Open Source
• Key reasons for building open source cognitive systems
Cognitive Computing: My View
• Cognitive Computing is:
– Far less about what a computer knows
– Far more about how computers can augment human cognitive capabilities
– Based on the J.C.R. Licklider and Douglas Engelbart augmentation work
Images (Wikipedia): J.C.R. Licklider, Douglas Engelbart
A Domain-specific Problem Statement
• An Example:
– Do these two sentences say the same thing?
• CO2 is a causal factor in climate change.
• Climate change is caused by carbon dioxide.
• Problem Statement
– Software agents need elegant methods for reading, representing, organizing, and modeling information resources to support discovery and answering questions.
A Framing Thought
• From [1]
– The understanding of global brain organization and its large-scale integration remains a challenge for modern neurosciences.
• To
– The understanding of global conversations about topics that matter and their large-scale federation remains a challenge for modern information technology.
[1] Petri G, Expert P, Turkheimer F, Carhart-Harris R, Nutt D, Hellyer PJ, Vaccarino F. (2014) Homological scaffolds of brain functional networks. J. R. Soc. Interface 11: 20140873.
Our Goals
• Improve Human-Tool Capabilities
• Augment existing analytic methods
– Increase opportunities for discovery
– Improve already sophisticated methods
• Build Looms
– Read documents
– Map and model topics read
– Weave information fabrics
Douglas Engelbart
Discovery
• Is it really possible for people to see everything?
– Part of discovery is connecting dots not yet connected.
– “Cognitive Agents” can help increase chances of serendipity.
“Discovery consists of seeing what everybody has seen and thinking what nobody has thought.”
– Albert Szent-Györgyi
Related Work
• Commercial
– IBM Watson
– Wolfram Alpha
– Viv
– Saffron 10
– Clueda
– Siri
– Google Now
– Cortana
– …
• Open Source
– OAQA
– DeepDive
– OpenCog
– OpenNARS
– Watsonsim
– YodaQA
– AKSW OpenQA
– AKSW QA
– AquaLog
– OpenSherlock
– OpenIRIS (CALO)
– …
• Research
– Project Aristo
– Project Halo
– FREyA
– CASIA
– NLP-Reduce
– EIS Sina
– WDAqua ITM
– Intui2
– …
Biologically Inspired Design
• Humans are blessed with:
– Memory to keep concepts organized and connected
– Internal mechanisms which map sensor data into memory for processing and storage
– The abilities of complex, adaptive, anticipatory systems
Memory: Introducing Topic Maps
• A Topic Map is like a library without all the books*
– A Topic Map is indexical
• Like a card catalog
– Each topic has its own representation
• Improving on a card catalog, a topic can be identified many different ways
• Captures metadata and optionally content
– A Topic Map is relational
• Like a good road map
– Topics are connected by associations (relations)
– Topics point to their occurrences in the territory
– A Topic Map is organized
• Multiple records on the same topic are co-located (stored as one topic) in the map
*a map is not its territory
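The three properties above (indexical, relational, organized) can be sketched as a small in-memory structure. This is a minimal Python sketch under assumptions of our own, not OpenSherlock's implementation: the names `TopicMap`, `add_topic`, `associate`, and `add_occurrence` are hypothetical, and merging on any shared, case-insensitive name stands in for real subject-identity rules.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: int
    names: set = field(default_factory=set)          # every way this subject is identified
    occurrences: list = field(default_factory=list)  # locators pointing into the "territory"


class TopicMap:
    """Toy topic map that is indexical, relational, and organized (hypothetical design)."""

    def __init__(self):
        self.topics = {}        # topic_id -> Topic
        self.index = {}         # lower-cased name -> topic_id (the "card catalog")
        self.associations = []  # (topic_id, relation, topic_id) triples

    def add_topic(self, *names):
        # Co-location: if any given name is already indexed, reuse that topic.
        # Names that resolve to two different existing topics are not reconciled here.
        known = [self.index[n.lower()] for n in names if n.lower() in self.index]
        if known:
            topic = self.topics[known[0]]
        else:
            topic = Topic(topic_id=len(self.topics) + 1)
            self.topics[topic.topic_id] = topic
        for name in names:
            topic.names.add(name)
            self.index[name.lower()] = topic.topic_id
        return topic.topic_id

    def lookup(self, name):
        return self.index.get(name.lower())

    def associate(self, subject_id, relation, object_id):
        self.associations.append((subject_id, relation, object_id))

    def add_occurrence(self, topic_id, locator):
        self.topics[topic_id].occurrences.append(locator)
```

Adding "Carbon Dioxide" together with "CO2" lands on the existing CO2 topic instead of creating a second record, which is the co-location property; a fuller design would also merge topics when names point to two different existing records.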
TopicMap Structure
• Topics as Actors
• Topics as Relations
• Topics as Types
• Topics as Biographies
Processing Mechanisms
• Typically, software processes take the form of variants of NLP (natural language processing)
– Parsers
– Cluster analysis
– Entity recognition
– Relation detection
– Role recognition
– Probabilistic methods
A Key Question in My Research
• Can a Topic Map learn (construct itself) by “reading” literature?
– Relevant issues:
• Bootstrapping
• Machine reading
– NLP
– Linguistics
– Statistics
– Analogy & Metaphor
– …
• Knowledge representation
• Model building
– Anticipation
• Weaving information fabrics
• Literature-based discovery
• Deep Question Answering
A Simple Example
• Read this sentence:
– Gene expression is caused by insoluble hormones binding to a plasma membrane hormone receptor
• Topic Map recognizes:
– Gene expression → GeneExpression
– insoluble hormones → InsolubleHormone
– plasma membrane hormone receptor → PlasmaMembraneReceptor
• Software agents transform:
– is caused by → Cause
– binding to → Binds
• Final semantic structure:
• { {InsolubleHormone, Binds, PlasmaMembraneReceptor}, Cause, GeneExpression }
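The transformation in this example can be reproduced mechanically. The sketch below is an assumption-laden toy: the `TOPICS` and `PREDICATES` tables and the single regular expression are hypothetical stand-ins for what the Topic Map and the software agents would supply, and it only handles sentences shaped like "<effect> is caused by <agent> binding to <target>".

```python
import re

# Hypothetical stand-in for what the Topic Map recognizes (phrase -> topic).
TOPICS = {
    "gene expression": "GeneExpression",
    "insoluble hormones": "InsolubleHormone",
    "plasma membrane hormone receptor": "PlasmaMembraneReceptor",
}
# Hypothetical stand-in for what the software agents transform (phrase -> predicate).
PREDICATES = {"is caused by": "Cause", "binding to": "Binds"}

# One hand-written pattern for this sentence shape only.
PATTERN = re.compile(
    r"^(?P<effect>.+?) is caused by (?P<agent>.+?) binding to (?P<target>.+?)\.?$",
    re.IGNORECASE,
)


def to_topic(phrase):
    """Drop a leading article, then look the phrase up in the topic table."""
    words = phrase.strip().lower().split()
    if words and words[0] in ("a", "an", "the"):
        words = words[1:]
    return TOPICS[" ".join(words)]


def read_sentence(sentence):
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    inner = (to_topic(match.group("agent")), PREDICATES["binding to"],
             to_topic(match.group("target")))
    # The binding event is itself the subject of the causal tuple.
    return (inner, PREDICATES["is caused by"], to_topic(match.group("effect")))
```

Returning the binding event as a nested tuple mirrors the slide's final structure, in which one tuple is the subject of another.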
Introducing OpenSherlock
• OpenSherlock is:
– A Topic Map for information resource identity and organization
– A HyperMembrane information fabric structure
– A society of agents system which can
• Read documents
• Process information resources
– Maintain the topic map
– Maintain the HyperMembrane
– Build and maintain models
– Perform discovery tasks
– Answer questions
– Agents are coordinated by:
• A blackboard system
• A dynamic task-based agenda
• Event propagation and handling
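To illustrate how a blackboard, a task agenda, and event handling can fit together, here is a deliberately small sketch. The `Blackboard` class and the two agents are hypothetical; the slide does not describe OpenSherlock's actual coordination code. Posting an event queues one task per subscribed agent on a priority heap, and agents may post further events while the agenda runs, which is what makes the agenda dynamic.

```python
import heapq
from collections import defaultdict


class Blackboard:
    """Hypothetical coordinator: shared facts, a prioritized agenda, and event routing."""

    def __init__(self):
        self.facts = []                       # shared workspace every agent can read
        self.agenda = []                      # heap of (priority, seq, agent, payload)
        self.subscribers = defaultdict(list)  # event type -> agents interested in it
        self._seq = 0                         # tie-breaker: equal priorities run FIFO

    def subscribe(self, event_type, agent):
        self.subscribers[event_type].append(agent)

    def post(self, event_type, payload, priority=10):
        # Each subscribed agent gets its own task; lower priority numbers run first.
        for agent in self.subscribers[event_type]:
            heapq.heappush(self.agenda, (priority, self._seq, agent, payload))
            self._seq += 1

    def run(self):
        # Tasks may post new events, so the agenda can grow while it is drained.
        while self.agenda:
            _priority, _seq, agent, payload = heapq.heappop(self.agenda)
            agent(self, payload)


def reader_agent(board, document):
    # Toy "reading": split on ". " and announce each sentence as an event.
    for sentence in document.split(". "):
        board.post("sentence", sentence.strip(". "), priority=5)


def topic_map_agent(board, sentence):
    # Toy "maintenance": record the sentence on the blackboard.
    board.facts.append(("read", sentence))
```

For example, subscribing `reader_agent` to "document" events and `topic_map_agent` to "sentence" events, then posting a two-sentence document and calling `run()`, leaves one `("read", sentence)` fact per sentence on the blackboard.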
Observations 1
• A Topic Map is central to the key question, and therefore to a thesis entailed by this research
– It serves as a kind of memory for social processes
– It provides a robust platform for subject identity
– It can also serve as a repository for domain-specific vocabularies (ontologies, taxonomies, naming conventions,…)
Observations 2
• A Topic Map is necessary but not sufficient to support discovery, learning, or problem solving
– It really only provides a powerful indexical structure related to the key artifacts in any universe of discourse:
• Actors
• Their relations
• Their states
• Rules, laws, theories,…
• To model those key artifacts, other representation strategies are required
– Conceptual Graphs
– Qualitative Process Theory
– Belief Networks
– …
A Research Question
• What processes are available which, if performed while harvesting (reading) documents, can reduce the amount of processing required later during question answering?
– The question entails
• Synthesis of ontology
• Co-reference resolution
• Re-representation during question lifting
• …
A Working Hypothesis
• Process
– Build and maintain a content-addressable memory of questions, claims, arguments, and evidence fields.
• We call that a HyperMembrane
– Note:
• Every text object passed into the system is processed by the same algorithms
– Sentences harvested from text
– Questions and responses posed by humans
Key Concept: HyperMembrane
• HyperMembrane is a key concept in the working hypothesis that OpenSherlock seeks to explore and demonstrate
– A growing graph as a collection of woven and intersecting fabrics
• constructed from normalized tuples (n-tuples), which are designed to reduce the amount of NLP required to read documents
• such that intersections of fabrics occur where named entities in the graph of n-tuples are the same
– Inspired by Ted Nelson’s ZigZag Architecture
Machine Reading in OpenSherlock
• Goals:
– Grow the topic map
• Topic Map then serves to support fabrication of higher-order knowledge structures
– Conceptual Graphs
– Belief Networks
– QP Theory Models
– HyperMembrane
– …
• Process Loop:
– For a given document
• For every paragraph in that document
– For every sentence in each paragraph
» Read the sentence
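The process loop amounts to two nested iterations. Here is a minimal sketch, assuming paragraphs are separated by blank lines and sentences end with `.`, `!`, or `?` followed by whitespace (a real reader would use a proper sentence splitter); `read_document` and its callback are hypothetical names.

```python
import re


def paragraphs(document):
    # Assumption: paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def sentences(paragraph):
    # Assumption: naive split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def read_document(document, read_sentence):
    """Drive the reading loop: document -> paragraphs -> sentences."""
    for p_index, paragraph in enumerate(paragraphs(document)):
        for s_index, sentence in enumerate(sentences(paragraph)):
            read_sentence(p_index, s_index, sentence)
```

Passing the reader in as a callback keeps the loop independent of whatever "read the sentence" eventually does.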
Sentence Reading
• First Step:
– Process sentence into word grams*
• Second Step:
– Where possible
• Transform word grams into n-tuples**
• n-tuples form the HyperMembrane
* A container of words, from 1 to 8 words per container
** A container of symbols based on words in word grams
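Per the footnote, a word gram holds 1 to 8 words. One straightforward reading, assumed here, is to take every contiguous run of 1 to 8 words (w-shingles); the function name `word_grams` is hypothetical.

```python
def word_grams(sentence, max_len=8):
    """Return every contiguous run of 1..max_len words (w-shingles) in the sentence."""
    words = sentence.rstrip(".!?").split()
    grams = []
    for size in range(1, max_len + 1):
        # When size exceeds the word count, this range is empty.
        for start in range(len(words) - size + 1):
            grams.append(" ".join(words[start:start + size]))
    return grams
```

A four-word sentence such as "CO2 causes climate change." yields 4 + 3 + 2 + 1 = 10 grams, including both "climate change" and the whole sentence.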
Process Sentence into WordGrams
• Approach
– Break sentence into word grams*
• WordGram objects are shared across sentences
– Count of sentence identifiers associated with each object serves as basis for probabilistic models
– Either
• TopicMap recognizes terms
– Or
• Sentence is parsed by Link-Grammar Parser**
• TopicMap learns from parse results
* http://en.wikipedia.org/wiki/W-shingling
** http://www.link.cs.cmu.edu/link/
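Sharing WordGram objects across sentences means each distinct gram is stored once and accumulates the identifiers of the sentences it appears in. The sketch below assumes lower-cased, contiguous grams of up to 8 words and uses the fraction of sentences containing a gram as one simple example of a probabilistic signal; `WordGramRegistry` is a hypothetical name, and the Topic Map and Link-Grammar paths are not modeled.

```python
from collections import defaultdict


def word_grams(sentence, max_len=8):
    # Assumption: contiguous, lower-cased grams of 1..max_len words.
    words = sentence.rstrip(".!?").lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)]


class WordGramRegistry:
    """One shared entry per distinct gram, recording which sentences contain it."""

    def __init__(self):
        self.sentence_ids = defaultdict(set)  # gram text -> ids of sentences containing it
        self.total_sentences = 0

    def add_sentence(self, sentence_id, sentence):
        self.total_sentences += 1
        for gram in word_grams(sentence):
            self.sentence_ids[gram].add(sentence_id)

    def count(self, gram):
        # .get avoids creating empty entries for grams never seen.
        return len(self.sentence_ids.get(gram.lower(), ()))

    def probability(self, gram):
        # Fraction of sentences read so far that contain the gram.
        return self.count(gram) / self.total_sentences if self.total_sentences else 0.0
```

Reading the two climate sentences from the N-Tuple example gives "climate change" a count of 2 and "carbon dioxide" a count of 1.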
Transform WordGrams to N-Tuples
• Normalized tuple (N-Tuple)
– A structure where the subject, predicate, and object are normalized
• Nouns and verbs transformed
– CO2, Carbon Dioxide, … → CO2
– causes, is caused by, … → cause
• Two-sentence example
– CO2 is a cause of climate change.
– Climate change is caused by carbon dioxide.
– Result:
» { CO2, cause, climate change }
– Normalization processes include general and domain-specific lenses
• Rule-based interpreters which detect structures
– Taxonomy
– Causality
– Biomedical
– Geophysical
– …
• Process models
– Built and maintained while reading
– Predict while reading
– Anticipatory Reading
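The two-sentence example can be reproduced with a toy lens. In this sketch the `NOUNS` synonym table and the three regular-expression rules are hypothetical stand-ins for a causality lens; real lenses would be richer rule-based interpreters. The key step is that the passive "is caused by" form swaps subject and object, so both sentences normalize to the same tuple.

```python
import re

# Hypothetical synonym table: surface form (lower-cased) -> normalized noun.
NOUNS = {"co2": "CO2", "carbon dioxide": "CO2", "climate change": "climate change"}

# A toy "causality lens": (pattern, is_passive_voice).
CAUSALITY_RULES = [
    (re.compile(r"^(.+?) is a cause of (.+?)\.?$", re.I), False),
    (re.compile(r"^(.+?) causes (.+?)\.?$", re.I), False),
    (re.compile(r"^(.+?) is caused by (.+?)\.?$", re.I), True),
]


def normalize_noun(phrase):
    key = phrase.strip().lower()
    return NOUNS.get(key, key)


def causality_lens(sentence):
    """Return a normalized (subject, 'cause', object) tuple, or None if no rule fires."""
    for pattern, passive in CAUSALITY_RULES:
        match = pattern.match(sentence.strip())
        if match:
            first, second = (normalize_noun(g) for g in match.groups())
            # Passive voice names the effect first, so swap to (cause, 'cause', effect).
            subject, obj = (second, first) if passive else (first, second)
            return (subject, "cause", obj)
    return None
```

Because the normalized tuple, not the surface wording, is what gets stored, the active and passive sentences land on one record.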
About N-Tuples
• An N-Tuple is a structured record of
– Topics in the topic map
– Those topics are harvested from text
• An N-Tuple takes the form:
– { Subject, Predicate, Object }
– Where
• Subject and/or Object can be one of:
– A topic from the topic map
– Another N-Tuple
• An N-Tuple is identified by the identities of the terms it contains
– When thinking in terms of terms (words) read from documents, the identities (numeric representations) of those terms form the identity of the N-Tuple object.
• N-Tuples are content addressable
• Disambiguation of subjects is a topic mapping process
– Learning means continuous refinement of subject identity
– Ambiguities can also be resolved through human intervention
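Content addressability follows from deriving an N-Tuple's identity from its terms' numeric identities. Here is a minimal sketch, assuming integer ids handed out in first-seen order and nested Python tuples as keys; `NTupleStore` and its methods are hypothetical names. Storing the same content twice yields the same key, and a nested N-Tuple's key contains the key of the tuple it embeds.

```python
class NTupleStore:
    """Hypothetical content-addressable store: an N-Tuple's key is built from term ids."""

    def __init__(self):
        self.term_ids = {}  # term text -> numeric identity
        self.tuples = {}    # key (nested tuple of ints) -> the N-Tuple itself

    def _key(self, item, assign):
        # A nested N-Tuple maps to its own key, recursively; a term maps to its id.
        if isinstance(item, tuple):
            return tuple(self._key(part, assign) for part in item)
        if assign:
            # Hand out the next integer the first time a term is seen.
            return self.term_ids.setdefault(item, len(self.term_ids) + 1)
        return self.term_ids[item]  # raises KeyError for an unseen term

    def put(self, ntuple):
        key = self._key(ntuple, assign=True)
        self.tuples.setdefault(key, ntuple)
        return key

    def get(self, ntuple):
        try:
            return self.tuples.get(self._key(ntuple, assign=False))
        except KeyError:
            return None  # contains a term the store has never seen
```

Using the slide's gene-expression tuple, `put(("InsolubleHormone", "Binds", "PlasmaMembraneReceptor"))` returns `(1, 2, 3)`, and the enclosing causal tuple gets the key `((1, 2, 3), 4, 5)`.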
N-Tuples as HyperMembrane
Tuples
{A, Bind, B}
{{A, Bind, B}, Cause, X}
{X, Bind, D}
{{X, Bind, D}, Cause, Y}
[Figure: the four tuples above drawn as a graph, with nodes A, B, X, D, and Y joined by Bind and Cause links]
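One way to read "intersections of fabrics occur where named entities in the graph of n-tuples are the same" is sketched below, as an interpretation rather than OpenSherlock's algorithm: each tuple that is not nested inside another is treated as a fabric, and any entity shared by two or more fabrics marks an intersection. The function names are hypothetical. With the four tuples from this slide, the two causal fabrics meet at `X`.

```python
from collections import defaultdict

# The four tuples from the slide; nested tuples act as subjects.
AB = ("A", "Bind", "B")
XD = ("X", "Bind", "D")
TUPLES = [AB, (AB, "Cause", "X"), XD, (XD, "Cause", "Y")]


def entities(ntuple):
    """Yield subject/object entities, descending into nested N-Tuples (predicates skipped)."""
    subject, _predicate, obj = ntuple
    for part in (subject, obj):
        if isinstance(part, tuple):
            yield from entities(part)
        else:
            yield part


def fabrics(tuples):
    """Treat each tuple that is not nested inside another tuple as one fabric."""
    nested = {part for t in tuples for part in (t[0], t[2]) if isinstance(part, tuple)}
    return [t for t in tuples if t not in nested]


def intersections(tuples):
    """Map each entity to the fabrics mentioning it; keep entities shared by 2+ fabrics."""
    index = defaultdict(list)
    for fabric in fabrics(tuples):
        for entity in set(entities(fabric)):
            index[entity].append(fabric)
    return {e: fs for e, fs in index.items() if len(fs) > 1}
```

Under this reading, `A`, `B`, `D`, and `Y` each belong to a single fabric, while `X` is both the effect in one causal tuple and, through its nested binding tuple, the cause in the other.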
Current State of OpenSherlock
[Architecture diagram. Components: ElasticSearch; Titan or Blazegraph; Ontology Importer (Ontologies); PubMed Reader (PubMed Abstracts); HyperMembrane Engine; TellAsk; UMLS Importer (UMLS)]
Observations 3
• HyperMembrane is a reminding system
– HyperMembrane is a record of federated human conversation
• Harvested from books, papers, and recorded conversation
• Includes statistical properties of recorded utterances
– HyperMembrane records:
• That which is common
• That which is novel
– Possibly wrong
– Possibly game changing
TellAsk Interface
[Screenshot of the TellAsk interface, with callouts:]
• Conversation Tree: user can click a node to select it as the parent for any user response
• Response Type Selectors: selection required before response
• User types here
• Linear conversation flow
• Entry Forms Selector List
• Map starts a new conversation with the entered topic
The Open Source Stack
• Persistence
– ElasticSearch
– Considering Titan
– Considering Blazegraph (Bigdata™ RDF Store)
• Libraries
– Many from Apache Foundation and others
– LinkGrammarParser (Java version)
– XML PullParser
– Simple JSON Parser
• Tools
– Eclipse
Current State of Development
• Aim to answer simple questions about causality
– Current focus on biomedical domain
– Current focus on two lenses
• Taxonomy
• Causality
– No Conceptual Graphs
– No Process Models
– No Probabilistic Models
Future Work
• Aim to complete an anticipatory system
– Process models for anticipation
– Conceptual graphs
– Probabilistic models
– More lenses
• Pluggable lenses
• Adaptive lenses
– More domains
Augmenting Social Sensemaking
[Figure: three numbered stages (1, 2, 3) labeled Creating Ideas, Refining Connections, and Connecting Ideas, with the label "Cancer patient"]
Key Context for Open Science
• A planet-wide, collaborative quest for Global Thrivability*.
– Issues include
• Sociological events – Health, epidemics, wars,…
• Geophysical events – Climate change, earthquakes, volcanoes, …
• Astrophysical events – Asteroids, our Sun, …
* Let’s call the quest: EarthMoonshot
Completed Representation
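[Diagram: a completed representation linking Antioxidants, Bacterial Infection, and Compromised Host through the relations Contraindicates, Appropriate For, and Because, supported by the claims “antioxidants kill free radicals” and “macrophages use free radicals to kill bacteria”]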
Let us co-create Cognitive Agents for Discovery
[email protected]
OpenSherlock documents at: http://debategraph.org/OpenSherlock
Code emerging at: https://github.com/opensherlock/
Slides online at: http://slideshare.net/jackpark/
Acknowledgments: Bob Gleichauf David Alexander Price Arun Majumdar Robert S. Stephenson Mark Szpakowski Martin Radley Sherry Jones Alexander Wenzowski Ted Kahn Patrick Durusau