Page 1
New Technologies in the Offing
Future/Current State of Information Sharing
Lucian Russell, PhD
Expert Reasoning & Decisions LLC
Architecture Plus Seminar
March 6th, 2007
Page 2
Lucian Russell, PhD – Quickie CV
• Education
– First Computer Course, Columbia University, Spring 1961
– Harvard College, BA Mathematics, 1965
– New York University Courant Institute, MS Mathematics, 1969; all PhD course work complete, 1974
– George Mason University, PhD Information Technology, 1996
• Key Employers
– Bell Telephone Labs, Member of Technical Staff
– Computer Sciences Corporation, Chief Scientist, Technology Center
– Argonne National Laboratory, Director, Adv Comp App Center
• Key Specialties
– Database Technology
– Information Retrieval (running for ACM SIGIR Secretary)
– Uncertainty Reasoning
• Recent Activity
– Liaison between a government agency and the DNI/DTO (formerly ARDA)
Page 3
Why is there a need for new technologies for infoglut?
Computers do not talk “people” language
Where’s the data I need!!!!
[Figure: an arrow labeled "Current Technology", with the labels "Search Manually" and "Search Automatically"]
Page 4
There is too much for current technology to handle!
1,820,000 Web documents about EPA & Water & Data
NASA Global Change Master Directory
Index of 20 Petabytes of Government Data
Page 5
The challenge
• The human being uses language
• The computer is good at mathematics
• Mathematics is an instantiation of logic (one axiom & definitions)
• We need to bring logic and language together so that human-dependent methods can be replaced by automated ones
1930s: The Logical Positivists tried first and gave up in the late 1960s
1970s: The computer software community invented data modeling: throw away unneeded words
1980s: Academic AI researchers promised NLP and expert systems and failed: only brittle & limited systems
1990s: Academic AI researchers promised intelligent agents using ontologies and failed
Page 6
The Data Reference Model 2.0
Page 7
Data Context: The best of the 1700s
[Figure: Taxonomy 1, Topic (Word 1, …), and Data Asset; data asset types listed: Directory, File of Documents, Document Database, File of Data, …]
Page 8
Data Description: Peter Chen’s 1976 ERA Diagrams
[Figure: Data Schema, Entity, Attribute, Relationship, Data Type, and Document (Word 1), connected by links labeled "contains", "is constrained by", "participates-in", "relates", and "refers to"]
Page 9
We would like to do better, but HOW?
Page 10
There are new relevant R&D advances
• WHO?
– About 100 R&D projects in data-related technologies were funded over the last 6 years by the Intelligence Community's Advanced Research and Development Activity (ARDA).
– These are unclassified but mostly not publicly available: For Official Use Only (FOUO).
– The R&D teams are from the best of the best universities and private companies (e.g. IBM Watson, SRI, PARC).
– They have results!
– In the Information Exploitation program (Info-X) they are on view all next week in Dallas, for government personnel and invitees.
• WHAT? (1) Major Language Advances
– WordNet has become a precise information artifact: 117,000 disambiguated meanings of English words. Similar efforts are underway with other languages.
– The human language of time is now understandable with a new markup language
– Opinions are now detectable in sentences
– Logical relationships are now detectable in sentences (40+)
• WHAT? (2) Major Breakthrough in Knowledge Representation – IKRIS
• Net Result: A new age in information sharing is upon us! NOW!
Page 11
What’s different?
• DRM 2.0 terminology and resultant practices reflect the old technology
– In the old DRM, logic was isolated in Data Descriptions and linguistic constructs were in Data Contexts.
– Shortcuts are the norm: vocabulary in Data Schemas is bent to look like nouns and verbs, and verb constructs are misused.
– Descriptions could contain metadata, but there is no necessary linking of concepts in taxonomies to data descriptions.
• What’s possible now thanks to the R&D community
– We now can use English words for concepts and they have a precise meaning
– We can actually represent processes – meronymic networks of verbs (not taxonomies) – so we can really have SOAs that computers can reason about
– We can mine textual documents that describe databases to generate precise sets of concepts for later re-use.
– We have shown that all knowledge representations are interoperable to the extent of their representational power (e.g. OWL DL is still restrictive).
– We have representations for 2nd-order logic and for non-monotonic logic
• We can build accurate real world representations
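The missing link between taxonomy concepts and data descriptions noted on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the DRM or of any tool named in the talk: the class names (`Concept`, `Attribute`, `Entity`), the sense identifier format, the gloss text, and the `unlinked_attributes` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A disambiguated word sense (identifier scheme is assumed, for illustration)."""
    sense_id: str
    gloss: str


@dataclass
class Attribute:
    """A field in a data schema, optionally linked to a precise concept."""
    name: str
    data_type: str
    concept: Optional[Concept] = None


@dataclass
class Entity:
    """A schema entity holding a list of attributes."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)


def unlinked_attributes(entity: Entity) -> list[str]:
    """Return the names of attributes that carry no concept link."""
    return [a.name for a in entity.attributes if a.concept is None]


# Hypothetical example data: one attribute is linked to a concept, one is not.
water = Concept("water.n.01", "illustrative gloss for the liquid sense of 'water'")
sample = Entity(
    "WaterSample",
    [Attribute("h2o_temp", "float", water), Attribute("stn_cd", "string")],
)

print(unlinked_attributes(sample))
```

A check like `unlinked_attributes` makes the documentation gap visible: every cryptic column name without a concept link is a place where meaning exists only in a human reader's head.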
Page 12
What does this mean for Data Architecture?
• The Bad News
– Everything your data architects have done for the last 40 years has destroyed very valuable information because we had no way of using it: the database version of the Y2K problem.
– What they were taught in school was the best that could be done but was really wrong.
– A lot of money was wasted.
– Continuing to do things the same old ways now is a waste of money!
• The Good News: Remediation will help!
– Previously nobody wanted to spend the time and money accurately documenting databases and data collections, because the documentation could only be read by human beings, and they are smarter than computers and therefore could figure out how to use the data without extensive documentation.
– Spend the money and do the work right and it will be usable later. Because of new language tools and the power of IKRIS you can mine the artifacts you create to generate real, extensive knowledge bases.
• Repeat message: Spend the money to do it right and it will pay off!
Page 13
What makes this possible?
• Reference: The Semantic Interoperability Community of Practice (SICoP) Wiki contains the detailed information. We are finishing up the Feb 6th meeting’s artifacts now, but several presentations are already there.
– Speakers from the R&D community at that meeting: Dr. Fellbaum of the WordNet program; Dr. Prange of LCC, former ARDA program director; Dr. Witbrock from CYCORP, which is now ready for prime time
• See the Global Change Master Directory as the template of how to weave linguistic terms and data descriptions together: it works, and it is an 18-year collaboration of many civilian agencies (Lola Olson is NASA’s program director and her presentation is there)
• What is new: real world vs. syllogistic reasoning
– Syllogistic reasoning is from Aristotle: all crows are black etc.
– What is needed is real-world reasoning, which requires precise time semantics and contingent truths to create contexts for real world assertions to be proven.
– Heretofore there was no unifying computer-processable representation for all of these types of logic, but now there is.
Page 14
End of Talk: Personal Comments
• The comments made in this talk are inferences from R&D work which I have followed since Spring 2004. Not all of it has been public, though none is classified, and all can be referenced.
• I too preached the validity of the methods and techniques I have just described as defective and obsolete, because at the time they were the best we had. We now have better.
• I have been invited to attend the Info-X meeting next week, which features the latest R&D results described here. Some are already productized by companies and some are university prototypes.
• Expert Reasoning & Decisions LLC (a.k.a. XRAD) is the business name I use in consulting. It is there for convenience, but is intended for limited use.
– For general consulting on technology I will be working with Dr. Brand Niemann of the SICoP and his NSF sponsor on extending the State of the Art and its availability in government.
– For special projects that will require my time I can use XRAD for doing work.
• General Plans
– Work on a book about how the new technology enables Web 3.0
– Publish academic articles in scholarly venues
– Advance the State of the Art