austin-gallagher
Synthetic Information Architecture
Semantic Web Technology:
Leading the Migration Path from Static / Library to Dynamic / Network Architecture
Information Architecture can be defined as:
1. Document Based Static/Library:
The current paradigm is the Library, from stone tablets to digital databases.
Documents are manually collected, read, classified, and stored.
Users find documents through a static classification system and keyword tools.
Users must retrieve entire documents to find the knowledge inside them.
Users must provide their own context and analysis of each document.
But:
New documents do not automatically update the classification system or database.
The classification system can be changed only by manually re-classifying all old documents.
Users must take the initiative to query the Library, and must manually process the results.
2. Knowledge Based Dynamic/Network:
The next generation is Synthetic Information Networks based on knowledge.
Documents are automatically captured based on end user collection priorities.
Documents are automatically pre-processed to extract all the concepts and context.
Documents are automatically classified and stored based on the latest concept schema.
Users define / refine their topic priorities to “teach” the system their needs.
The system automatically provides knowledge: summaries, abstracts, analysis and translations.
The next generation operates in parallel and above static libraries and keyword tools.
Information Architecture Trends:
Commercial and government user communities will soon be forced to migrate:
1) Tidal Wave of Information Shifts Power
The tidal wave of raw data will drive the expansion of Semantic Web architecture and applications.
2) Migration to XML and RDF Standards
Applications will follow Microsoft’s migration to XML/RDF standards for document authoring and exchange.
3) Universal Internet Web Portals
Internet web portals based on Semantic Web applications will become the new central user application.
4) Parallel Legacy Database Integration
Legacy databases will be extracted into parallel Semantic Web databases with integrated concept and context.
5) Global and Language Expansion
Information sources and users will expand globally and drive no-loss translation between language domains.
6) Network Access and Distribution
Semantic Web architecture will link data and users between servers, desktops, laptops, PDAs, and cell phones.
7) Machine Transactions and Network Capacity
Machine semantic transactions will grow exponentially and increase network investment, capacity and services.
1981 déjà vu: the Technology Outcome is known; the Leadership is not yet known.
[Figure: paper form with the fields Name, Address, and Comments; sample comment: “According to sources who have reported reliably in the past....”]
1. Paper forms include both Headings (structure) and Descriptions (content).
“Name” has legal, spelling and other variations.
“Address” has time, accuracy and other variations.
“Comments” is completely unstructured data, and may refer to and/or conflict with other data in the same document, in other documents, or in the database.
The Data look “hard” and “fixed” but are very “soft” and “fluid”.
[Figure: the paper form (Name, Address, Comments) copied into an electronic form with the same fields]
2. Electronic systems duplicate the same hard (structure) and soft (content).
The Data are copied from paper to bits; but no value is added.
3. Relational databases define categories and store data in a static library archive.
[Figure: relational table with the columns Name, Address, and Comments, holding records Name1 through Name5]
Electronic search functions are the same as human functions. Only faster.
“List” “Print” “Compare” “Search”
[Figure: sample results: Name1, Address1, Comment1; Name2 and Name1 both at Address1]
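The relational model on this slide can be sketched with Python's built-in `sqlite3` module. The table name `people`, the sample rows, and the three queries are illustrative assumptions rather than anything taken from the original system; the point is only that "List", "Search", and "Compare" are faster versions of filing-cabinet operations.

```python
import sqlite3

# Hypothetical table mirroring the Name / Address / Comments form.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, address TEXT, comments TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Name1", "Address1", "Comment1"),
        ("Name2", "Address1", "Comment2"),
        ("Name3", "Address3", "Comment3"),
    ],
)

# "List": return every stored record.
all_names = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY name")]

# "Search": exact match on one field.
at_address1 = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM people WHERE address = ? ORDER BY name", ("Address1",)
    )
]

# "Compare": find addresses shared by more than one name.
shared = conn.execute(
    "SELECT address, COUNT(*) FROM people GROUP BY address HAVING COUNT(*) > 1"
).fetchall()

print(all_names)
print(at_address1)
print(shared)
```

Nothing in these queries interprets the free-text `comments` column, which is where the slides locate the "soft" data.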
Most systems today use simple keyword tools to search in static libraries:
Single data domains with simple keyword tools are rapidly becoming obsolete.
Google and other keyword tools are better than nothing, but they fail the efficiency and performance confidence tests:
a) Scale factors: More data is more hits: “You have 100,000 hits…”
b) False Positives: You will only read the first 3-10; most hits will not be relevant to your needs.
c) False Negatives: You will miss valid sources; if a word, term or user community is different.
d) Raw Sources: A perfect hit is a raw document; not the summary, analysis, context or expertise.
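The false-positive and false-negative failures listed above can be reproduced with a few lines of Python. This is a minimal sketch: the four-document corpus is paraphrased for illustration, and the document ids and the `keyword_search` helper are hypothetical.

```python
# Hypothetical mini-corpus; the texts are paraphrased for illustration.
documents = {
    "ellington": "Duke Ellington was a jazz composer.",
    "wayne": "John Wayne, known as the Duke, starred in cowboy movies.",
    "nukem": "Duke Nukem is a PC adventure game.",
    "ellington_bio": "Edward Kennedy Ellington wrote music for his orchestra.",
}


def keyword_search(query: str) -> list[str]:
    """Return ids of documents whose text contains the query, ignoring case."""
    q = query.lower()
    return sorted(doc_id for doc_id, text in documents.items() if q in text.lower())


hits = keyword_search("Duke")

# Suppose the user only cares about the jazz musician.
relevant = {"ellington", "ellington_bio"}
false_positives = sorted(set(hits) - relevant)  # hits that are not relevant
false_negatives = sorted(relevant - set(hits))  # relevant documents that were missed

print(hits)
print(false_positives)
print(false_negatives)
```

The keyword matches the actor and the video game, and misses the biography that never uses the nickname.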
Adding Multiple Databases and Tools quickly reduces quality and efficiency.
Searching multiple sources with multiple tools is slower and less accurate.
Adding new databases creates multiple conflicting terms and classifications.
More new data now complicates rather than improves the system.
Patching the new linkages in new databases is an exponential $ problem.
Infinite interface patches cannot integrate all new data in all new databases.
Note: This is the $500M Trilogy Program
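A rough way to see why interface patching does not scale is to count the interfaces. The sketch below assumes, purely for illustration, that every pair of databases needs its own patch, and compares that with a design in which each database connects once to a shared semantic layer. Under this simple assumption the count grows quadratically; conflicting terms and classifications would only make the real cost worse. Both function names are hypothetical.

```python
def pairwise_interfaces(n: int) -> int:
    """Point-to-point patches if every pair of n databases needs its own interface."""
    return n * (n - 1) // 2


def hub_interfaces(n: int) -> int:
    """Interfaces if each database connects once to a shared semantic layer."""
    return n


for n in (2, 5, 10, 50):
    print(n, pairwise_interfaces(n), hub_interfaces(n))
```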
Solution:
Capture all data in multiple internal and inter-agency legacy databases, and automatically integrate new data, new classifications and new definitions.
The First Step is to extract and process the data, and store it in one Semantic Web.
[Figure labels: Sources (Structured Relational Database, Unstructured Text Data); Semantic Processor; Taxonomies and Context]
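The first step can be sketched as turning both structured rows and unstructured text into one common store of (subject, predicate, object) triples. Everything named here is an assumption made for illustration: the sample row, the comment text, the `KNOWN_ENTITIES` taxonomy, and the two helper functions. A real semantic processor would do far more than whole-word matching.

```python
import re

# Hypothetical inputs: one structured row and one free-text comment.
structured_row = {"name": "Name1", "address": "Address1"}
comment_text = "Name1 was seen with Name2 at Address1."

KNOWN_ENTITIES = ["Name1", "Name2", "Address1"]  # assumed taxonomy


def triples_from_row(row_id: str, row: dict[str, str]) -> set[tuple[str, str, str]]:
    """Turn each column of a relational row into a (subject, predicate, object) triple."""
    return {(row_id, column, value) for column, value in row.items()}


def triples_from_text(doc_id: str, text: str) -> set[tuple[str, str, str]]:
    """Link a document to every known entity it mentions (whole-word match)."""
    return {
        (doc_id, "mentions", entity)
        for entity in KNOWN_ENTITIES
        if re.search(rf"\b{re.escape(entity)}\b", text)
    }


# One store for both kinds of source.
store = triples_from_row("row:1", structured_row) | triples_from_text("doc:1", comment_text)

# Everything known about Address1, regardless of where it came from.
about_address1 = sorted(t for t in store if t[2] == "Address1")
print(about_address1)
```

A single query now reaches both the database field and the free-text comment.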
Semantic Web technology operates above and in parallel to legacy systems, preserving the legacy system and dramatically enhancing its performance.
The Second Step is a network with access to ALL sources in ONE Portal.
[Figure: Synthetic Information Network. Labels: Legacy Tools & Legacy Database; Semantic Web Portal & Local Database; Semantic Database; Legacy Databases; Other Semantic Web Sources; Semantic Context; Semantic Processor]
The Solution is a Semantic Web Architecture and a Dynamic Process:
Synthetic Information architecture supports all data sources and user applications:
A Semantic Web Database: Integrating 3 Unique Sources for “Duke”

Subject text 1: Edward Kennedy Ellington was born into the world on April 29, 1899. Duke’s parents Daisy Kennedy Ellington and James Edward Ellington served as ideal role models for young Duke and taught him everything from table manners to an understanding of the emotional power of music.
Semantic network 1: Parents, Ellington, Charlie Parker, Billie Holiday, Composer, Jazz, Duke, New York, Table manners

Subject text 2: The John Wayne look-alike and impersonator, Ermal Walden Williamson, has dedicated his career to keeping the image of John ‘the Duke’ Wayne alive and appearing around the world as John Wayne.
Semantic network 2: Duke, John Wayne, Ermal Walden Williamson, Katharine Hepburn, Cowboy, Movie

Subject text 3: Duke Nukem: The Manhattan Project is an all-new PC adventure starring gaming’s king of action. Set in New York City, Duke battles his way across towering skyscraper rooftops and through gritty subway tunnels on the hunt for power-hungry villain, Mech Morphix.
Semantic network 3: Game, Nukem, Role play, Duke, Doom, Wolfenstein, Adventure, Skyscraper, Subway, Villain

Figure 21: Sample analysis of three documents
[Figure: the three semantic networks combined into a single network around the shared concept “Duke”. Nodes: Parents, Ellington, Charlie Parker, Billie Holiday, Composer, Jazz, New York, Table manners; Duke, John Wayne, Ermal Walden Williamson, Katharine Hepburn, Cowboy, Movie; Game, Nukem, Role play, Doom, Wolfenstein, Adventure, Skyscraper, Subway, Villain]
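One way to picture the combined network is as an index from each concept to the documents it appears in. The sketch below uses a subset of the concepts from the three sample documents. The document ids, the `context_for` helper, and the decision to attach "New York" to the game document (based on its text, "Set in New York City") are assumptions made for this illustration.

```python
from collections import defaultdict

# Concepts per document; ids are hypothetical labels added for this sketch.
doc_concepts = {
    "ellington": {"Duke", "Ellington", "Parents", "Jazz", "Composer", "New York", "Table manners"},
    "wayne": {"Duke", "John Wayne", "Ermal Walden Williamson", "Cowboy", "Movie"},
    # "New York" is assumed here from the text "Set in New York City".
    "nukem": {"Duke", "Nukem", "Game", "Adventure", "New York", "Subway", "Villain"},
}

# Index: concept -> documents in which it appears.
concept_index = defaultdict(set)
for doc_id, concepts in doc_concepts.items():
    for concept in concepts:
        concept_index[concept].add(doc_id)


def context_for(term: str, hint: str) -> list[str]:
    """Documents containing `term` whose other concepts include `hint`."""
    return sorted(d for d in concept_index[term] if hint in doc_concepts[d])


# Concepts that link more than one document together.
shared = sorted(c for c, docs in concept_index.items() if len(docs) > 1)

print(shared)
print(context_for("Duke", "Jazz"))
print(context_for("Duke", "Game"))
```

The shared node "Duke" joins all three sources, while the neighbouring concepts tell the three meanings apart.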
Semantic Web Example: “Java” Database Built from unstructured text data:
• Java is a central island in the Indonesian archipelago.
• Java is a computer language controlled by Sun Microsystems.
• Java is often used as the slang term for coffee beverages.
The Semantic Web defines the relationships between the 3 “Java” concepts.
Separating unstructured text into Knowledge Clusters or Ontologies:
• Java = geography cluster
• Java = computer language cluster
• Java = beverage cluster
This automatically creates Knowledge Clusters that define unique concepts and attributes.
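The separation into clusters can be illustrated with a toy classifier that assigns each "Java" statement to the cluster whose cue words it shares. The cue-word lists and the `assign_cluster` function are assumptions made for this sketch, and the sentences are paraphrased from the three statements above. A real system would derive clusters from the data rather than from a hand-written table.

```python
import re

# Hypothetical cue words per cluster.
CLUSTER_CUES = {
    "geography": {"island", "archipelago"},
    "computer language": {"computer", "language"},
    "beverage": {"coffee", "beverages"},
}

sentences = [
    "Java is a central island in the Indonesian archipelago.",
    "Java is a computer language controlled by Sun Microsystems.",
    "Java is often used as the slang term for coffee beverages.",
]


def assign_cluster(sentence: str) -> str:
    """Pick the cluster whose cue words overlap most with the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(CLUSTER_CUES, key=lambda name: len(CLUSTER_CUES[name] & tokens))


clusters = [assign_cluster(s) for s in sentences]
print(clusters)
```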
The Semantic Processor automatically defines the Specific Relationships:
Identifying the Temporal, Logical, and Topical Contents within and between Clusters.
[Figure: clusters linked by Topical Concepts, Logical Relationships, and Temporal Relationships]
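The three relationship types can be represented as typed links between concepts. The sketch below is one possible data structure, not the processor's actual format: the `RelationKind` and `Relation` names and the predicate strings are hypothetical, and the sample facts come from the Ellington text earlier in the deck.

```python
from dataclasses import dataclass
from enum import Enum


class RelationKind(Enum):
    TEMPORAL = "temporal"
    LOGICAL = "logical"
    TOPICAL = "topical"


@dataclass(frozen=True)
class Relation:
    subject: str
    kind: RelationKind
    predicate: str
    obj: str


relations = [
    Relation("Duke Ellington", RelationKind.TEMPORAL, "born_on", "1899-04-29"),
    Relation("Duke Ellington", RelationKind.LOGICAL, "child_of", "Daisy Kennedy Ellington"),
    Relation("Duke Ellington", RelationKind.TOPICAL, "associated_with", "Jazz"),
]


def by_kind(kind: RelationKind) -> list[Relation]:
    """Return all stored relations of one type."""
    return [r for r in relations if r.kind is kind]


print([r.obj for r in by_kind(RelationKind.TEMPORAL)])
```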
The Semantic Processor automatically builds a rich Ontology for each Concept:
New data adds to the context of old data; and enhances the value of all data.
[Figure: Java Ontology, Coffee Ontology, and Computer Ontology]
The Semantic Processor automatically expands these Ontologies with new data:
A Semantic Processor has automatically:
a. Identified the source language and loaded the correct language processor.
b. Analyzed the document and defined the logical relationships between concepts.
c. Extracted the data in RDF format and built a rich ontology for each concept.
d. Stored the information in a Semantic Web Database to permit rapid modifications.
e. Created the classification categories to store and retrieve the document.
f. Made all the information available to any Semantic Web application program and user.
g. Retained the original document and linked back to it as a reference.
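The processing steps listed above can be strung together as a small pipeline. This is a deliberately naive sketch under stated assumptions: language detection is a stop-word count over two toy word lists, "concepts" are simply capitalised words, and the store is a plain list of triples. The names `STOP_WORDS`, `detect_language`, and `process` are hypothetical.

```python
import re

# Toy stop-word lists; a real processor would load a full language model.
STOP_WORDS = {"en": {"the", "and", "is"}, "fr": {"le", "et", "est"}}


def detect_language(text: str) -> str:
    """Guess the language by counting which stop-word list overlaps most."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return max(STOP_WORDS, key=lambda lang: len(STOP_WORDS[lang] & tokens))


def process(doc_id: str, text: str, store: list[tuple[str, str, str]]) -> list[str]:
    """Run one document through the toy pipeline and append triples to the store."""
    lang = detect_language(text)  # identify the source language
    # Toy concept extraction: capitalised words stand in for real analysis.
    concepts = sorted(set(re.findall(r"\b[A-Z][a-z]+\b", text)))
    for concept in concepts:  # extract and store the concepts
        store.append((doc_id, "mentions", concept))
    store.append((doc_id, "language", lang))
    store.append((doc_id, "source", text))  # retain the original as a reference
    return concepts


store: list[tuple[str, str, str]] = []
concepts = process("doc:1", "Java is a computer language controlled by Sun Microsystems.", store)
print(concepts)
```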
Information is extracted automatically in RDF format:
The external header and internal content are integrated and stored in a common RDF format.
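As an illustration of what a common RDF format can look like, the sketch below writes header fields and extracted content as N-Triples, one of the standard RDF serializations. The document IRI under `example.org` and the `to_ntriples` helper are hypothetical. The predicate IRIs are Dublin Core element terms, chosen here as an assumption since the slides do not name a vocabulary.

```python
DC = "http://purl.org/dc/elements/1.1/"


def to_ntriples(subject_iri: str, properties: dict[str, str]) -> str:
    """Serialize string-valued properties of one subject as N-Triples lines."""
    lines = []
    for predicate_iri, value in properties.items():
        # Escape backslashes, double quotes, and newlines inside the literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{subject_iri}> <{predicate_iri}> "{escaped}" .')
    return "\n".join(lines)


record = to_ntriples(
    "http://example.org/doc/1",  # hypothetical document IRI
    {
        DC + "creator": "Unknown author",  # external header field
        DC + "subject": "Jazz",  # concept extracted from the content
    },
)
print(record)
```

Header metadata and extracted concepts end up as the same kind of statement about the same subject, which is what allows them to be stored and queried together.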
All information is available through a Single Semantic Web Portal Interface:
User Configuration: All Sources, Original Languages, All File Formats, Security Levels
Interactive Dialogue: Related Terms, Similar Terms, Expanded Phrases
Concept Groups: Dynamic categories from user queries
Dynamic Summary: Abstract from a document or a Concept Group
Document Classification: Author, Document Date, Relevance Ranking
Original Documents: Category & Title, Brief Summary, File Format, Data Source, Language, Related Documents
The problem of searching multiple databases with multiple tools and getting conflicting results is solved.
Synthetic Information Architecture supports powerful New Applications:
Search: Find any document in any on-line or intra-net database
Capture: Convert documents from any format (Word, PDF, HTML, etc.)
Synthesize: Build and maintain a real time semantic web database.
Summarize: Automatic summaries in any size, format or focus.
Analyze: Find and summarize sources and answers to questions.
Reference: List documents and sources that support user questions.
Report: Distribute sources, summaries, analysis, and experts.
Alert: Automatic customized search, analysis and reporting
Experts: Identify and qualify all experts and organization sources.
Internet: Search the Internet or Intranet sources automatically.
Single Portal: Access all information sources from a single portal.
Automate: search, extraction, summary, analysis and reporting
These advanced functions are not possible with static library and keyword tools.
Conclusions:
1. Synthetic Information architecture and applications will follow IT history:
e.g., Intel/Microsoft architecture with VisiCalc spreadsheet applications in 1981-83.
2. Dynamic Networks will rapidly grow on top of legacy static library systems.
No need to stop current systems; only interface to them through RDF/XML and enhance them.
3. Scarce/expensive professional users will drive the architecture migration.
The greatest pain/gain falls on advanced professional users, not management or IT staff.
4. Architecture change will grow quickly from the Outside In / Bottom Up:
The scale of cost, risk, schedule and training is within small-office skills and budgets.
5. Established IT vendors will wait for large client RFI/RFPs, and miss the boat.
Architecture / Applications dominance is driven by forward design wins, not purchasing.
6. The Limiting Factor is Technical Leadership, not Procurement Funding.
The installed costs are low, and the payback in professional time and quality is rapid.