Project Goals
  Assist analyst in everyday work
  Knowledge Authoring Tools to assist in:
    Research for reports
    Produce reports
    Consume reports
    Share reports
  Our solution: Semantic Web Templates

Semantic Web Templates
  Knowledge Representation, Semantics are key for information exchange
  Creation, maintenance of knowledge must be transparent
  Automate extraction of knowledge
  Enhance knowledge retrieval methods

Semantic Web Templates
  Similar to MS Word Templates
    Different templates for different tasks
    Word templates can have restrictions on text
      Very primitive, such as length of text
      Simplistic patterns such as “phone number”
      No concepts such as “color” or “country”
    One template, many documents
  HTML templates are very common today
    Many web sites use a SQL database as back end: template + SQL produces HTML

Semantic Web Templates
  An HTML file with additional tags
  Tags specify:
    Where particular knowledge is stated
    What kind of knowledge it is
    Where it came from, if applicable
    References to an entity or relation
    Repetitive regions of text

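To make the idea of "an HTML file with additional tags" concrete, here is a minimal sketch. The deck does not show the actual tag syntax, so the `data-*` attribute names, the entity "Freedonia", the value, and the source name are all assumptions invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical template instance. The data-* attribute names, the entity
# "Freedonia", the value, and the source are all invented for illustration.
PAGE = """
<p>The population of
<span data-entity="Freedonia" data-type="Country">Freedonia</span> is
<span data-about="Freedonia" data-property="population"
      data-source="Example Fact Book">12345</span>.</p>
"""


class AnnotationParser(HTMLParser):
    """Collects every element that carries data-* annotations.

    Limitation of this sketch: annotated elements must contain plain text
    only (no nested markup).
    """

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._open = None  # annotation whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        data = {name[5:]: value for name, value in attrs if name.startswith("data-")}
        if data:
            self._open = dict(data, text="")

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if self._open is not None:
            self._open["text"] = self._open["text"].strip()
            self.annotations.append(self._open)
            self._open = None


def extract_annotations(html):
    """Return one dict per annotated element, in document order."""
    parser = AnnotationParser()
    parser.feed(html)
    parser.close()
    return parser.annotations
```

Calling `extract_annotations(PAGE)` yields one record saying which entity is referenced and of what kind, and a second saying which property is stated, about what, and where it came from.
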
Goal: Assist Research
  Unstructured Extraction
    Sort through buckets of data to find gold
    Entity recognition
    Relation recognition
  Semistructured Extraction
    Utilize repetitive patterns within a page
    Use similar pages to extract more data
    Robust despite changing pages, data

Unstructured Extraction
  Natural language processing
  News feeds
  Indexing, storage, retrieval
  Plugin architecture
    Web Services
    Our system, collaboration with IBM via NIMD
  Rover news crawler
    Political news articles from Yahoo!
    22,000 articles, ~8500 concepts, ~1000 relations
    Used in authoring tools

Unstructured Extraction
  Pattern based system
    Leverage “hints” for the reader in news articles
      British Prime Minister Tony Blair
      <type Country><subClassOf Politician> <unknown name>
      “Tony Blair” is a Prime Minister who represents the Country “England”.
  System runs daily on Yahoo political news
    Highlights known terms in green
    Highlights new terms in red
    Used to create search index, maintain KB
  Demo

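The "hints" pattern above can be sketched with a regular expression: a known nationality word and a known role, followed by an unknown capitalized name. This is only an illustration under stated assumptions; the lookup tables, the relation name `represents`, and the function name are hypothetical, and the mapping of "British" to "England" simply follows the slide's own example.

```python
import re

# Tiny hypothetical knowledge base (KB); a real one would be far larger.
NATIONALITY_TO_COUNTRY = {"British": "England"}  # follows the slide's example
KNOWN_ROLES = {"Prime Minister": "Politician", "President": "Politician"}

# <known nationality> <known role> <unknown capitalized name>
PATTERN = re.compile(
    r"\b(?P<nat>{nats}) (?P<role>{roles}) (?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)+)".format(
        nats="|".join(map(re.escape, NATIONALITY_TO_COUNTRY)),
        roles="|".join(map(re.escape, KNOWN_ROLES)),
    )
)


def extract_facts(sentence):
    """Return (subject, relation, object) triples suggested by the hints."""
    facts = []
    for match in PATTERN.finditer(sentence):
        name, role = match.group("name"), match.group("role")
        facts.append((name, "type", role))
        facts.append((role, "subClassOf", KNOWN_ROLES[role]))
        facts.append((name, "represents", NATIONALITY_TO_COUNTRY[match.group("nat")]))
    return facts
```

For example, `extract_facts("British Prime Minister Tony Blair arrived on Tuesday.")` proposes that "Tony Blair" is a "Prime Minister", that a Prime Minister is a kind of Politician, and that he represents "England".
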
Semi-structured Extraction
  Extract, produce knowledge
  Initial model is Domain Authorities
    Enhance KB with ground facts
    Strong for relations and breadth of data
    Leverages work of others
    Makes use of SQL databases
  Future work is wide-scale web of trust

Semi-structured Extraction
  Site Registry
    By description and property
    CIA World Fact Book has data about items which are of type <Country>
    CIA World Fact Book has properties <population>, <hasNeighbor>, <hasMembership>, etc.
  Demo

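A site registry of this kind could be as simple as a list of records that can be searched by entity type or by property. The sketch below is an assumption about its shape, not the system's actual data model; the second registry entry and the function name are hypothetical.

```python
SITE_REGISTRY = [
    {
        "site": "CIA World Fact Book",
        "describes": {"Country"},
        "properties": {"population", "hasNeighbor", "hasMembership"},
    },
    {   # hypothetical second entry, added only for contrast
        "site": "Example Sports Almanac",
        "describes": {"Athlete"},
        "properties": {"hasMembership"},
    },
]


def find_sites(entity_type=None, prop=None):
    """Return names of registered sites matching the given type and/or property."""
    return [
        entry["site"]
        for entry in SITE_REGISTRY
        if (entity_type is None or entity_type in entry["describes"])
        and (prop is None or prop in entry["properties"])
    ]
```

An extractor that needs the population of a country would call `find_sites(entity_type="Country", prop="population")` to learn which site to consult.
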
Semi-structured Extraction
  Publishing
    Human editing good for high-level concepts
    Automated techniques good for relations, ground level facts, and massive repetition
  Rover web crawler
  Template construction is currently manual
  With critical mass of data, templates could be discovered.

Enhanced Document Retrieval
  Enhanced document retrieval
    Search based on concept
    Find articles about…
      Membership: Scottie Pippen Trailblazers
      Membership: Osama bin Laden al-Qaeda
      Subgroups: Ramadan Shallah Islamic Jihad al-Qaeda
  Semantic search

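Searching "by concept" can be read as expanding a group into its members (and the members of its subgroups) before matching articles. The sketch below assumes that reading; the tables, the group "Example League", and the article texts are placeholders, and the subgroup graph is assumed to contain no cycles.

```python
# Hypothetical KB fragments; "Example League" and the articles are placeholders.
MEMBERS = {"Trailblazers": {"Scottie Pippen"}, "Example League": set()}
SUBGROUPS = {"Example League": {"Trailblazers"}}
ARTICLES = {
    "a1": "Scottie Pippen scored late in the game.",
    "a2": "The city council met on Monday.",
}


def members_of(group):
    """Members of a group, including members of its subgroups (recursively)."""
    found = set(MEMBERS.get(group, ()))
    for sub in SUBGROUPS.get(group, ()):
        found |= members_of(sub)
    return found


def search_by_concept(group):
    """Ids of articles that mention the group or any of its (transitive) members."""
    terms = members_of(group) | {group}
    return sorted(
        article_id
        for article_id, text in ARTICLES.items()
        if any(term in text for term in terms)
    )
```

A query for "Trailblazers" returns article `a1` even though the team name never appears in its text, which is the difference from plain keyword search.
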
Enhanced Document Retrieval
  Document Augmentation
    Sidebar acts as glossary as you read
    Pre-fetch data user is likely to want
    Adapt to user preferences, activities
    Deeper understanding for user, gets answers to questions raised while reading

Search Augmentation
  Google assumes users only want documents
  Provide answers along with documents
  Use query term denotation to more closely target results
    “Browns Ferry” is a garden park
    “Browns Ferry” is a nuclear power plant
  Automates what people do with IR systems
    Append hints about the type of term being sought

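Appending type hints to a query can be sketched in a few lines. The type names and hint words below are assumptions chosen to match the "Browns Ferry" example; the deck does not say which hints the system actually uses.

```python
# Hypothetical denotations for an ambiguous term, with hint words per type.
DENOTATIONS = {
    "Browns Ferry": {
        "Park": ["park", "garden"],
        "NuclearPowerPlant": ["nuclear", "power plant"],
    }
}


def augment_query(term, intended_type):
    """Quote the term and append hint words for the intended type, if known."""
    hints = DENOTATIONS.get(term, {}).get(intended_type, [])
    return " ".join(['"%s"' % term] + hints)
```

With this, `augment_query("Browns Ferry", "NuclearPowerPlant")` produces `"Browns Ferry" nuclear power plant`, which is roughly what a person would type by hand to steer a search engine toward the intended sense.
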
Basic Question Answering
  Automated techniques for ground facts
  Use reasoners for higher-level facts
    Tie in with KSL AQUAINT work
  Feedback, direction from user
  Structure of knowledge allows simple form of question answering

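One way to see why structured knowledge gives "a simple form of question answering": if ground facts are stored as triples, a question is just a triple with blanks. The sketch below assumes a triple representation, which the deck implies but does not spell out; the facts themselves are fictional.

```python
# Hypothetical ground facts as (subject, property, value) triples.
FACTS = [
    ("Freedonia", "type", "Country"),
    ("Freedonia", "hasNeighbor", "Sylvania"),
    ("Sylvania", "type", "Country"),
]


def ask(subject=None, prop=None, value=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, v)
        for (s, p, v) in FACTS
        if subject in (None, s) and prop in (None, p) and value in (None, v)
    ]
```

"Which countries neighbor Freedonia?" becomes `ask("Freedonia", "hasNeighbor")`, and "what is known to be a Country?" becomes `ask(prop="type", value="Country")`. Higher-level questions would still need a reasoner on top of this lookup.
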
Basic Question Answering
  Multiple views into data
  Browse interface
    Ugly, but complete view
  Activity-based knowledge presentation
    Search, document augmentation
  Future work: accept user feedback, customization, preferred sources

Basic Question Answering
  Query by example
    Users create many similar documents
    These are targeted to an activity
    Use past work to speed present work
    User creates templates which present data they find interesting in a way they find convenient

Goal: Produce Reports
  Most reports are made with Office
    Word processor, spreadsheet
  Enhance with semantic awareness
  Provide seamless access to knowledge
  Transparent maintenance, creation
  Low overhead of operation
  Avoid centralized approach
    Contrast with relational database

Word Processing
  Creation of new data
  Semantic scan
    Like spell check or grammar check
    Automatically identifies referenced entities
    Learns new entities, relations between entities
  Annotation of text
    User manually adjusts system
    User adds new data
  System gets smarter over time

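The spell-check analogy can be sketched as a scanner that flags each candidate phrase as known or new and lets the user accept new ones. The candidate rule here (runs of capitalized words) is a deliberately naive assumption, and the class and method names are hypothetical.

```python
import re


class SemanticScanner:
    """Flags entity candidates in text, in the manner of a spell checker."""

    def __init__(self, known_entities):
        self.known = set(known_entities)

    def scan(self, text):
        """Return (phrase, status) pairs, status being 'known' or 'new'."""
        # Naive candidate rule (an assumption): runs of capitalized words.
        candidates = re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)
        return [(c, "known" if c in self.known else "new") for c in candidates]

    def accept(self, phrase):
        """User confirms a flagged phrase; later scans treat it as known."""
        self.known.add(phrase)
```

After the user accepts a flagged phrase, the next scan reports it as known, which is the sense in which the system gets smarter over time. A real scanner would also have to handle sentence-initial capitals and multi-word names with particles, which this rule ignores.
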
Word Processing
  Create data via entry into templates
  Create new templates
    For others
    For personal use
  Extend templates with new entry areas
  Enhance analyst’s view
    Semantic Search, Document Augmentation
    Sidebar boxes are templates too

Spreadsheets
  Spreadsheets are key tools in analysis
    Tabular format, UI are both intuitive
    Sorting, basic math functions
  We add semantics:
    New formula type: “Get Data”
    New formula type: “Put Data”
  Summarization, new views

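The two formula types can be pictured as a read and a write against the knowledge base, keyed by entity and property. The deck does not give their signatures, so the class, method names, and arguments below are assumptions.

```python
class KnowledgeBase:
    """Minimal store backing hypothetical "Get Data" / "Put Data" formulas."""

    def __init__(self):
        self._facts = {}  # (entity, property) -> (value, source)

    def put_data(self, entity, prop, value, source="spreadsheet"):
        """'Put Data': write a cell's value into the KB, recording its source."""
        self._facts[(entity, prop)] = (value, source)

    def get_data(self, entity, prop, default=None):
        """'Get Data': fetch the value for a cell, or a default if unknown."""
        value, _source = self._facts.get((entity, prop), (default, None))
        return value
```

A cell holding a "Get Data" formula for (`"Freedonia"`, `"population"`) would display whatever `get_data` returns, while a "Put Data" cell would push an analyst's entry back so that other documents can use it.
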
Spreadsheets
  Example scenario
    Suppose SARS was found to affect Asian-Americans more than others?
    Analyst wants to determine, based on that, which states are most at risk
    Knowledge from Census tells us Asian-American population as a percentage

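Once the percentages are in the sheet, the scenario reduces to a sort. The sketch below uses placeholder state names and made-up percentages, not Census figures, and the function name is hypothetical.

```python
def rank_states_at_risk(share_by_state, top_n=3):
    """Return the top_n states ordered by population share, highest first."""
    ranked = sorted(share_by_state.items(), key=lambda item: item[1], reverse=True)
    return [state for state, _share in ranked[:top_n]]


# Placeholder values only; these are NOT real Census figures.
PLACEHOLDER_SHARES = {"State A": 2.0, "State B": 9.5, "State C": 4.1, "State D": 0.7}
```

Here `rank_states_at_risk(PLACEHOLDER_SHARES, top_n=2)` puts "State B" first and "State C" second. In the envisioned spreadsheet the input column would be filled by "Get Data" formulas rather than typed in.
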
Goal: Consume Reports
  Verify others’ data against yours
  Incorporate others’ results into your knowledge base, track sources
  Maintain data
    Change notification
    Document updates with new data
    Versioning of documents, data

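Verifying incoming data and tracking sources can be sketched as a merge that adds unseen facts with their provenance and reports disagreements instead of overwriting. The data layout and function name are assumptions made for illustration.

```python
def merge_report(mine, theirs, source):
    """Compare incoming facts with the local KB.

    mine:   dict mapping a fact key to {"value": ..., "source": ...}
    theirs: dict mapping a fact key to a plain value
    Returns (added, conflicts); facts that agree are left untouched.
    """
    added, conflicts = [], []
    for key, value in theirs.items():
        if key not in mine:
            mine[key] = {"value": value, "source": source}
            added.append(key)
        elif mine[key]["value"] != value:
            conflicts.append((key, mine[key]["value"], value))
    return added, conflicts
```

The returned conflict list is what a change-notification feature would surface to the analyst; the sketch leaves the decision about which value to keep to the user.
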
Goal: Share Reports
  Easily exchangeable via e-mail
  Truth maintenance techniques
  Multiple views into data
  Leverage domain expertise
    The missile guy has a KB, …
  Collaboration, trust levels
    Colleagues disagree, sources are unreliable

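When colleagues disagree, trust levels offer one simple way to pick a working value: sum the trust of the sources behind each claimed value and take the best supported one. This weighting scheme is an assumption for illustration only; the deck does not say how trust levels are combined.

```python
from collections import defaultdict


def most_trusted_value(claims, trust):
    """Pick the value with the highest total trust behind it.

    claims: list of (source, value) pairs
    trust:  dict mapping source to a weight; unknown sources count as 0
    """
    score = defaultdict(float)
    for source, value in claims:
        score[value] += trust.get(source, 0.0)
    return max(score, key=score.get) if score else None
```

Note that two weakly trusted sources can still be outvoted by one strongly trusted source, so the outcome depends entirely on how the analyst sets the weights.
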