ISWC Presentation for Inform Technologies
More Meaning. Better Results.
Building the Inform Semantic Publishing Ecosystem: from Author to Audience
Marc Hadfield
VP, Research & Development
[email protected]
Marc Hadfield
• Semantic Technology, Computer Science
• Inform Technologies (Head of R&D)
‣Semantic Technologies applied to Content Analysis & Distribution
• Alitora Systems (Co-Founder / CTO)
‣Life Science Semantic Technology, Research, Big Data Analytics, Semantic HPC
‣Life Science Natural Language Processing
• Columbia Genome Center
‣NLP applied to Life Science Research Articles
• LCconnect (CTO)
‣Letter-of-Credit Exchange
Semantics in Publishing…
• Ongoing Theme at ISWC 2010…
‣NY Times
‣Facebook (OpenGraph)
‣Elsevier
‣BBC
What is Inform?
• Inform is a content enrichment solution designed to increase consumer engagement, page views and revenue.
• We provide a hosted Semantic Web Service for content publishers that:
1. Reads your article before you publish it
2. Turns main topics and entities (people, places, companies, organizations) into links
3. Provides feeds of related web content when you publish it
• New Direction: Optimizing Content Distribution via Direct Channels
• Web users are moving away from destination web sites, but still want the content those sites provide.
• Companies utilizing Inform include:
Connecting your content
Audio, Video & Blogs from the Web
Articles from the Web
Content from Inform
Your Affiliates’ Content
Your Content
Affiliated Content
Your Content
Licensed Content
Entity                      Type      Score
Google Street View          Topic     0.90
Google                      Company   1.00
Ireland                     Place     0.70
Norway                      Place     0.70
South Africa                Place     0.70
Sweden                      Place     0.70
Brian McClendon             Person    0.80
Mountain View, California   Place     0.60
Wi-Fi                       Topic     0.50
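A minimal sketch of how scored annotations like these might be turned into links, as in step 2 of the service description. The 0.7 threshold, the `base_url`, the slug format and the function names are all illustrative assumptions, not Inform's actual implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    name: str
    kind: str
    score: float


# Rows copied from the entity list on this slide.
ANNOTATIONS = [
    Annotation("Google Street View", "Topic", 0.90),
    Annotation("Google", "Company", 1.00),
    Annotation("Ireland", "Place", 0.70),
    Annotation("Norway", "Place", 0.70),
    Annotation("South Africa", "Place", 0.70),
    Annotation("Sweden", "Place", 0.70),
    Annotation("Brian McClendon", "Person", 0.80),
    Annotation("Mountain View, California", "Place", 0.60),
    Annotation("Wi-Fi", "Topic", 0.50),
]


def slugify(name: str) -> str:
    """Lower-case the name and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def link_entities(text, annotations, min_score=0.7,
                  base_url="https://example.com/topic/"):  # hypothetical URL
    """Link the first mention of every entity scoring at least min_score."""
    # Longest names first, so "Google Street View" wins over "Google".
    names = sorted((a.name for a in annotations if a.score >= min_score),
                   key=len, reverse=True)
    if not names:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    linked = set()

    def replace(match):
        name = match.group(0)
        if name in linked:  # later mentions stay plain text
            return name
        linked.add(name)
        return f'<a href="{base_url}{slugify(name)}">{name}</a>'

    return pattern.sub(replace, text)


sample = "Google said Google Street View will launch in Norway over Wi-Fi."
linked_html = link_entities(sample, ANNOTATIONS)
print(linked_html)
```

With the assumed threshold, "Wi-Fi" (0.50) stays unlinked while the higher-scoring entities become links.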
Related Content Widgets
Inform Topic Pages, Micro Sites
My Job: Building the Semantic Platform…
• “Silo”-ed Semantic Technology → Semantic Web
‣Aligned with Wikipedia, Leverage Linked Data for Mash-Ups
‣RDFa, SKOS, Semantic SEO
• Semantic / NLP Engine
‣Improve Features, Quality
• Semantic Data Infrastructure
‣Scalable Infrastructure
• Semantic Data Analysis
‣Algorithms (Topology of Graphs), Inference
‣“PageRank” on semantic data
• Personalization, Usage Analysis
• Micro Sites
‣Clusters of Topics, Generating Rich Content Experience
• Distributing to Social Platforms
‣e.g. Facebook
Inform: Author to Audience
Leverage Inform Taxonomy
Architecture: Inside the Semantic System
Author
‣ Content Creation Services
‣ Semantic Data Repository
‣ Semantic Data Analysis
‣ Content Selection Algorithms
‣ Webservices
‣ Content Distribution Services
Audience
Content Creation
• Article Creation Tool (ACT)
‣Author Tools
‣Embed in CMS, Tumblr / WordPress Plugin
• Publisher Portal
‣Editorial Tool
‣Content Feeds
• Web Crawl
• Summarizer
‣Create smart “blurbs” to advertise article
• Linked Data
‣Freebase, Wikipedia, DBpedia, et cetera.
ACT Tool
ACT Tool, Tumblr, Wordpress
Publisher Portal
Summarizer
Semantic Data Repository
• Data Master / Data Node
‣Federated Semantic Data Managers
‣SPARQL Triplestore (scalable cluster)
‣Semantic Search
‣Search Indexes (Semi-Structured and Full-Text Search)
‣Lucene/Siren (Sindice)
‣Facets, Frequency Counts
‣Cache (In-Memory)
‣Blob Store (Voldemort)
‣Listener to Activity (Flume)
‣User Activity (clicks)
‣Content Activity (content updates)
‣Near Real-Time Trends, Analysis
‣Compute Algorithms (Stored Procedures in Groovy)
‣Long Term Content Archive (offline)
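A toy illustration of two repository ideas listed above: triple-pattern lookup (the core of what a SPARQL triplestore answers) and facet frequency counts. This is an in-memory sketch using only the Python standard library; the `TripleStore` class, the `mentions` predicate and the `article:N` identifiers are hypothetical and do not reflect Inform's cluster, Lucene/Siren or Voldemort.

```python
from collections import Counter


class TripleStore:
    """Tiny in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

    def facet_counts(self, predicate):
        """Count how often each object value occurs for one predicate."""
        return Counter(o for (_, p, o) in self.triples if p == predicate)


store = TripleStore()
store.add("article:1", "mentions", "Google")
store.add("article:1", "mentions", "Norway")
store.add("article:2", "mentions", "Google")

# Which articles mention Google?
google_articles = [s for (s, _, _) in store.match(predicate="mentions", obj="Google")]
# Facet: how many articles mention each entity?
facets = store.facet_counts("mentions")
print(google_articles, facets.most_common())
```

A production store would index by subject, predicate and object rather than scan every triple; the linear scan here keeps the sketch short.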
Semantic Data Analysis
• Natural Language Processing
‣Rules & Machine Learning, Training
‣500K articles per day, 4,000 unique sites
‣Text Extraction, Section/Sentence Extraction
‣Tokenization, Part-of-Speech, Noun/Verb Phrases
‣Entity Extraction, Entity Normalization
‣Topic Extraction, Summarization, Clustering
• User Activity
‣User Model (Personalization)
• Semantic Inference
‣F-Logic, Multi-Domain
‣Linked Data Mash-Ups
• Semantic Graph Topology
‣Entity / Property Importance Metrics, Ranking, “PageRank”
‣Which triples in LinkedData are interesting?
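The “PageRank” on semantic data idea can be sketched with a standard power-iteration PageRank over entity-to-entity links. The edges below are made-up examples, the damping factor 0.85 is the conventional default, and nothing here is claimed to be the metric Inform actually uses.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    outgoing = {node: [] for node in nodes}
    for source, target in edges:
        outgoing[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives the random-jump share first.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = outgoing[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical links derived from triples, with the predicate dropped.
EDGES = [
    ("Google Street View", "Google"),
    ("Brian McClendon", "Google"),
    ("Brian McClendon", "Google Street View"),
    ("Google", "Mountain View, California"),
]

ranks = pagerank(EDGES)
for entity, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.3f}  {entity}")
```

Entities that many other entities point to rise in the ranking, which is one possible way to decide which Linked Data triples are interesting enough to surface.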
Content Selection Algorithms
• Model of User, Personalization
‣Social Networks provide Context
• Semantic Analysis of Content
• Algorithms
‣Maximize Relevancy / Relatedness (Meets Editorial Criteria)
‣Maximize Click-Through
‣Cute Kitten vs. Engagement Issue
‣Maximize Monetization
Goal: Content Exchange
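One way to picture the trade-off between relevancy, click-through and monetization: gate candidates on an editorial relevancy floor, then rank the survivors by a weighted blend. The weights, the 0.5 floor, the field names and the sample candidates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    relevancy: float  # relatedness to the source article, 0..1
    ctr: float        # predicted click-through rate, 0..1
    revenue: float    # normalized expected revenue, 0..1


def select(candidates, min_relevancy=0.5, w_rel=0.5, w_ctr=0.3, w_rev=0.2, k=3):
    """Drop items below the editorial floor, then rank by a weighted score."""
    eligible = [c for c in candidates if c.relevancy >= min_relevancy]

    def score(c):
        return w_rel * c.relevancy + w_ctr * c.ctr + w_rev * c.revenue

    return sorted(eligible, key=score, reverse=True)[:k]


candidates = [
    Candidate("Cute kitten video", relevancy=0.1, ctr=0.9, revenue=0.2),
    Candidate("Street View expands to Norway", relevancy=0.9, ctr=0.3, revenue=0.4),
    Candidate("Google privacy ruling", relevancy=0.7, ctr=0.2, revenue=0.6),
]

chosen = [c.title for c in select(candidates)]
print(chosen)
```

The relevancy floor is what keeps the "cute kitten" item out: it would win on click-through alone but does not meet the editorial criteria.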
Webservices
• REST
‣Outputs RDF / JSON Data
• Natural Language Processing
‣Article to Semantic MetaData
• Related Content
‣Inputs: Content, Personalization, Algorithm
‣Articles
‣Semantic Mash-Ups
‣Topics
‣Entities
• Semantic Query, Site Search
• Storage, Content Repository
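To make "Outputs RDF / JSON Data" concrete, here is a sketch that serializes the same article metadata both as JSON and as N-Triples. The JSON field names, the `http://example.org/` namespace and the `mentions` / `type` predicates are invented for the example; the slides do not show the real response format.

```python
import json
from urllib.parse import quote

EX = "http://example.org/"  # hypothetical namespace

ENTITIES = [
    ("Google Street View", "Topic", 0.90),
    ("Google", "Company", 1.00),
]


def to_json(article_url, entities):
    """Serialize article metadata as a JSON document."""
    payload = {
        "article": article_url,
        "entities": [
            {"name": name, "type": kind, "score": score}
            for name, kind, score in entities
        ],
    }
    return json.dumps(payload, indent=2)


def to_ntriples(article_url, entities):
    """Serialize the same metadata as N-Triples, one statement per line."""
    lines = []
    for name, kind, _score in entities:
        entity = f"<{EX}entity/{quote(name)}>"
        lines.append(f"<{article_url}> <{EX}mentions> {entity} .")
        # json.dumps yields a quoted, escaped string usable as a literal.
        lines.append(f"{entity} <{EX}type> {json.dumps(kind)} .")
    return "\n".join(lines)


article = "http://example.org/articles/street-view"
as_json = to_json(article, ENTITIES)
as_rdf = to_ntriples(article, ENTITIES)
print(as_json)
print(as_rdf)
```

A client that only needs links and scores can read the JSON, while a Linked Data consumer can load the N-Triples into a triplestore.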
Content Distribution Services
• Customer Destinations (Traditional Business)
‣Deep Integration
• Publisher Widgets
‣Levels of Lightweight Integration
‣Example: Related-Content-Widget in JavaScript
• Inform.com
‣Topic Pages
• Micro Sites
‣Several Thousand Owned-and-Operated Domains/Sites, Topic Driven
• Social Networks
‣Facebook
Tools:
• Semantic SEO
‣RDFa, SKOS
Semantic MetaData, RDFa
http://inspector.sindice.com
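A sketch of what embedding entity metadata as RDFa might look like. It emits RDFa 1.1-style attributes (`prefix`, `about`, `typeof`, `property`); the `ex:` vocabulary and entity URIs are placeholders, and the markup Inform actually publishes may differ (RDFa 1.0 with `xmlns:` declarations was common in 2010).

```python
from html import escape
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace


def rdfa_entity(name, kind):
    """Wrap an entity name in spans carrying RDFa subject, type and label."""
    uri = f"{BASE}entity/{quote(name)}"
    # Assumes `kind` is a single word usable as a CURIE suffix (e.g. Company).
    return (
        f'<span about="{escape(uri)}" typeof="ex:{escape(kind)}">'
        f'<span property="rdfs:label">{escape(name)}</span>'
        f"</span>"
    )


def rdfa_block(entities):
    """Wrap annotated entities in a container that declares the ex: prefix."""
    inner = "\n".join(rdfa_entity(name, kind) for name, kind in entities)
    return f'<div prefix="ex: {BASE}vocab#">\n{inner}\n</div>'


markup = rdfa_block([("Google", "Company"), ("Norway", "Place")])
print(markup)
```

Output like this can be pasted into an RDFa inspector to check which triples a crawler would extract.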
Facebook App
Using Facebook OpenGraph
Relevancy Algorithm:
Combine:
• Trending / Popular Topics
• Trending / Popular Articles
• Personalization “Liked” Topics
• Personalization “Liked” Articles
• User Profiles (“Users like you…”)
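The five signals in the relevancy algorithm could be combined as a weighted sum, sketched below. Equal weights of 0.2, the feature definitions (topic overlap fractions and 0/1 membership flags) and the sample data are assumptions; the slide does not say how the signals are actually weighted.

```python
DEFAULT_WEIGHTS = {
    "trending_topics": 0.2,
    "trending_articles": 0.2,
    "liked_topics": 0.2,
    "liked_articles": 0.2,
    "similar_users": 0.2,
}


def overlap(topics, reference):
    """Fraction of an article's topics that appear in the reference set."""
    return len(topics & reference) / len(topics) if topics else 0.0


def relevancy(article, signals, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the five signals for one article and one user."""
    features = {
        "trending_topics": overlap(article["topics"], signals["trending_topics"]),
        "trending_articles": float(article["id"] in signals["trending_articles"]),
        "liked_topics": overlap(article["topics"], signals["liked_topics"]),
        "liked_articles": float(article["id"] in signals["liked_articles"]),
        "similar_users": float(article["id"] in signals["similar_user_likes"]),
    }
    return sum(weights[name] * value for name, value in features.items())


signals = {
    "trending_topics": {"Google"},
    "trending_articles": set(),
    "liked_topics": {"Google", "Norway"},   # e.g. topics the user "Liked"
    "liked_articles": set(),
    "similar_user_likes": {"a1"},           # "Users like you..." signal
}
articles = [
    {"id": "a1", "topics": {"Google", "Norway"}},
    {"id": "a2", "topics": {"Sweden"}},
]

stream = sorted(articles, key=lambda a: relevancy(a, signals), reverse=True)
print([(a["id"], round(relevancy(a, signals), 2)) for a in stream])
```

Sorting candidate articles by this score gives one possible ordering for a personalized article stream.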
Facebook “Liked” Topics
Facebook Article Stream
Inform: Author to Audience via Semantics
Thanks for your attention!
Questions?
Contact Information:
Marc Hadfield