Semantics & the Art of Metadata Management

DESCRIPTION

This slide deck provides a high-level overview of semantics and digital asset management systems, and shows how semantic technologies help make assets more discoverable and manageable.

Page 1: Semantics & the Art of Metadata Management

Semantics & the Art of Metadata Management

Avalon Consulting

Kurt Cagle
Principal Evangelist
Avalon Consulting, LLC

Page 2: Semantics & the Art of Metadata Management

Who We Are
• Avalon Consulting, LLC
  • Enterprise Web Presence, Enterprise Search, Big Data Solutions
  • Founded 2003, headquartered in Plano, TX, with consultants placed nationally
  • Clients include media companies (Disney, Warner Bros.), publishers (McGraw Hill, Wolters Kluwer), government agencies (US Nat’l Archives, Library of Congress, US Patent & Trademark Office, DoD, OECD)
  • http://www.avalonconsult.com/DAMstrategies
• Kurt Cagle
  • Principal Evangelist, Semantic Technologies & Machine Learning
  • Author of 18 books on web technologies
  • [email protected]

Page 3: Semantics & the Art of Metadata Management

The Problems with Metadata
• Metadata Describes Things …
  • Videos, Documents, Pictures, People, Companies, Music, Places, Concepts, Units, etc.
• But Search Looks for Words, Not Things
  • Proximity of terms, not proximity of concepts or items
• Context Is Important, Yet Often Lost
  • A “large” planet is large on a different scale than a “large” molecule
• Controlled Vocabularies Aren’t
  • Lists of terms change, often rapidly

Page 4: Semantics & the Art of Metadata Management

Metadata in the Wild
• Lack of discoverability
• Potential for production or distribution errors
• Siloization of metadata
• Internationalization woes
• Spiraling metadata capture costs
• Others (outside entities) control your messaging

Page 5: Semantics & the Art of Metadata Management

Metadata Acquisition
• Production Time Metadata
  • Entered by the producers of data
  • Subject Matter Experts, highest quality, need consistent standards
• Curational Metadata
  • Entered by archivists
  • Time/personnel intensive, less context, more error prone
• Synthetic Curators
  • Image, Speech, Music Recognition, OCR, Document/Entity Enrichment
  • Third party data providers and linked data
  • Fastest, requires context, but limited by data base
  • Semantics goes here

Any reasonable curation strategy should use all three approaches

Page 6: Semantics & the Art of Metadata Management

Semantic Modeling
People, places, things and events can be linked and modeled

[Diagram: a graph of linked entities. Nodes: Event 1, Event 2, Jane Doe, John Smith, ABC Corp, XYZ Corp, GeoRegion 1, GeoRegion 2, Aircraft, a date (18 Dec), Document 1, Document 2. Edge labels: Actor, Member Of, Org, Subsidiary, Location, Time, About, Document.]

Page 7: Semantics & the Art of Metadata Management

Semantic Modeling
People, places, things and events can be linked and modeled

[Diagram: the same graph with prefixed identifiers. Nodes: org:ABC_Corp, org:XYZ_Corp, topic:Aircraft, event:Evt1, event:Evt2, person:JaneDoe, person:JohnSmith, geoRegion:Eurasia, geoRegion:Americas, time:2014-11-24, time:2014-12-18. Edge labels: person:memberOf, org:subsidiary, event:organization, event:actor, event:location, event:about, event:startTime.]

Page 8: Semantics & the Art of Metadata Management

Semantic Modeling
People, places, things and events can be linked and modeled

org:ABC_Corp rdf:type class:Organization;
    org:name "ABC Corporation".
org:XYZ_Corp rdf:type class:Organization;
    org:name "XYZ Corporation";
    org:subsidiary org:ABC_Corp.
person:JaneDoe rdf:type class:Person;
    person:name "Jane Doe";
    person:org org:ABC_Corp.
person:JohnSmith rdf:type class:Person;
    person:name "John Smith";
    person:org org:XYZ_Corp.
geoRegion:Eurasia rdf:type class:GeoRegion.
geoRegion:Americas rdf:type class:GeoRegion.
"2014-12-21"^^xs:date.

topic:Aircraft rdf:type class:Project;
    topic:name "Aircraft".
event:Evt1 rdf:type class:Event;
    event:agent person:JaneDoe;
    event:about topic:Aircraft;
    event:location geoRegion:Eurasia, geoRegion:Americas;
    event:org org:ABC_Corp;
    event:label "Training Module";
    event:startDate "2014-11-24"^^xs:date;
    event:endDate "2014-11-26"^^xs:date.
event:Evt2 rdf:type class:Event;
    event:agent person:JohnSmith;
    event:about topic:Aircraft;
    event:location geoRegion:Americas;
    event:label "Surveillance Component";
    event:org org:XYZ_Corp;
    event:startDate "2014-12-18"^^xs:date;
    event:endDate
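To make the subject-predicate-object shape of these statements concrete, here is a minimal sketch in plain Python. It is not an RDF library: the `TRIPLES` list and the `match` helper are hypothetical names introduced only for illustration, and the data is a subset of the statements on this slide.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# A subset of the slide's statements, one (subject, predicate, object) per tuple.
# This list stands in for a real triple store and is purely illustrative.
TRIPLES: List[Triple] = [
    ("org:ABC_Corp", "rdf:type", "class:Organization"),
    ("org:ABC_Corp", "org:name", "ABC Corporation"),
    ("org:XYZ_Corp", "rdf:type", "class:Organization"),
    ("org:XYZ_Corp", "org:name", "XYZ Corporation"),
    ("person:JaneDoe", "rdf:type", "class:Person"),
    ("person:JaneDoe", "person:name", "Jane Doe"),
    ("person:JaneDoe", "person:org", "org:ABC_Corp"),
    ("event:Evt1", "rdf:type", "class:Event"),
    ("event:Evt1", "event:agent", "person:JaneDoe"),
    ("event:Evt1", "event:about", "topic:Aircraft"),
    ("event:Evt1", "event:location", "geoRegion:Eurasia"),
    ("event:Evt1", "event:location", "geoRegion:Americas"),
]


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the given terms; None is a wildcard."""
    for triple in TRIPLES:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Where did Evt1 take place?
regions = sorted(o for _, _, o in match(s="event:Evt1", p="event:location"))

# Who is recorded as belonging to ABC Corp?
abc_people = [s for s, _, _ in match(p="person:org", o="org:ABC_Corp")]
```

Because every statement has the same three-part shape, one wildcard matcher can answer questions about people, organizations, and events alike, which is the property the model relies on.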

Page 9: Semantics & the Art of Metadata Management

Inferencing
Using RDF to Find Relationships and Surface New Information

Select ?eventLabel ?orgName ?actorName ?startDate ?endDate where {
    ?project rdf:type class:Project;
        project:name $projectName.
    ?event event:project ?project;
        event:actor ?actor;
        event:label ?eventLabel;
        event:startDate ?startDateISO;
        event:endDate ?endDateISO.
    ?actor actor:name ?actorName;
        actor:memberOf ?org.
    ?org org:name ?orgName.
    bind (format-date($startDateISO, "[MNn] [D], [Y]") as ?startDate)
    bind (format-date($endDateISO, "[MNn] [D], [Y]") as ?endDate)
} order by ?startDateISO ?endDateISO

For a given project, identify the names of the actors and their associated organizations involved in events focused on that project, along with when each event started and ended, sorted by start and end dates.

Page 10: Semantics & the Art of Metadata Management

Inferencing
Using RDF to Find Relationships and Surface New Information

For a given project, identify the names of the actors and their associated organizations involved in events focused on that project, along with when each event started and ended, sorted by start and end dates.

eventLabel          orgName    actorName    startDate     endDate
Training Module 5   ABC Corp.  Jane Doe     Nov 24, 2014  Dec 17, 2014
Surveillance Comp.  XYZ, Inc.  John Smith   Dec 18, 2014  Mar 12, 2015
Nav Sys 1           ABC Corp.  Jane Doe     Dec 19, 2014  Feb 9, 2015
Gyro Sys 3          ABC Corp.  Jane Doe     Dec 19, 2014  Mar 1, 2015
AutoSensor 2        XYZ, Inc.  Steve Deere  Dec 21, 2014  Apr 7, 2015
Left Fore Aileron   XYZ, Inc.  John Smith   Mar 14, 2015  Jun 1, 2015
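As a rough illustration of the join the query performs (project to events to actors to organizations), the following plain-Python sketch walks the same chain over a few in-memory triples. It is not a SPARQL engine. The predicate names follow the query, the data echoes two rows of the results, and the helper names (`value`, `subjects`, `pretty`, `events_for_project`) are hypothetical.

```python
from datetime import date
from typing import List, Tuple

# Illustrative triples only; Evt2 is listed first so that the sort below matters.
TRIPLES = [
    ("project:Aircraft", "project:name", "Aircraft"),
    ("event:Evt2", "event:project", "project:Aircraft"),
    ("event:Evt2", "event:actor", "person:JohnSmith"),
    ("event:Evt2", "event:label", "Surveillance Comp."),
    ("event:Evt2", "event:startDate", "2014-12-18"),
    ("event:Evt2", "event:endDate", "2015-03-12"),
    ("event:Evt1", "event:project", "project:Aircraft"),
    ("event:Evt1", "event:actor", "person:JaneDoe"),
    ("event:Evt1", "event:label", "Training Module 5"),
    ("event:Evt1", "event:startDate", "2014-11-24"),
    ("event:Evt1", "event:endDate", "2014-12-17"),
    ("person:JaneDoe", "actor:name", "Jane Doe"),
    ("person:JaneDoe", "actor:memberOf", "org:ABC_Corp"),
    ("person:JohnSmith", "actor:name", "John Smith"),
    ("person:JohnSmith", "actor:memberOf", "org:XYZ_Corp"),
    ("org:ABC_Corp", "org:name", "ABC Corp."),
    ("org:XYZ_Corp", "org:name", "XYZ, Inc."),
]

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def value(subject: str, predicate: str) -> str:
    """Return the first object recorded for (subject, predicate)."""
    return next(o for s, p, o in TRIPLES if s == subject and p == predicate)


def subjects(predicate: str, obj: str) -> List[str]:
    """Return every subject that has the given predicate and object."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def pretty(iso: str) -> str:
    """Format an ISO date such as 2014-11-24 as 'Nov 24, 2014'."""
    d = date.fromisoformat(iso)
    return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"


def events_for_project(project_name: str) -> List[Tuple[str, str, str, str, str]]:
    """Join project -> event -> actor -> org, ordered by start then end date."""
    rows = []
    for project in subjects("project:name", project_name):
        for event in subjects("event:project", project):
            actor = value(event, "event:actor")
            org = value(actor, "actor:memberOf")
            rows.append((
                value(event, "event:startDate"),
                value(event, "event:endDate"),
                value(event, "event:label"),
                value(org, "org:name"),
                value(actor, "actor:name"),
            ))
    # ISO date strings sort chronologically, mirroring the query's order by clause.
    rows.sort(key=lambda r: (r[0], r[1]))
    return [(label, org, actor, pretty(start), pretty(end))
            for start, end, label, org, actor in rows]
```

Calling `events_for_project("Aircraft")` returns the Training Module row before the Surveillance row, since ordering is done on the ISO dates and formatting happens only afterwards.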

Page 11: Semantics & the Art of Metadata Management

Semantic Data
• Assets and concepts are globally identified
• Discoverable Models
• Open World Assumption
  • Incompleteness of knowledge is a given
• Queries are Web Aware, Distributed & Federated
• Standardized, Simple RESTful Interfaces
• Format Agnostic – XML, JSON, Turtle, Yours
• Data models are refinable
• Plays well with Hadoop, NoSQL, Big Data Solutions
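The open world assumption can be sketched with a toy example: a missing statement means "unknown", not "false". The sets and function names below are hypothetical, and the handling of negative statements is deliberately simplified compared with how real RDF/OWL tooling treats negation.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Statements we have been told are true, and (for illustration) explicitly false.
KNOWN_TRUE: Set[Triple] = {("person:JaneDoe", "person:org", "org:ABC_Corp")}
KNOWN_FALSE: Set[Triple] = set()


def closed_world(triple: Triple) -> bool:
    """Closed world: anything not recorded is treated as false."""
    return triple in KNOWN_TRUE


def open_world(triple: Triple) -> Optional[bool]:
    """Open world: True or False only when asserted, otherwise None (unknown)."""
    if triple in KNOWN_TRUE:
        return True
    if triple in KNOWN_FALSE:
        return False
    return None


question = ("person:JohnSmith", "person:org", "org:ABC_Corp")
closed_answer = closed_world(question)  # False: absence is read as denial
open_answer = open_world(question)      # None: the data simply does not say
```

The practical consequence is that new statements from another source can be merged in later without contradicting an earlier "no".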

Page 12: Semantics & the Art of Metadata Management

Search + Semantics
• Search identifies potential related terms
• Semantics builds context from search results
• With context, relationships can be followed
• With related items, relevance extends beyond terminology
• Semantics builds navigational structures, and binds atomic content together
• Enables more effective media search
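One way to picture the points above: a keyword search finds an entity by its label, and the links around that entity then surface related items whose labels never mention the search term. This is a minimal sketch with hypothetical data and function names, not a description of any particular search product.

```python
from typing import List

# Illustrative labels and links, loosely based on the earlier modeling slides.
LABELS = {
    "topic:Aircraft": "Aircraft",
    "event:Evt1": "Training Module",
    "event:Evt2": "Surveillance Component",
    "person:JaneDoe": "Jane Doe",
}
LINKS = [
    ("event:Evt1", "event:about", "topic:Aircraft"),
    ("event:Evt2", "event:about", "topic:Aircraft"),
    ("event:Evt1", "event:agent", "person:JaneDoe"),
]


def keyword_search(term: str) -> List[str]:
    """Plain text matching: return ids whose label contains the term."""
    return sorted(i for i, label in LABELS.items() if term.lower() in label.lower())


def related(entity: str) -> List[str]:
    """Return ids one link away from the entity, in either direction."""
    found = set()
    for s, _, o in LINKS:
        if s == entity:
            found.add(o)
        elif o == entity:
            found.add(s)
    return sorted(found)


def semantic_search(term: str) -> List[str]:
    """Keyword hits plus everything directly linked to those hits."""
    hits = keyword_search(term)
    expanded = set(hits)
    for hit in hits:
        expanded.update(related(hit))
    return sorted(expanded)
```

Here `keyword_search("aircraft")` finds only `topic:Aircraft`, while `semantic_search("aircraft")` also returns both events, even though neither event label contains the word.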

Page 13: Semantics & the Art of Metadata Management

Who’s Becoming Semantic?

Page 14: Semantics & the Art of Metadata Management

Metadata Managed
Semantics
• establishes a metadata management framework,
• simplifies automated ingest,
• constrains manual metadata capture,
• increases search relevance,
• enables “atomization” of content,
• learns over time,
• encourages data interchange.

Page 15: Semantics & the Art of Metadata Management

Questions?