The Promise and Peril of RDF for Formalizing the Humanities
James Silas Creel, Sarah Potvin
Texas A&M University Libraries
April 10, 2015
Arlington, Texas
Texas DH Conference
Talk Outline
• RDF Basics
• Knowledge Representation in Computer Science
– The Rationalist tradition
– A Critique
• Pragmatics of RDF
– Need for human interpretation – the problem of readability and understanding
– Need for human composition – the problem of formalizing humane expressions
– Pitfalls of logical inference
• Successful Uses
– Geonames, Pleiades, Pelagios
– VIVO
Motivating Concerns
• RDF is extensible and flexible, but it is not neutral – it involves commitment to (1) a certain way of structuring expressions, and (2) a community that has adopted this mode of expression
• Theoretically, RDF can represent anything you want to express
• In practice, use of RDF without attention to conventions can render you incomprehensible
The humanist is better prepared than most to understand the situated and multivalent nature of expression
Community Situatedness
“Metadata is not simply a description of the information contained in a work or web page; the choice of a metadata scheme also signifies community membership. Every aspect of metadata-- from how it is obtained and verified to the expectations of how it will be used by humans or computer systems-- stems from the practices of a particular community.”
– Marshall and Shipman, “Which Semantic Web?” (2003)
RDF Basics – the W3C Web Stack
• URIs: Uniform Resource Identifiers – intended to be unique, unambiguous and persistent
• XML: eXtensible Markup Language – a markup language used for XHTML, RDF/XML, etc.
• RDF: Resource Description Framework – a data model and a set of syntaxes (including RDF/XML) for expressing information as triples that form graphs
• RDFS: RDF Schema – a vocabulary of RDF terms for describing classes and properties
• OWL: Web Ontology Language – an RDF-based ontology language whose main profiles are based on description logics (decidable fragments of first-order logic)
RDF Basics – Enabling the Semantic Web
• RDF aims to let machines read and use the data behind web pages, enabling:
– Unambiguous references for semantic search
– Automatic language translation
– Question answering
– Intelligent agents
RDF Basics – Triples
• Triples consist of a Subject, Predicate and Object, e.g.
Subject <http://repository.tamu.edu/handle/1969.1/2313>
Predicate <http://purl.org/dc/elements/1.1/creator>
Object <http://jamescreel.net>
– Expresses that James Creel (identified by his URI) is the dc:creator of the document
• Objects can also be literals, such as strings or integers
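A triple can be sketched in Python as three terms and written out in N-Triples, the W3C line-based RDF syntax. This is a minimal sketch that assumes no RDF library. It uses the Dublin Core 1.1 `creator` element with the document as subject. The FOAF `name` triple and the integer literal are illustrative additions and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class IRI:
    value: str

    def n3(self) -> str:
        # N-Triples writes IRIs inside angle brackets
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: Optional[IRI] = None

    def n3(self) -> str:
        # Escape backslashes first, then double quotes
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        text = f'"{escaped}"'
        if self.datatype is not None:
            # Typed literal, e.g. an integer: "2015"^^<...#integer>
            text += "^^" + self.datatype.n3()
        return text


Term = Union[IRI, Literal]


def ntriple(s: IRI, p: IRI, o: Term) -> str:
    """Serialize one subject-predicate-object triple as an N-Triples line."""
    return f"{s.n3()} {p.n3()} {o.n3()} ."


DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
XSD = "http://www.w3.org/2001/XMLSchema#"

doc = IRI("http://repository.tamu.edu/handle/1969.1/2313")
person = IRI("http://jamescreel.net")

# Object is another resource: the document's creator
print(ntriple(doc, IRI(DC + "creator"), person))
# Object is a string literal (illustrative FOAF name)
print(ntriple(person, IRI(FOAF + "name"), Literal("James Silas Creel")))
# A typed integer literal on its own
print(Literal("2015", IRI(XSD + "integer")).n3())
```

Keeping IRIs and literals as distinct types mirrors the RDF rule that subjects and predicates are resources, while objects may be either resources or literals.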
RDF Basics – RDF Schema
• Extends basic RDF with terms used to characterize classes and properties
• Medium for defining new “ontologies”
• Consists of the rdf and rdfs namespaces, documented at http://www.w3.org/TR/rdf-schema/
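The practical payoff of the rdfs terms is entailment. The RDF Semantics specification states rules such as rdfs9 (instances of a subclass are instances of its superclass) and rdfs11 (rdfs:subClassOf is transitive). The following is a minimal sketch that applies only those two rules, with triples written as prefixed-name strings. The ex: classes are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply rdfs11 and rdfs9 until no new triples are produced."""
    graph = set(triples)
    while True:
        pairs = [(s, o) for (s, p, o) in graph if p == SUBCLASS_OF]
        new: Set[Triple] = set()
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in pairs:
            for c, d in pairs:
                if b == c:
                    new.add((a, SUBCLASS_OF, d))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for (x, p, cls) in graph:
            if p == RDF_TYPE:
                for a, b in pairs:
                    if cls == a:
                        new.add((x, RDF_TYPE, b))
        if new <= graph:  # fixpoint reached: nothing new was derived
            return graph
        graph |= new


# Hypothetical example vocabulary
facts = {
    ("ex:Sonnet", SUBCLASS_OF, "ex:Poem"),
    ("ex:Poem", SUBCLASS_OF, "ex:LiteraryWork"),
    ("ex:sonnet18", RDF_TYPE, "ex:Sonnet"),
}
closed = rdfs_closure(facts)
print(sorted(closed - facts))
```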
RDF Basics – SPARQL
• The W3C-standard query language for RDF
• A query starts with an optional list of prefix declarations
• Queries consist of triple patterns containing variables; a variable shared between patterns joins them together
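The way clauses connect can be sketched as a naive evaluator for a SPARQL basic graph pattern. Each triple pattern is matched against the graph, and a variable shared between patterns (here `?who`) acts as a join. This is a hypothetical illustration only. It omits prefixes, FILTER, OPTIONAL, and everything else a real SPARQL engine provides, and the ex: data is invented.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # SPARQL variables are written with a leading "?"
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if result.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            result[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return result


def query(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Match each triple pattern in turn, keeping only bindings that are
    consistent with earlier patterns (a join on shared variables)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in graph:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data; literals are plain strings here for brevity
graph = [
    ("ex:doc1", "dc:creator", "ex:person1"),
    ("ex:person1", "foaf:name", "James Silas Creel"),
    ("ex:doc2", "dc:creator", "ex:person2"),
    ("ex:person2", "foaf:name", "Sarah Potvin"),
]

# Roughly: SELECT ?doc ?name WHERE { ?doc dc:creator ?who . ?who foaf:name ?name }
for row in query([("?doc", "dc:creator", "?who"), ("?who", "foaf:name", "?name")], graph):
    print(row["?doc"], row["?name"])
```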
Traditions in Knowledge Representation
• Frames
– Name an object, then fill in its properties/relations (“slots”) with other objects (or literals)
• Logic programming
– FOL, usually restricted to Horn clauses
• Functional programming
– Recursive functions of variables
• Expert Systems
– Use logic or functions to express a set of rules leading from premises to conclusions
– Interview an expert to get a set of rules about their domain, then encode them
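The expert-systems approach can be sketched as forward chaining over propositional Horn clauses. A rule fires once all of its premises are known facts, and this repeats until nothing new is derived. The rules below are hypothetical, standing in for the kind one might elicit by interviewing a domain expert.

```python
from typing import FrozenSet, List, Set, Tuple

# A Horn clause "p1 and p2 and ... -> q", stored as (premises, conclusion)
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Repeatedly fire rules whose premises are all known, until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules, of the sort one might elicit from a cataloguer
rules: List[Rule] = [
    (frozenset({"on_parchment", "handwritten"}), "manuscript"),
    (frozenset({"manuscript", "in_latin"}), "refer_to_medievalist"),
]

print(forward_chain({"on_parchment", "handwritten", "in_latin"}, rules))
```

A case the rules do not anticipate (say, a handwritten Latin text on paper) derives nothing. The system is only as complete as the rules elicited from the expert.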
Some Cautionary Examples in Knowledge Representation
• Fifth-Generation computing: Japan's multi-million-dollar effort yielded good fundamental research in parallel computing, but was held back by its concentration on logic programming (Prolog)
• Knowledge Navigator: Apple's 1987 vision of an intelligent software agent, the kind of agent the Semantic Web later aimed to enable
• Cyc: Since its start in 1984, the goal of formalizing “common sense” has not been realized. Recent efforts have concentrated on mapping its entities to Wikipedia.
The Phenomenological Critique of the Rationalist Tradition in Knowledge Representation
• In normal situations, we act without the need for logical modeling of the world.
• Logical reasoning is an exceptional type of reasoning that we appeal to relatively rarely, considering all the actions we take
Potential pitfalls in RDF
• Too heavyweight a solution when a relational database will suffice
– Useful only if interoperability is intended
• English or other natural-language labels have different meanings for different folks, and none for computers
• Namespaces are not references to code, but merely shorthand. They do imply acceptance of a convention: the elements of a namespace are significant only to adopters
• The deeper and more expressive a formalism, the greater the barriers to adoption and use
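The claim that a namespace prefix is only shorthand can be checked with a minimal sketch of prefixed-name expansion, roughly how Turtle and SPARQL `PREFIX` declarations work. The sketch ignores Turtle's escaping rules for local names. The same label `dc:creator` yields two different IRIs depending on which namespace a document binds to `dc`. Both namespaces are real Dublin Core vocabularies.

```python
from typing import Dict


def expand(curie: str, prefixes: Dict[str, str]) -> str:
    """Expand a prefixed name such as 'dc:creator' into a full IRI.

    Expansion is plain string concatenation: nothing is fetched from,
    or executed at, the namespace IRI.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {curie!r}")
    return prefixes[prefix] + local


# Two documents may bind the same prefix to different namespaces
elements_11 = {"dc": "http://purl.org/dc/elements/1.1/"}
dcmi_terms = {"dc": "http://purl.org/dc/terms/"}

print(expand("dc:creator", elements_11))
print(expand("dc:creator", dcmi_terms))
```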
Logical inferences in RDF
• Unrestricted logical inference, one of the potential strengths of RDF, is seldom employed – rather, programs reason heuristically or with canned queries.
• This is just as well, as formal logical expressions can unexpectedly entail contradictions or false inferences
– E.g. owl:sameAs licenses substitutivity of identicals, which can produce falsehoods in reified or modal (e.g. belief) contexts
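The substitutivity problem can be sketched with the familiar Superman example. This is a hypothetical illustration, not one from the talk, and every ex: URI is invented. owl:sameAs asserts that two URIs denote one individual, so anything said of one may be said of the other. A naive materializer that copies triples on that basis also rewrites the subject of a reified statement that Lois believes. The graph then reads as if she believes Clark Kent can fly.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


def substitute_identicals(triples: Set[Triple]) -> Set[Triple]:
    """Naively materialize substitutivity of identicals: for each
    (a owl:sameAs b), copy every triple mentioning a with b in its place
    (and vice versa) until no new triples appear."""
    graph = set(triples)
    while True:
        pairs = [(s, o) for (s, p, o) in graph if p == SAME_AS]
        new: Set[Triple] = set()
        for a, b in pairs:
            for old, repl in ((a, b), (b, a)):
                for triple in graph:
                    if old in triple:
                        s, p, o = (repl if t == old else t for t in triple)
                        new.add((s, p, o))
        if new <= graph:
            return graph
        graph |= new


# Hypothetical graph
facts = {
    ("ex:superman", SAME_AS, "ex:clarkKent"),
    # A reified statement: stmt1 says "superman canFly true"
    ("ex:stmt1", "rdf:subject", "ex:superman"),
    ("ex:stmt1", "rdf:predicate", "ex:canFly"),
    ("ex:stmt1", "rdf:object", "true"),
    # Modal context: Lois believes stmt1
    ("ex:lois", "ex:believes", "ex:stmt1"),
}
closed = substitute_identicals(facts)
# The belief is now also about Clark Kent, which Lois does not believe
print(("ex:stmt1", "rdf:subject", "ex:clarkKent") in closed)
```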
Some RDF Success Stories
• Geonames – www.geonames.org
• Pleiades – www.pleiades.stoa.org
• Pelagios – isaw.nyu.edu/exhibitions/space/pelagios.html
• VIVO? – www.vivoweb.org
Geonames
• An online gazetteer with a webservice and free data download
• ~8 million place names with focused metadata
– Latitude and longitude
– Feature types
– Containing place
– Alternate names
– Links to Wikipedia articles
• Geonames’ data are available as RDF, and each geoname has a URI.
• This availability has enabled data linking, e.g. with DBpedia
• Under the hood, its data are in MySQL
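A relational store exposed as RDF, with one URI per record, can be sketched as follows. Several assumptions apply. The table layout, the example.org URI scheme, and the placeholder row are hypothetical and not the actual Geonames schema or data. sqlite3 stands in for MySQL so the sketch runs without a server. Labels and coordinates use the real rdfs:label and W3C Basic Geo (wgs84_pos) terms, with coordinates emitted as plain literals for simplicity.

```python
import sqlite3
from typing import List

# Hypothetical URI scheme; rdfs:label and wgs84_pos are real vocabularies
PLACE_NS = "http://example.org/place/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def rows_to_ntriples(conn: sqlite3.Connection) -> List[str]:
    """Publish each relational row as RDF: one URI per row, one triple per column."""
    lines: List[str] = []
    for place_id, name, lat, lon in conn.execute(
        "SELECT id, name, latitude, longitude FROM place ORDER BY id"
    ):
        subject = f"<{PLACE_NS}{place_id}>"
        label = name.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
        lines.append(f'{subject} <{GEO}lat> "{lat}" .')
        lines.append(f'{subject} <{GEO}long> "{lon}" .')
    return lines


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT, latitude REAL, longitude REAL)"
)
# Placeholder row, not real Geonames data
conn.execute("INSERT INTO place VALUES (1, 'Example Place', 30.5, -96.25)")
for line in rows_to_ntriples(conn):
    print(line)
```

In this arrangement the relational database stays the system of record and RDF is generated as a view. That is one way to offer interoperability without moving storage to a triple store.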
Pleiades
• An online gazetteer of the ancient world
• Extensive information exposed as RDF using a number of schemas
– Locations
– Relationships to other places
– Primary source citations
– Time periods
• Under the hood, its data are in a Zope DB.
Pelagios
• A collaborative effort among 30 institutions to annotate historic documents with Pleiades-linked data
• Work has concentrated on tools that assist annotators working on particular collections
VIVO
• A Semantic Web tool for describing research, scholarship, people and institutions
• VIVO-ISF (Integrated Semantic Framework) is a separate but related project whose “ontology” underlies the VIVO app
• The development of this ontology has been fraught with controversy, and most adopting institutions utilize a small sampling of the defined properties and classes while being inclined to introduce their own
Conclusions
• Governance and collaboration facilitate wider adoption of ontologies
– “Ontologies” and schemata are meaningful only to adopters
• Domain circumscription facilitates expression
– By circumscribing your domain, you can be parsimonious about the ontologies, classes, and properties you employ.
– By being parsimonious with ontological commitments, one makes expression more efficient.
– This efficiency of expression facilitates growth of your knowledge base (i.e. graph)
• Growth leads to success in linked open data, as big knowledge bases are the big targets for linking