The Promise and Peril of RDF for Formalizing the Humanities James Silas Creel Sarah Potvin Texas A&M University Libraries April 10, 2015 Arlington, Texas Texas DH Conference


The Promise and Peril of RDF for Formalizing the Humanities

James Silas Creel
Sarah Potvin
Texas A&M University Libraries

April 10, 2015
Arlington, Texas
Texas DH Conference


Talk Outline

• RDF Basics
• Knowledge Representation in Computer Science
– The Rationalist tradition
– A Critique
• Pragmatics of RDF
– Need for human interpretation: the problem of readability and understanding
– Need for human composition: the problem of formalizing humane expressions
– Pitfalls of logical inference
• Successful Uses
– Geonames, Pleiades, Pelagios
– VIVO


Motivating Concerns

• RDF is extensible and flexible, but it is not neutral. It involves commitment to:
(1) a certain way of structuring expressions, and
(2) a community that has adopted this mode of expression

• Theoretically, RDF can represent anything you want to express

• In practice, use of RDF without attention to conventions can render you incomprehensible


The humanist is better prepared than most to understand the situated and multivalent nature of expression


Community Situatedness

“Metadata is not simply a description of the information contained in a work or web page; the choice of a metadata scheme also signifies community membership. Every aspect of metadata-- from how it is obtained and verified to the expectations of how it will be used by humans or computer systems-- stems from the practices of a particular community.”
- Marshall and Shipman, “Which Semantic Web?” 2003


RDF Basics – the W3C Web Stack

• URIs: Uniform Resource Identifiers, intended to be unique, unambiguous, and persistent

• XML: eXtensible Markup Language, a markup language used for XHTML, RDF/XML, etc.

• RDF: Resource Description Framework, a data model with several syntaxes (including XML) for expressing information as triples and graphs

• RDFS: RDF Schema, a vocabulary, itself expressed in RDF, for declaring classes and properties

• OWL: Web Ontology Language, an RDF-based language for ontologies, grounded in description logics (decidable fragments of first-order logic)


RDF Basics – Enabling the Semantic Web

• RDF enables machines to read and utilize webpages

– Unambiguous references for semantic search

– Automatic language translation
– Question answering
– Intelligent agents


RDF Basics – Triples

• Triples consist of a Subject, Predicate and Object, e.g.

Subject <http://repository.tamu.edu/handle/1969.1/2313>
Predicate <http://purl.org/dc/elements/1.1/creator>
Object <http://jamescreel.net>

– Expresses that James Creel is the dc:creator of the document

• Objects can also be literals, such as strings or integers
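
To make the triple structure concrete, here is a minimal sketch in plain Python (no RDF library). The `IRI`, `Literal`, `Triple`, and `to_ntriples` names are illustrative inventions, not part of any standard API, and the title string is a placeholder. The sketch assumes the Dublin Core `creator` term and places the document in the subject position, as is conventional for that vocabulary.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identified by an IRI (written <...> in N-Triples)."""


class Literal(str):
    """A plain string literal (written "..." in N-Triples)."""


class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    def term(x):
        if isinstance(x, Literal):
            escaped = x.replace("\\", "\\\\").replace('"', '\\"')
            return f'"{escaped}"'
        return f"<{x}>"
    return f"{term(t.subject)} {term(t.predicate)} {term(t.obj)} ."


DOC = IRI("http://repository.tamu.edu/handle/1969.1/2313")
PERSON = IRI("http://jamescreel.net")
DC_CREATOR = IRI("http://purl.org/dc/elements/1.1/creator")
DC_TITLE = IRI("http://purl.org/dc/elements/1.1/title")

# Object is another resource.
authorship = Triple(DOC, DC_CREATOR, PERSON)
# Object is a literal; the title text is a placeholder, not the real title.
titled = Triple(DOC, DC_TITLE, Literal("A placeholder title"))

print(to_ntriples(authorship))
print(to_ntriples(titled))
```

A graph is then simply a set of such triples; nothing in the data structure itself says what `creator` means, which is left to the community that adopts the vocabulary.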


RDF Basics – RDF Schema

• Extends basic RDF with terms used to characterize classes and properties

• Medium for defining new “ontologies”
• Consists of the rdf and rdfs namespaces, documented at http://www.w3.org/TR/rdf-schema/
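
As a rough illustration of what class declarations buy you, the sketch below stores a tiny class hierarchy as triples and derives the types implied by `rdfs:subClassOf`. The `ex:` terms and the helper functions are hypothetical, and prefixed names are kept as plain strings rather than expanded to full URIs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical mini-ontology; every ex: term is invented for illustration.
graph = {
    ("ex:Gazetteer", SUBCLASS_OF, "ex:ReferenceWork"),
    ("ex:ReferenceWork", SUBCLASS_OF, "ex:Document"),
    ("ex:someGazetteer", RDF_TYPE, "ex:Gazetteer"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(resource, triples):
    """Asserted types plus those implied by the subclass hierarchy."""
    asserted = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return asserted | inferred


print(sorted(types_of("ex:someGazetteer", graph)))
```

Only one type is asserted for `ex:someGazetteer`; the other two follow from the schema.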


RDF Basics – SPARQL

• The query language for RDF
• Starts with an optional list of prefixes
• Queries consist of clauses of triples with variables that can connect to other clauses
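
The way shared variables connect clauses can be sketched without a SPARQL engine. The toy matcher below is an assumption-laden illustration, not how any real triple store is implemented: terms are plain strings, variables start with `?`, and the `ex:` resources are invented. The SPARQL query it imitates appears in the comment.

```python
# Roughly equivalent SPARQL:
#   PREFIX dc:   <http://purl.org/dc/elements/1.1/>
#   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
#   SELECT ?doc ?name
#   WHERE { ?doc dc:creator ?person .
#           ?person foaf:name ?name . }

def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on failure."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable bound by an earlier clause must keep its value.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def query(patterns, triples):
    """Evaluate a list of triple patterns as a conjunction (a join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


triples = [
    ("ex:doc1", "dc:creator", "ex:creel"),
    ("ex:doc2", "dc:creator", "ex:potvin"),
    ("ex:creel", "foaf:name", "James Silas Creel"),
    ("ex:potvin", "foaf:name", "Sarah Potvin"),
]
patterns = [("?doc", "dc:creator", "?person"), ("?person", "foaf:name", "?name")]

results = query(patterns, triples)
for row in results:
    print(row["?doc"], row["?name"])
```

The variable `?person` appears in both clauses, so each document is joined only to the name of its own creator.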


Traditions in Knowledge Representation

• Frames
– Name an object, fill in its properties/relations (“slots”) with other objects (or literals)

• Logic programming
– FOL usually expressed as Horn clauses

• Functional programming
– Recursive functions of variables

• Expert Systems
– Use logic or functions to express a set of rules leading from premises to conclusions
– Interview an expert to get a bunch of rules about their domain and encode them
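
Two of these traditions are easy to sketch side by side. Below, frames are nested dictionaries whose slots can be inherited through an `is_a` link, and an expert-system-style rule base is a list of propositional Horn clauses run by forward chaining. All frame names, slots, facts, and rules are invented for illustration, and the propositional rules are a simplification of full first-order Horn clauses.

```python
# Frames: a named object whose slots hold literals or names of other frames.
frames = {
    "Pleiades": {"is_a": "Gazetteer", "covers": "ancient world"},
    "Gazetteer": {"is_a": "ReferenceWork", "lists": "places"},
}


def get_slot(frame_name, slot, frames):
    """Look up a slot, falling back to parent frames via the is_a link."""
    while frame_name in frames:
        frame = frames[frame_name]
        if slot in frame:
            return frame[slot]
        frame_name = frame.get("is_a")
    return None


# Rules: (set of premises, conclusion), i.e. propositional Horn clauses.
rules = [
    ({"is_gazetteer"}, "has_place_names"),
    ({"has_place_names", "publishes_uris"}, "linkable"),
]


def forward_chain(facts, rules):
    """Apply rules repeatedly until no new conclusion can be drawn."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


print(get_slot("Pleiades", "lists", frames))
print(sorted(forward_chain({"is_gazetteer", "publishes_uris"}, rules)))
```

RDF resembles the frame tradition in shape (named things with property slots), while OWL and rule languages carry forward the logical one.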


Some Cautionary Examples in Knowledge Representation

• Fifth-Generation computing: A multi-million dollar effort that yielded good fundamental research in parallel computing, but was held back by its concentration on logic programming (Prolog)

• Knowledge Navigator: Apple’s ambition for a semantic web agent

• Cyc: Begun in 1984, the project has not realized its goal of formalizing “common sense.” Recent efforts have concentrated on mapping its entities to Wikipedia.


The Phenomenological Critique of the Rationalist Tradition in Knowledge Representation

• In normal situations, we act without the need for logical modeling of the world.

• Logical reasoning is an exceptional type of reasoning that we appeal to relatively rarely, considering all the actions we take


Potential pitfalls in RDF

• Too heavyweight a solution when a relational database will suffice
– Useful only if interoperability is intended

• English or other natural-language labels have different meanings for different folks, and none for computers

• Namespaces are not references to code, but merely shorthand. They do imply acceptance of a convention - the elements of a namespace are only significant to adopters

• The deeper and more expressive a formalism, the greater the barriers to adoption and use
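
The point about namespaces being shorthand can be shown in a few lines. In this sketch the `expand` helper is hypothetical; the two namespace URIs are the published Dublin Core and RDF Schema ones.

```python
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as dc:creator by string concatenation."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("dc:creator"))
# Nothing checks that the term exists in the vocabulary:
print(expand("dc:anythingAtAll"))
```

Expansion is pure string concatenation. No code is loaded and no term is validated, so an invented term expands just as readily as a real one and is meaningful only if a community agrees on it.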


Logical inferences in RDF

• Unrestricted logical inference, one of the potential strengths of RDF, is seldom employed – rather, programs reason heuristically or with canned queries.

• This is just as well, as formal logical expressions can unexpectedly entail contradictions or false inferences
– E.g. owl:sameAs can produce falsehoods when combined with reification, modality, and substitutivity
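
One way such a falsehood can arise is sketched below with a stock philosophy example that is not from the talk: a reified statement sits inside a belief (a modal context), and naive substitution of `owl:sameAs` equals rewrites what the believer is said to believe. The `ex:` terms and the `substitute_same_as` function are invented for illustration; real OWL reasoners are considerably more involved.

```python
SAME_AS = "owl:sameAs"

triples = {
    ("ex:Superman", SAME_AS, "ex:ClarkKent"),
    # Reified statement stmt1: "Superman can fly" (object triple omitted).
    ("ex:stmt1", "rdf:subject", "ex:Superman"),
    ("ex:stmt1", "rdf:predicate", "ex:canFly"),
    # Modal context: Lois believes stmt1.
    ("ex:lois", "ex:believes", "ex:stmt1"),
}


def substitute_same_as(triples):
    """Naively swap owl:sameAs twins everywhere; return only the new triples."""
    aliases = [(s, o) for s, p, o in triples if p == SAME_AS]
    derived = set()
    for a, b in aliases:
        for old, new in ((a, b), (b, a)):
            for t in triples:
                if t[1] == SAME_AS:
                    continue  # leave the identity statements themselves alone
                rewritten = tuple(new if term == old else term for term in t)
                if rewritten not in triples:
                    derived.add(rewritten)
    return derived


for t in sorted(substitute_same_as(triples)):
    print(t)
```

The derived triple makes the statement Lois believes be about Clark Kent, which misreports her belief even though the identity itself is true.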


Some RDF Success Stories

• Geonames – www.geonames.org
• Pleiades – www.pleiades.stoa.org
• Pelagios – isaw.nyu.edu/exhibitions/space/pelagios.html
• VIVO? – www.vivoweb.org


Geonames

• An online gazetteer with a webservice and free data download

• ~8 million place names with focused metadata
– Latitude and longitude
– Feature types
– Containing place
– Alternate names
– Links to Wikipedia articles

• Geonames’ data are available as RDF, and each geoname has a URI.

• This availability has afforded data linking, e.g. with DBPedia

• Under the hood, its data are in MySQL


Pleiades

• An online gazetteer of the ancient world
• Extensive information exposed as RDF using a number of schemas
– Locations
– Relationships to other places
– Primary source citations
– Time periods

• Under the hood, its data are in a Zope DB.


Pelagios

• A collaborative effort among 30 institutions to annotate historic documents with Pleiades-linked data

• Effort has focused on tools to assist annotators working on particular collections


VIVO

• A Semantic Web tool for describing research, scholarship, people and institutions

• VIVO-ISF (Integrated Semantic Framework) is a separate but related project whose “ontology” underlies the VIVO app

• The development of this ontology has been fraught with controversy, and most adopting institutions utilize a small sampling of the defined properties and classes while being inclined to introduce their own


Conclusions

• Governance and collaboration facilitate wider adoption of ontologies
– “Ontologies” and schemata are meaningful only to adopters

• Domain circumscription facilitates expression
– By circumscribing your domain, you can be parsimonious about the ontologies, classes, and properties you employ.
– By being parsimonious with ontological commitments, one makes expression more efficient.
– This efficiency of expression facilitates growth of your knowledge base (i.e. graph)

• Growth leads to success in linked open data, as big knowledge bases are the big targets for linking