ACLS Commission on Cyberinfrastructure for the Humanities and Social Sciences

Chartered in 2004

Public Information-gathering sessions in 2004

Draft report for comment in March 2005

Final report June 2005 (?)

http://www.acls.org/cyberinfrastructure/cyber.htm

Commission Members

Paul Courant, Provost, Economics, University of Michigan

Sarah Fraser, Art History, Northwestern University

Mike Goodchild, Geography, UCSB

Margaret Hedstrom, School of Information, University of Michigan

Charles Henry, VP & CIO, Rice University

Peter B. Kaufman, VP, Innodata-Isogen; President, Intelligent Television

Jerome McGann, English, University of Virginia

Roy Rosenzweig, History, George Mason University

John Unsworth (Chair), Library and Information Science, University of Illinois, Urbana-Champaign

Bruce Zuckerman, Religion, University of Southern California

Goals

Propose coordination of funding, policy, and innovation to maximize benefits of, and for, humanities and social sciences

Open doors to a larger audience for the social sciences and the humanities

Identify areas in which H&SS have unique CI requirements or responsibilities

Disclaimers

I begin from the premises of the Atkins report on Cyberinfrastructure

Part of what follows is drawn from public information-gathering sessions of the ACLS Commission

The rest represents my own view of the situation in humanities scholarship, digital libraries, and scholarly publishing

Overview

The nature of Cyberinfrastructures (CI)

The purpose of CI

To what extent can CI be shared?

Libraries as traditional/cyber infrastructure in the humanities

Digitization and libraries

The value of the original

The value of digitization in libraries

Constituency for CI in humanities and social sciences

Digitization and humanities scholarship

The value of digitization for humanities scholarship

Professional development/training/support

Policy issues (copyright)

Pragmatic issues (Interdependence and reusability)

Three Layers of Cyberinfrastructure

As specified in the “Atkins Report” (Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure, January 2003)

Below Cyberinfrastructure:

“The base technologies underlying cyberinfrastructure are the integrated electro-optical components of computation, storage, and communication that continue to advance in raw capacity at exponential rates.”

Above Cyberinfrastructure:

“Above the cyberinfrastructure layer are software programs, services, instruments, data, information, knowledge, and social practices applicable to specific projects, disciplines, and communities of practice.”

Cyberinfrastructure Itself

“Between these two layers is the cyberinfrastructure layer of enabling hardware, algorithms, software, communications, institutions, and personnel.”

The Purpose of Cyberinfrastructure

“This layer should provide an effective and efficient platform for the empowerment of specific communities of researchers to innovate and eventually revolutionize what they do, how they do it, and who participates.” [p. 5]

Cyberinfrastructure is Shared…

In the Atkins report, at least, “software programs, services, instruments, data, information, knowledge, and social practices applicable to specific projects, disciplines, and communities of practice” are specifically excluded from that middle layer that is called Cyberinfrastructure, though built upon it.

…But Cyberinfrastructure is not Digital Plumbing

Also excluded from the middle layer are “integrated electro-optical components of computation, storage, and communication”, even though Cyberinfrastructure depends on these things.

Shared CI for the Sciences

The Atkins report speaks of “thousands of overlapping field- and project-specific collaboratories or grid communities, customized at the application layer but extensively sharing common cyberinfrastructure.”

Shared CI for the Sciences

grids of computational centers

comprehensive libraries of digital objects

well-curated federated collections of scientific data

online instruments and vast sensor arrays

convenient software toolkits

the ability to collaborate with physically distributed teams of people using all of these capabilities.

CI that H&SS Could Share with the Sciences & Engineering

The humanities and the social sciences may be able to share with the sciences some aspects of:

Technologies for distance collaboration

Software toolkits [“for resource discovery, modeling, and interactive visualization”]

Shared CI For Humanities and Social Sciences?

Training and professional development

Standards and communities of (best) practices

Well-curated federated collections of digital objects

Digital preservation

Shared CI for H&SS (cont.)

Data-repositories/depositories for the social sciences

Analytical tools appropriate to H&SS goals and materials

Tools, legislation, institutions that address critical policy issues

Libraries as traditional/cyberinfrastructure in the humanities

The library is where research across the disciplines of the humanities has traditionally been conducted, using books, journals, maps, and other print resources (emphasis on collections).

The library is still the best laboratory for humanities research, across the disciplines, using digital resources (emphasis on collaboration and expertise).

The Value of the Original

“I would like to encourage the Commission to consider the value of the original and authentic sources--ink on vellum or pixels on a screen--as essential to the humanities and social sciences infrastructure.”

Max Evans, Executive Director, National Historical Publications and Records Commission (DC Meeting, April 27, 2004)

Digitization and Libraries

When can a digital surrogate stand in for its source?

When can a digital surrogate replace its source?

When might a digital surrogate be superior to its source?

What is the cost of producing and maintaining digital surrogates?

What risks do digital surrogates pose?

Preservation

"The Preservation Purposes of the Digital Product [include efforts to]. . . . Protect Originals. . . . Represent Originals. . . . [and] Transcend Originals. . . . In a very small but increasing number of applications, digital imaging holds the promise of generating a product that can be used for purposes that are impossible to achieve with the original sources. This category includes imaging that uses special lighting to draw out details obscured by age, use, and environmental damage; imaging that makes use of specialized photographic intermediates; or imaging of such high resolution that the study of artifactual characteristics is possible."

(Paul Conway, in The Handbook for Digital Projects: A Management Tool for Preservation and Access)

Preservation

"The loss of evidential value and permanent accessibility inherent in digital forms and textual conversion [by OCR] exclude them as a preservation medium. They can only be employed in addition to preservation on film in order to increase the ease of use."

"[D]igital imaging is not suitable for permanent storage."

(Angelika Menne-Haritz and Nils Brübach in "The Intrinsic Value of Archive and Library Material")

What is the value of digitization?

There is no single answer to the question “What is the value of a digital surrogate?”

The answer depends on the nature of the original and the conditions of its use.

Original materials may be rare or not rare, frequently used or infrequently used.

Four cases

1. Materials that are not rare and that are frequently used

2. Materials that are not rare and that are infrequently used

3. Materials that are rare and are frequently used

4. Materials that are rare and are infrequently used

Not rare, frequently used

Digital surrogates for such an object might be worth producing

To reduce the cost associated with reshelving the object

To make the object simultaneously available to multiple users (for example, through an electronic reserve desk)

To replace the object, thereby doing away with the cost of housing it.

Not rare, infrequently used

Digital surrogates for such an object might be worth producing

to help users determine whether recalling an object from long-term storage is worth the wait, and worth the library's effort

to increase frequency of use (by providing searchable metadata, for example)

to reduce costs by replacing the object with a digital surrogate.

Rare, frequently used

In this case, the principal benefits of digital surrogates are:

• Preservation: by standing in for some uses, the digital surrogate reduces wear and tear on the original object.

• Access: by providing access that doesn't impose wear and tear on the original, the digital surrogate makes rare objects more accessible.

Rare, infrequently used

This is the category least likely to be represented with digital surrogates, because digitizing is expensive. But, depending on the cost of housing the object,

• digitizing and deaccessioning may be a reasonable choice

…though libraries would need to be aware of the actual or potential rarity of materials used infrequently today: in the future, these may be valuable to users, or for uses, that one could not predict today.
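
The four cases above amount to a lookup keyed on two yes/no properties of an object. A minimal sketch, assuming a hypothetical `Holding` record and rationale strings paraphrased from the slides (the names `Holding`, `RATIONALES`, and `digitization_rationales` are illustrative and do not come from the Commission's materials):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Holding:
    """Hypothetical record for a library item; field names are illustrative."""
    title: str
    rare: bool
    frequently_used: bool


# Rationales paraphrased from the four cases; keys are (rare, frequently_used).
RATIONALES = {
    (False, True): [
        "reduce the cost of reshelving",
        "allow simultaneous access by multiple users",
        "replace the object and save the cost of housing it",
    ],
    (False, False): [
        "help users decide whether a recall from storage is worth the wait",
        "increase use by providing searchable metadata",
        "replace the object and reduce costs",
    ],
    (True, True): [
        "preservation: reduce wear and tear on the original",
        "access: make the rare object more accessible",
    ],
    (True, False): [
        "digitize and deaccession only if the cost of housing justifies it",
    ],
}


def digitization_rationales(holding: Holding) -> list[str]:
    """Return the reasons a digital surrogate might be worth producing."""
    return RATIONALES[(holding.rare, holding.frequently_used)]


if __name__ == "__main__":
    item = Holding("Hypothetical manuscript", rare=True, frequently_used=True)
    for reason in digitization_rationales(item):
        print(reason)
```

The table deliberately returns reasons rather than a yes/no decision: as the last case notes, rarity and frequency of use can change over time, so any real policy would need to revisit these inputs.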

What is the value of digitization?

When can a digital surrogate stand in for its source?

When it answers to the needs of users.

When can a digital surrogate replace its source?

If the source is not rare.

When might a digital surrogate be superior to its source?

• In cases where remote or simultaneous access to the object is required, or when software provides tools that allow something more or different than physical examination.

• When the record of the digital surrogate finds its way into indexes and search engines that would never find the physical original.

What is the cost of digitization?

What is the cost of producing and maintaining digital surrogates?

The cost of producing digital surrogates depends, among other things, on the uniformity, disposability, and legibility of the original. The cost of maintenance depends on frequency of use and the idiosyncrasy of format, but beyond that it depends on technological, social, and institutional factors that are difficult or impossible to predict. That is an important reason for being cautious when one chooses to replace a physical object (the maintenance costs for which are known) with a digital surrogate (the maintenance costs for which are, to some extent, unknown).

What are the risks of digitization?

What risks do digital surrogates pose?

The principal risk posed by digital surrogates is the risk of disposing of an imperfectly represented original because one believes the digital surrogate to be a perfect substitute for it. Digital surrogates also pose the risk of providing a partial view (of an object) that seems to be complete, and the risk of de-contextualization—the possibility that the digital surrogate will become detached from some context that is important to understanding what it is, and will be received and understood in the absence of that context.

Constituency for H&SS CI

1) Is the proposed cyberinfrastructure for the humanities and social sciences intended to serve only scholars, or will it serve a larger educational purpose?

2) Is it anticipated that there will be a core content, around which tools and services can be developed? and,

3) If so, who will maintain this content? [and who will create it?]

Joyce Ray, Associate Deputy Director for Library Services, Institute of Museum and Library Services (DC Meeting, April 27, 2004)

Benefits of Digitization in Humanities

The most obvious benefit of digitization, for the humanities, is access to primary source materials. The aggregation of these resources, in digital form, is bound to provide new sources for humanities scholarship.

Less obvious, and further out, we can expect to see new computational methods and new tools for humanities scholarship—new tools for discovery and analysis, for finding and exploring patterns.

Representation is Interpretation

In the humanities, objects of study can be images, texts, sounds, maps, performances, concepts, three-dimensional objects.

When we make a digital surrogate for any one of these, we always believe that our aim is to represent it as accurately, as faithfully as possible, with the least possible interference, or noise, in the process

…but when, as scholars, we deal with these digital surrogates, or produce our own, we learn that there's no such thing as an innocent act of representation: every representation is an interpretation.

Digitization and Scholarly Self-understanding

The value of digitization for humanities scholarship is that it externalizes interpretation, re-presents it to us in the form of the surrogate, and forces us, as humanities scholars, to confront and evaluate our beliefs and understandings, concerning the object of digitization, as well as our perspectives and purposes with respect to it.

Collaboration in the Digital Library

Scholars can learn a great deal from the expertise of librarians in cataloging and classification, in information organization, in preservation and access.

Librarians can learn a great deal about the peculiar and idiosyncratic characteristics of individual works, or authors, or movements, or literatures, by working with specialists who know the features and fine points of that material.

Working together is perhaps the best way to find the proper balance point, in a project involving digital representation, between the abstract and the particular, between the collection and the item, between the librarian and the scholar.

Digitization and the Audience for the Humanities

“Is the proposed cyberinfrastructure for the humanities and social sciences intended to serve only scholars, or will it serve a larger educational purpose?” (Joyce Ray, IMLS)

The “crisis in scholarly publishing” is a crisis of audience: nobody’s reading these books.

We could enlarge the audience for humanities scholarship, not by dumbing it down, but by making it more readily available.

http://www3.isrl.uiuc.edu/~unsworth/sparc.2004.html

Professional Development/Training

“far too much emphasis nationally and within my own institution on purchasing hardware and software and putting high capacity wires in the walls (and, now, on eliminating those wires). Far too little attention has been paid by universities (or funds made available by them) for development of rich academic content and relevant professional development opportunities for faculty.”

Stephen Brier, Associate Provost for Instructional Technology, Dean for Interdisciplinary Studies, Co-Director, New Media Lab, CUNY Graduate Center (NY Meeting, June 19, 2004)

Tools for Interpretation

“human interpretation is the heart of the humanities. . . . devising computer-assisted ways for humans to interpret more effectively vast arrays of the human enterprise is the major challenge. Contextual issues are part of that: time/age/period, theoretical model(s), topics, themes, preconditions for comprehension, helpers for comprehension, applications which use them, datasets associated with them, and so forth.”

Michael Jensen, Director of Publishing Technologies, National Academies Press (DC Meeting, April 27, 2004)

How H&SS Differ From Science

"The social sciences and humanities are different from the physical and biological sciences in the variety, complexity, incomprehensibility, and intractability of the entities that are studied. Consequently, the physical and biological science models in the National Science Foundation’s report Revolutionizing Science and Engineering through Cyberinfrastructure do not directly apply to the social sciences, which have different kinds of problems. . . .”

How H&SS Differ From Science

“. . . . These problems make it difficult to understand social reality in the first instance, and they also pose special problems for creating cyberinfrastructure for the social sciences. But they also provide interesting challenges for computer scientists, digital librarians, and social scientists themselves. Perhaps most importantly, overcoming these problems provides the opportunity to revolutionize the social sciences.”

Henry Brady, Professor of Political Science and Public Policy, Director, Survey Research Center and UC Data, UC Berkeley (Berkeley Meeting, August 21, 2004)

Policy Issues: Copyright

More than 300TB of print material each year (books, journals, newspapers, magazines)

More than 25TB of Movies each year

About 375,000TB of Photography each year

About 987TB of Radio each year

About 8,000TB of TV each year

More than 58TB of Audio CDs each year

…and that doesn’t include software (say, video games)

Or materials born-digital on the Web, etc…

(from “How Much Information? 2003,” Peter Lyman & Hal Varian)
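
To get a feel for the scale of these figures, they can be totalled. A rough sketch only: the slide mixes "more than" and "about" estimates, so the sum is an approximate floor and not a figure reported by Lyman and Varian. The dictionary keys and function names are illustrative assumptions.

```python
# Annual volumes in terabytes, copied from the slide above.
# "More than" values are treated as lower bounds, "About" values as estimates.
ANNUAL_TB = {
    "print": 300,
    "movies": 25,
    "photography": 375_000,
    "radio": 987,
    "tv": 8_000,
    "audio_cds": 58,
}


def total_tb() -> int:
    """Sum the listed media; excludes software and born-digital Web material."""
    return sum(ANNUAL_TB.values())


def share(medium: str) -> float:
    """Fraction of the listed total contributed by one medium."""
    return ANNUAL_TB[medium] / total_tb()


if __name__ == "__main__":
    print(f"Listed total: {total_tb():,} TB per year")
    for medium in ANNUAL_TB:
        print(f"{medium:>12}: {share(medium):.1%}")
```

On these numbers the listed media come to roughly 384,000TB a year, with photography accounting for the overwhelming share.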

Copyright

Each of these books, journals, movies, TV shows, etc. will be copyrighted for the life of the author plus 70 years. In many cases, the copyright will extend effectively forever, because of recent changes in copyright law.

How will the record of the digital present be copied, even if only to be preserved? How will the history of the future be written? How will the culture of the future sample the culture of the past?

Pragmatic Issues: Interdependence and Reusability

"A fundamental lesson of digital libraries research is that advanced research and a rich information infrastructure are both mutually supportive and mutually dependent . . . [and] content must be usable and readily re-usable by multiple audiences."

(Cited by Joyce Ray, from "Knowledge Lost in Information: Report of the NSF Workshop on Research Directions for Digital Libraries" (2003)).

But what about “representation is interpretation”?

References

ACLS Commission: http://www.acls.org/cyberinfrastructure/cyber.htm

Some of the foregoing was originally drafted by the author for The Evidence in Hand: Report of the Task Force on the Artifact in Library Collections, published in November, 2001, by the Council on Library and Information Resources: http://www.clir.org/pubs/reports/pub103/pub103.pdf

Menne-Haritz, Angelika and Nils Brübach. "The Intrinsic Value of Archive and Library Material." Digitale Texte der Archivschule Marburg Nr. 5: http://www.uni-marburg.de/archivschule/intrinsengl.html

Sitts, Maxine K., ed. Handbook for Digital Projects: A Management Tool for Preservation and Access. First Edition. Northeast Document Conservation Center, Andover, Massachusetts, 2000: http://www.nedcc.org/digital/dighome.htm