The Humanities Computing Curriculum • Nanaimo, British Columbia, Canada • November 9-10, 2001 What Humanists Need to Know About Computing (and Computer Science) Nancy Ide Department of Computer Science Vassar College


Page 1: What Humanists Need to Know About Computing (and Computer Science)

The Humanities Computing Curriculum • Nanaimo, British Columbia, Canada • November 9-10, 2001

What Humanists Need to Know About Computing (and Computer Science)

Nancy Ide
Department of Computer Science
Vassar College

Page 2: What Humanists Need to Know About Computing (and Computer Science)

The Big Question

• What is Humanities Computing?
  – Any Humanist using a computer?
  – Any Humanist using data relevant to his or her field that is stored on a computer?
  – Any Humanist creating data relevant to his or her field to be stored on a computer?
  – Any Humanist using an algorithmic process to analyze data stored on a computer?
  – Any Humanist creating an algorithmic process to analyze data stored on a computer?

Page 3: What Humanists Need to Know About Computing (and Computer Science)

Any Humanist using a computer?

• This is certainly too broad to serve as a definition (these days)
• Would include
  – Word processing
  – Web access
  – Email

Page 4: What Humanists Need to Know About Computing (and Computer Science)

Any Humanist using data stored on a computer?

• This is better, but maybe still too imprecise
• Search/retrieval from text, images, etc.
  – Searching a corpus for occurrences of a word, syntactic pattern, etc.
  – Searching a digitized image for patterns
• Web access?

Page 5: What Humanists Need to Know About Computing (and Computer Science)

Any Humanist creating data to be stored on a computer?

• Getting better…
  – Text encoding
  – Creation of corpora, lexicons, concordances, etc.
  – Digitized images
  – Databases
  – Hypertext/hypermedia
• But do we include:
  – electronic publishing
  – creation of web pages, on-line course materials, etc.?

Page 6: What Humanists Need to Know About Computing (and Computer Science)

Any Humanist using an algorithmic process for analysis?

• "Algorithmic process" = computer program (beyond search/access)
• Includes use of:
  – Statistical routines
  – Named entity recognizers, part of speech taggers, syntactic analyzers, etc.
  – GIS, spatial modeling routines, etc.
• We seem to be on more solid ground here…

Page 7: What Humanists Need to Know About Computing (and Computer Science)

Any Humanist creating an algorithmic process?

• E.g., writers of text analysis software
• Probably others, but not many come to mind…
  – This would seem to be a relatively small segment of the Humanities Computing community

Page 8: What Humanists Need to Know About Computing (and Computer Science)

What Does That Leave Us With?

[Diagram: Computer use, Data use, Data creation, Algorithm use, and Algorithm creation, shown in relation to HUMANITIES COMPUTING]

Page 9: What Humanists Need to Know About Computing (and Computer Science)

In Terms of Percentages…?

[Chart: Data use, Data creation, Algorithm use, Algorithm creation]

Page 10: What Humanists Need to Know About Computing (and Computer Science)

A Brief Historical Aside

• Humanities Computing in the 1960's
  – Indistinguishable from "computational linguistics"
  – Use of statistics to analyze language
  – Concordance creation, dictionary creation, corpus creation

Page 11: What Humanists Need to Know About Computing (and Computer Science)

• Humanities Computing in the 1970's
  – Computational linguistics embraced the symbolic approach and abandoned (even scorned) statistical analysis, now the province of HC
    • Stylistic analysis, authorship studies, literary analysis
  – Creation of resources (corpora, lexicons, concordances) continues
  – First development of software for text analysis

Page 12: What Humanists Need to Know About Computing (and Computer Science)

Humanities Computing in the 1980's

– More of the same
– Electronic scholarly editing becomes big
– But the PC introduced a new contingent:
  • Word processing
  • Computer-assisted learning
– Late '80's: Two major events
  • TEXT ENCODING INITIATIVE
  • Computational linguistics re-discovers statistics and language resources

Page 13: What Humanists Need to Know About Computing (and Computer Science)

• The TEI establishes text encoding as a core activity of HC

• CL's embrace of statistics and resource building blurs the distinction between HC and CL

• Stronger computational skills in the CL community enable them to "steal" much previous HC work and take it farther

Page 14: What Humanists Need to Know About Computing (and Computer Science)

Humanities Computing in the 1990's

• Text encoding still major focus
• Word processing, computer-assisted learning drop out
• Addition of several others due to increased computational power:
  – Digital images
  – Hypertext/hypermedia
  – Digital libraries
  – Advanced modeling tools
  – Web-based work
  – Electronic publishing

Page 15: What Humanists Need to Know About Computing (and Computer Science)

Now

• CL has largely taken over development of statistical methods for language analysis, including HC staples such as authorship, stylistics
• Also taking over some major kinds of resource creation (corpora, lexicons, etc.)
• CL working on text encoding as well, esp. in context of W3 developments (XML, RDF, Semantic Web stuff)

Are these things still a part of Humanities Computing?

Page 16: What Humanists Need to Know About Computing (and Computer Science)

[end of aside]

Page 17: What Humanists Need to Know About Computing (and Computer Science)

So, What Do Humanists Need to Know About Computing?

My Previous Argument
– Back in the mid-1980's, I argued that Humanists needed to know how to write computer programs
– The chart on the earlier slide suggests this is probably not the case anymore

My Current Argument
– The fundamental intellectual skills I was concerned about are still what is needed

Page 18: What Humanists Need to Know About Computing (and Computer Science)

Data Use

• A lot of this is search/retrieval
• What does this require?
  – Know what you are looking for and how it is instantiated in the data
    • Example: looking for certain imagery in a text
      – First have to define “imagery” -- think in terms of character patterns
      – Does data include lemmas?
    • Example: Searching a database
      – What is in the DB and how is it structured?

Page 19: What Humanists Need to Know About Computing (and Computer Science)

• Know how to formulate your query in precise terms
  – Rudimentary knowledge of boolean logic
  – Sometimes, knowledge of query language
    • E.g., SQL for databases
• More generally, knowledge of tools for access and retrieval, and what they can and cannot do
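To make "formulating a query in precise terms" concrete, here is a minimal sketch of a boolean SQL query, run through Python's built-in sqlite3 module. The table name `poems` and its columns are assumptions invented for illustration; they do not come from any real database discussed in this talk.

```python
import sqlite3

# Hypothetical table and column names, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poems (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO poems VALUES (?, ?, ?)",
    [
        ("The Tyger", "William Blake", 1794),
        ("London", "William Blake", 1794),
        ("The Lamb", "William Blake", 1789),
        ("Kubla Khan", "Samuel Taylor Coleridge", 1816),
    ],
)

# Boolean query: (poems by Blake AND NOT from 1789) OR anything after 1800.
rows = conn.execute(
    "SELECT title FROM poems "
    "WHERE (author = 'William Blake' AND NOT year = 1789) OR year > 1800 "
    "ORDER BY title"
).fetchall()
titles = [row[0] for row in rows]
print(titles)
```

The parentheses matter: moving them changes which records come back, which is the kind of precision a vague request such as "Blake's later poems, and anything Romantic" does not force on the person asking.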

Page 20: What Humanists Need to Know About Computing (and Computer Science)

Data Creation

• Fundamentally, a data modeling problem
  – Identification of the objects in the data and their properties
    • Decomposition into sub-components
  – Identification of the relations among the objects
    • Structural relations: Inclusion? Super-set? Overlap? Parallel?
    • Logical relations: e.g., "author-of" may be a relation between a "title" object and a "name" object

Page 21: What Humanists Need to Know About Computing (and Computer Science)

What Does One Need to Know?

• The obvious:
  – XML, XSL/XSLT, XML schemas (+ tools)
  – RDF, RDF schemas
  – Familiarity with Semantic Web work (ontologies)
  – TEI, EAGLES/ISLE XML Corpus Encoding Standard
• The not-so-obvious:
  – Data modeling principles
    • Component analysis
    • Identification of components vs. relations vs. properties

Page 22: What Humanists Need to Know About Computing (and Computer Science)

• Encoding options
  – E.g., nested tags (implicit relation) vs. link (explicit relation)
  – Many documents vs. one
• Why it may or may not matter given XSLT and RDF

BOOK
  PARTS: FRONTMATTER, CHAPTERS, CHAPTER, PARAGRAPHS, PARAGRAPH, BACKMATTER
  RELATIONS: TITLE, AUTHOR, PUBLISHER
  PROPERTIES: PUBLISHED/UNPUBLISHED, MONOGRAPH/EDITED VOLUME
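The nested-versus-link choice can be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names below (`library`, `book`, `person`, `author`, `id`) are hypothetical and are not taken from the TEI or any other scheme. The point is only that the same "author-of" relation can be recovered from either encoding.

```python
import xml.etree.ElementTree as ET

# Encoding 1: the relation is implicit in nesting (author inside book).
NESTED = """
<library>
  <book><title>Songs of Innocence</title><author>William Blake</author></book>
</library>
"""

# Encoding 2: the relation is explicit, via an ID reference (a link).
LINKED = """
<library>
  <person id="p1">William Blake</person>
  <book author="p1"><title>Songs of Innocence</title></book>
</library>
"""

def authors_nested(xml_text):
    """Read (title, author) pairs from the nested encoding."""
    root = ET.fromstring(xml_text)
    return {(b.findtext("title"), b.findtext("author")) for b in root.iter("book")}

def authors_linked(xml_text):
    """Read (title, author) pairs by resolving each book's author link."""
    root = ET.fromstring(xml_text)
    people = {p.get("id"): p.text for p in root.iter("person")}
    return {(b.findtext("title"), people[b.get("author")]) for b in root.iter("book")}

print(authors_nested(NESTED) == authors_linked(LINKED))
```

Because a short transformation maps one encoding onto the other, the initial choice may matter less than it first appears, provided the underlying model of objects and relations is sound.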

Page 23: What Humanists Need to Know About Computing (and Computer Science)

• The TEI Guidelines represent one of the most extensive data modeling efforts ever
• World Wide Web Consortium developments like RDF schemas and work on the Semantic Web take us up another level
  – Powerful mechanisms for specifying relations and properties
    • “object-oriented” model of class membership, inheritance of properties, named relations, etc.
    • Instantiated objects can be an element in a document, a whole document or collection, etc.
  – Semantic Web work is defining ontologies for web data, will enable inferencing etc. over objects and their relations
    • Simple example: if a person is the author of a government document, we can deduce that he/she is a government official

Page 24: What Humanists Need to Know About Computing (and Computer Science)

Algorithm Use

• First, need a good survey of what is out there and how to get it
• Need sound idea of what the algorithm does
• Need a very good idea of what the input and output mean in terms of the algorithm
  – “garbage in, garbage out”
  – “use the appropriate model”

Page 25: What Humanists Need to Know About Computing (and Computer Science)

Example: Statistics

• On the one hand:
  – Minimally, need to know what things like principal components analysis, Pearson correlation, etc. are intended to tell you
  – Have to have some knowledge of randomness/chance vs. reliable confidence levels, etc.
• On the other:
  – Have to understand what your input is/needs to be (formal representation)
  – Have to be able to interpret output
  – Have to know when it does and does not make sense to use statistical methods, in terms of humanities goals
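As an illustration of knowing what a routine such as Pearson correlation is intended to tell you, here is a minimal sketch that computes the coefficient directly from its definition. The word counts are invented for the example and are not drawn from any real text.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Invented counts: occurrences of two function words in five text samples.
THE_COUNTS = [10, 12, 9, 15, 11]
OF_COUNTS = [5, 6, 4, 8, 5]
r = pearson(THE_COUNTS, OF_COUNTS)
print(round(r, 3))
```

The input must already be a formal representation (paired numbers, one pair per sample), and the output says only how closely the two counts rise and fall together. With five samples it says nothing reliable about chance, and deciding whether the result means anything for a humanities question remains the scholar's job.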

Page 26: What Humanists Need to Know About Computing (and Computer Science)

• Need to know what tools exist, how to get them
• Need to learn how to use multiple tools to accomplish a task
  – E.g., many programs for automatic part of speech tagging, shallow syntactic bracketing, etc., are available for free; WordNet is a free resource containing information about word relations (synonymy, hyperonymy)
  – Could use a sequence of such programs to start with a “raw” text, perform various kinds of semantic analysis

Page 27: What Humanists Need to Know About Computing (and Computer Science)

Algorithm Creation

• Need to know how to program in some useful language
• Need to be very familiar with at least one operating system (preferably UNIX/LINUX)
• BUT THIS IS ONLY A START…

Page 28: What Humanists Need to Know About Computing (and Computer Science)

• Most important: Master principles of program/software design
• More generally, this is a way of thinking about problems and their solutions
  – Abstraction over concepts
  – Modularity, generalization
  – Concepts like recursion
  – Sound data structuring practices
• Good languages to develop this:
  – LISP/Scheme
  – Perl
  – Java (maybe C++ if one is disciplined…)

Page 29: What Humanists Need to Know About Computing (and Computer Science)

Bottom Line

• Humanists are not (necessarily) trained to think formally about problems and their solutions
• For activities we consider to be “humanities computing” at any level, it is necessary to formalize concepts that we may not be used to formalizing

Page 30: What Humanists Need to Know About Computing (and Computer Science)

• Some things are easier to formalize, even when we haven’t done so before
  – E.g., basic data models for document types
• Other things are harder:
  – E.g., a formal specification of “imagery” that the computer can find in a text

Page 31: What Humanists Need to Know About Computing (and Computer Science)

My Current Argument

• Humanities Computing Curricula should have as the ultimate goal the development of intellectual skills as well as computational skills
  – Specifically, formalization of “data”, “problem”, and “solution”
  – This may be a new way of thinking for many, or a kind of thinking that needs to be more fully developed

Page 32: What Humanists Need to Know About Computing (and Computer Science)

How do we develop these skills?

• By doing, all the way up the line from using a computer for basic tasks through data creation through programming

• But we already do that…

Page 33: What Humanists Need to Know About Computing (and Computer Science)

So what’s new?

• Typically these intellectual skills are expected to develop “bottom-up”
  – Most humanists do not go far enough to reach the “top” on their own, so never see the “big picture”
  – Their computer science skills--in terms of principled problem statement, abstraction, etc.--do not develop adequately
  – You can end up with messy (i.e., unreusable, non-extensible) code, badly formulated problems and badly applied algorithms (yielding unreliable results), and data that is much harder than it should be to use for tasks other than the one for which it was designed

Page 34: What Humanists Need to Know About Computing (and Computer Science)

HC Curricula Need to Develop an Approach that is at once “top-down” (developing intellectual skills) and “bottom-up” (developing practical skills and knowledge)

The driving force of the curriculum should be exercise in formalization

Page 35: What Humanists Need to Know About Computing (and Computer Science)

A Few Suggestions for HC Curricula

• Don’t hesitate to have students put pen to paper before getting in front of the computer
  – Example: Students take a problem in their own discipline and “translate” it into formal terms, on paper
  – Then implement

Page 36: What Humanists Need to Know About Computing (and Computer Science)

• “I’m interested in Blake’s imagery”
  – What is an “image”?
  – Are there text patterns that realize images according to your definition?
  – Are there patterns of, for example, their distribution across the text that can tell you anything about it?
  – Do you need to have your text lemmatized? Tagged for part-of-speech? Would knowing synonyms, hypernyms, etc. help in any way?
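One way a student might carry out the "translate, then implement" step for the imagery questions above is sketched below. The working definition of a fire/light "image" as a fixed set of word forms is a deliberately crude assumption made for this example, not a claim about how imagery should be defined.

```python
import re

# Hypothetical working definition: a fire/light "image" is any of these word forms.
# Inflected forms are listed by hand because the text is not lemmatized.
FIRE_PATTERN = re.compile(r"\b(burn(?:ing|t|s)?|fire|bright|flame)\b", re.IGNORECASE)

def distribution(lines, pattern):
    """Return the number of pattern matches on each line, in order."""
    return [len(pattern.findall(line)) for line in lines]

# Opening stanza of Blake's "The Tyger".
STANZA = [
    "Tyger Tyger, burning bright,",
    "In the forests of the night;",
    "What immortal hand or eye,",
    "Could frame thy fearful symmetry?",
]

print(distribution(STANZA, FIRE_PATTERN))
```

Even this small exercise raises the questions on the slide: whether "night" should count as a contrasting image, whether lemmatization would remove the need to list inflected forms, and whether a synonym resource would catch words the list misses.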

Page 37: What Humanists Need to Know About Computing (and Computer Science)

• “I want to organize all my information about [some historical figure] so I can get to specific pieces of information directly, explore connections, etc.”
  – Give an overview of the relational DB model
  – Students design relational DB for their data on paper
  – Enter into a standard DB application
  – See where problems lie
  – Can you do the queries you want?

Page 38: What Humanists Need to Know About Computing (and Computer Science)

• “I want to create a corpus of [some literary figure’s] poetry”
  – Develop a data model on paper
  – Think about different “views” of the data and ramifications of encoding choices
    • E.g., I see names as representing some person associated with other features, properties
    • Linguist sees it as a proper noun
    Can I encode this so as to make it easier to see both views?
  – Instantiate model as an XML schema, test on a small sample (parse)

Page 39: What Humanists Need to Know About Computing (and Computer Science)

Give Students a Peek Under the Hood

• Provide information on how computers work internally, how data is structured, accessed, etc.
  – Show them some real code (e.g. Java, Perl, LISP…), have them “read” and follow it
    • Make sure the examples embody sound programming principles, modularity, good data structures, etc.--and point this out!
  – Show them how digital images are stored, accessed at the gut level of the machine
  – Etc. -- the more the better

Page 40: What Humanists Need to Know About Computing (and Computer Science)

Teach good programming strategies even if students will never program

• Top-down, modular design, abstraction, generalization are mental disciplines that can/should be applied everywhere
• This is the art of computer science!
• Teach these things as a methodology
  – Start small, test, add…etc.
  – E.g. data model for corpus--build XML schema in stages (encode, test) based on model

Page 41: What Humanists Need to Know About Computing (and Computer Science)

Don’t hesitate to have students perform exercises that are not directly relevant to what they want to do, if it will increase their facility with problem formulation, data organization, etc.

Intellectual skills develop with practice

Page 42: What Humanists Need to Know About Computing (and Computer Science)

Show Students How to Put Pieces Together

• Play around with UNIX and UNIX tools like grep, cut, sort, uniq, wc, and awk etc. and see how much they can accomplish
• E.g., a frequency dictionary for a text:
    Use awk to isolate each word on a line
    sort | uniq -c
  – They’ll see the magic as well as the “bugs”--the things that are treated as a “word”
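The same frequency-dictionary idea can be sketched in Python for students who do not have a UNIX shell to hand. This is a stand-in for the awk/sort/uniq pipeline, not a transcription of it: it splits on whitespace only, so it reproduces the same kind of "bugs". The sample sentence is invented.

```python
from collections import Counter

def frequency_dictionary(text):
    """Count whitespace-separated tokens, much as `sort | uniq -c` would
    after each word has been put on its own line."""
    return Counter(text.split())

SAMPLE = "The cat saw the cat. The end."
counts = frequency_dictionary(SAMPLE)

for word, n in sorted(counts.items()):
    print(n, word)
```

Here "cat" and "cat." are counted as different words, as are "The" and "the". Deciding what to do about punctuation and case is exactly the formalization of "word" that the exercise is meant to provoke.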

Page 43: What Humanists Need to Know About Computing (and Computer Science)

Show Students How to Get the Pieces

• Our most valuable resource is the WWW: we can find all sorts of freely distributed tools and resources
• Students should do this as second nature

MY OPINION
The most pervasive problem in HC is a lack of awareness/exploitation of tools and resources, and of work done by others considered to be out of the field

Page 44: What Humanists Need to Know About Computing (and Computer Science)

Another Exercise: Ontology Building

• One of the most important activities to which Humanists are well-placed to contribute is the development of domain-specific ontologies
  – This includes “meta-data”
  – Will be used to build the Semantic Web

Page 45: What Humanists Need to Know About Computing (and Computer Science)

Exercise

• Students build an ontology for some domain or sub-domain

[Diagram: an ontology fragment with “kind-of” and “part-of” links among NOUN, PROPER NOUN, NAME, PERSON-NAME, PLACE-NAME, ORGANIZATION-NAME, EVENT-NAME, FIRST-NAME, MIDDLE-NAME, LAST-NAME, and TITLE; one link is marked “?”]
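A student's first pass at such an ontology could be as simple as two lookup tables. The sketch below assumes one plausible reading of the concept names on this slide (the four kinds of name are "kind-of" NAME, and the name components are "part-of" PERSON-NAME). The actual links in the slide's diagram are not recoverable here, so treat every link as an assumption.

```python
# Assumed links; the slide's actual diagram may differ.
KIND_OF = {
    "PERSON-NAME": "NAME",
    "PLACE-NAME": "NAME",
    "ORGANIZATION-NAME": "NAME",
    "EVENT-NAME": "NAME",
}
PART_OF = {
    "FIRST-NAME": "PERSON-NAME",
    "MIDDLE-NAME": "PERSON-NAME",
    "LAST-NAME": "PERSON-NAME",
    "TITLE": "PERSON-NAME",
}

def ancestors(concept, relation):
    """Follow a single-parent relation upward and return every concept reached."""
    found = []
    while concept in relation:
        concept = relation[concept]
        found.append(concept)
    return found

def is_kind_of(concept, candidate):
    """True if `candidate` is reachable from `concept` via kind-of links only."""
    return candidate in ancestors(concept, KIND_OF)

print(is_kind_of("PERSON-NAME", "NAME"))
print(is_kind_of("FIRST-NAME", "NAME"))
```

Keeping the two relations in separate tables forces the distinction the exercise is after: a FIRST-NAME is part of a PERSON-NAME but is not itself a kind of NAME under this reading. How NAME relates to PROPER NOUN is left out on purpose, since it is an open question of the sort the exercise raises.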

Page 46: What Humanists Need to Know About Computing (and Computer Science)

RDF Schema

• Instantiate as an RDF Schema, using freely available tool
  – Don’t need to know RDF syntax
  – Focus on the objects and relations being described

Page 47: What Humanists Need to Know About Computing (and Computer Science)

Beyond Ontologies

• Inferencing over ontologies can enable discovery of implicit relations, show inconsistencies
• Exercise: Have students represent their ontologies in a standard (free) logic system (e.g., CLASSIC), query

Page 48: What Humanists Need to Know About Computing (and Computer Science)

• E.g., query a set of historical documents: “give me all the government officials between 1860-1865”
  – System does not have the information directly, can deduce that a government official is an author of a government document, pick all authors of gov’t docs between 1860-65

Welty & Ide (1999). Using the right tools: Enhancing retrieval from marked-up documents. Computers and the Humanities 33:1-2, Special Issue on the Tenth Anniversary of the Text Encoding Initiative
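The deduction in this example can be sketched in plain Python. This is a stand-in for a logic system such as CLASSIC, not a use of it, and the records are invented. The `Document` fields and the `government_officials` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    title: str
    author: str
    year: int
    is_government: bool

def government_officials(documents, start, end):
    """Apply the rule 'the author of a government document is a government
    official' to documents dated within [start, end]."""
    return sorted({d.author for d in documents
                   if d.is_government and start <= d.year <= end})

# Invented records for illustration only.
DOCS = [
    Document("Report A", "Person One", 1861, True),
    Document("Novel B", "Person Two", 1862, False),
    Document("Report C", "Person Three", 1870, True),
]

print(government_officials(DOCS, 1860, 1865))
```

No record says that anyone is a government official. The answer comes from the rule plus the encoded facts, which is why the quality of the underlying data model determines what can be inferred.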

Page 49: What Humanists Need to Know About Computing (and Computer Science)

And Not Least Important…

• Students need to recognize the limitations of formalization
  – This should always be at the back of the instructor’s agenda
• Students need to explore expanding the limitations of formalization
  – Think hard about new ways to formally represent or analyze sometimes very “non-formal” things

Page 50: What Humanists Need to Know About Computing (and Computer Science)

Final Words

TEACH STUDENTS HOW TO LEARN
Provide them with intellectual skills

Humanists need to learn computing skills based on Computer Science

Page 51: What Humanists Need to Know About Computing (and Computer Science)

[NP [ADJ More Final] [HEAD Words]] (from others as well as myself)

• “The computer is a modeling machine”
  – I agree
• “Need to understand systems and models in the abstract in order to understand them; need abstract thinking skills”
  – I agree
• “Learning fundamentals of mathematical sub-fields is all we need”
  – I disagree
• “There is a big divide among humanists and computer scientists entering courses”
  – I agree

Page 52: What Humanists Need to Know About Computing (and Computer Science)

Thank You!

Special thanks to Ray Siemens and his fantastic staff for a great conference!