HCC class lecture 8 John Canny 2/23/09

HCC class lecture 8 - University of California, Berkeley


Page 1: HCC class lecture 8 - University of California, Berkeley

HCC class lecture 8

John Canny, 2/23/09

Page 2: HCC class lecture 8 - University of California, Berkeley

Vygotsky’s Genetic Planes

Phylogenetic
Social-historical
Ontogenetic
Microgenetic

What did he mean by genetic?

Page 3: HCC class lecture 8 - University of California, Berkeley

Internalization

Social functions

Internal (mental) functions

Social Plane

Internal (mental) Plane

Internalization
Scaffolding
Showing, explaining

Listening and reading

Page 4: HCC class lecture 8 - University of California, Berkeley

Externalization

Social/historical artifacts

Internal (mental) functions

Social/historical Plane

Internal (mental) Plane

Externalization: talking, writing

Page 5: HCC class lecture 8 - University of California, Berkeley

Internalization/Externalization

Page 6: HCC class lecture 8 - University of California, Berkeley

Power Laws

Pick a corpus such as:
English (collection of many samples)
Works of Shakespeare
James Joyce’s “Ulysses”

and count the occurrences of each word. Sort in decreasing order, and let r be the rank in this order. Then

$f(r) \approx c\, r^{-\alpha}$

where $f(r)$ is the frequency of the word of rank r.
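The counting and ranking procedure above can be sketched in a few lines of Python. This is an illustration only: it assumes the decaying form f(r) ≈ c r^(-α), estimates α with an ordinary least-squares fit on the log-log plot (one common choice, not necessarily the method used in the lecture), and uses a synthetic corpus. The names `rank_frequency` and `fit_alpha` are hypothetical.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return word counts sorted in decreasing order; index 0 is rank 1."""
    return sorted(Counter(tokens).values(), reverse=True)

def fit_alpha(freqs):
    """Least-squares slope of log f(r) against log r, negated to give alpha."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

# Synthetic corpus (an assumption for illustration, not real text):
# the word of rank r occurs round(1000 / r) times.
tokens = [f"w{r}" for r in range(1, 51) for _ in range(round(1000 / r))]
freqs = rank_frequency(tokens)
alpha = fit_alpha(freqs)
print(f"estimated alpha = {alpha:.3f}")
```

On a real corpus, `tokens` would come from splitting the text into words; the fit is usually restricted to a range of ranks, since the very top and the long tail often deviate from a straight line.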

Page 7: HCC class lecture 8 - University of California, Berkeley

Power Law – alternate form

Instead of frequency vs. rank, we can plot frequency vs. number of sets with that frequency.

$g(i) \approx c'\, i^{-\beta}$

The value β in this form is related to α via β = 1 + 1/α.

This was Zipf’s original form, and the one analyzed by Newell.
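The relation between the two exponents can be checked with a short derivation. This is a sketch that assumes the rank-frequency law has the decaying form f(r) ≈ c r^(-α), treats rank and frequency as continuous quantities, and reads g(i) as the number of distinct items that occur i times (an interpretation inferred from the slide, not stated on it).

```latex
\begin{align*}
f(r) &\approx c\, r^{-\alpha}
  && \text{rank-frequency form} \\
r(i) &\approx \left(\frac{c}{i}\right)^{1/\alpha}
  && \text{number of items with frequency at least } i \\
g(i) &\approx -\frac{dr}{di}
   = \frac{c^{1/\alpha}}{\alpha}\, i^{-(1 + 1/\alpha)}
  && \text{number of items with frequency exactly } i \\
\beta &= 1 + \frac{1}{\alpha}
  && \text{so } \alpha = 1 \text{ gives } \beta = 2
\end{align*}
```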

Page 8: HCC class lecture 8 - University of California, Berkeley

Examples of Power Laws

Note: size vs frequency of that size – Zipf’s original form

Page 9: HCC class lecture 8 - University of California, Berkeley

Examples of Power Laws

These are in rank-frequency form.

Page 10: HCC class lecture 8 - University of California, Berkeley

Examples of Power Laws

Page 11: HCC class lecture 8 - University of California, Berkeley

Examples of Power Laws

Page 12: HCC class lecture 8 - University of California, Berkeley

Examples of Power Laws

Page 13: HCC class lecture 8 - University of California, Berkeley

Examples of Power Laws

Page 14: HCC class lecture 8 - University of California, Berkeley

Examples of Power Laws

Also:
Number of users’ Facebook friends
The popularity of Facebook apps
Number of pages in web sites
Number of links into a web site
Number of links out of a web site

Page 15: HCC class lecture 8 - University of California, Berkeley

Preferential Attachment

Page 16: HCC class lecture 8 - University of California, Berkeley

Yule’s law (1925)

Number of species in each genus

Genera

Pure birth process: only new species are added

Page 17: HCC class lecture 8 - University of California, Berkeley

Literary Theory: Structuralism

Looks for “structures” in the domain of study, e.g. literature or anthropology, and their relation to other
Structure includes local (sentence) structure, as on the next slide.
Also includes deeper structures such as role and plot. E.g. “West Side Story” has the same plot structure as “Romeo and Juliet”.
Structuralists often look for “universal” structures, e.g. Freud’s Oedipal complex.

Page 18: HCC class lecture 8 - University of California, Berkeley

Literary Theory: Structuralism

Page 19: HCC class lecture 8 - University of California, Berkeley

Bakhtin: “The Dialogic Imagination”

Multiple voices are evident in a text: heteroglossia or multivocality or polyphony.

Page 20: HCC class lecture 8 - University of California, Berkeley

Kristeva: Intertextuality

Kristeva elaborated Bakhtin’s ideas into the theory of intertextuality: texts are borrowed and adapted from other texts.

Allusion
Characters
Plot
Form
Scene

Page 21: HCC class lecture 8 - University of California, Berkeley

Barthes: “S/Z”

“A text is... a multidimensional space in which a variety of writings, none of them original, blend and clash. The text is a tissue of quotations... The writer can only imitate a gesture that is always anterior, never original. His only power is to mix writings, to counter the ones with the others, in such a way as never to rest on any one of them”

Lexia

Page 22: HCC class lecture 8 - University of California, Berkeley

Simon’s model of texts

Text is built by sampling earlier texts:

Association: sampling earlier passages in the same corpus.
Imitation: “sampling segments of word sequences from other works he has written, from works of other authors, and, of course, from sequences he has heard.”

Page 23: HCC class lecture 8 - University of California, Berkeley

Simon’s model of texts

Stratified sampling:
Sampling and re-assembly of small segments of text.

The choice of which segments to assemble does not have to be random.

Page 24: HCC class lecture 8 - University of California, Berkeley

Simon’s model of texts

Simon’s model explains the familiar Zipf curve.

Limitations:
Pure “birth” process*
Should work for different notions of “strata”

* But birth-death processes in equilibrium also produce Zipf curves
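Simon's sampling process can be simulated to see the heavy-tailed frequency curve emerge. The sketch below is a deliberately simplified version: it samples single tokens rather than segments, treats every choice as random, and is a pure birth process (words are only ever added). The function name `simon_text` and the parameter `p_new` (the probability of introducing a brand-new word) are assumptions made for illustration.

```python
import random
from collections import Counter

def simon_text(n_tokens, p_new=0.1, seed=0):
    """Generate a token sequence by repeatedly sampling the text so far."""
    rng = random.Random(seed)
    text = [0]      # start with a single token of word id 0
    next_id = 1     # id to assign to the next new word
    while len(text) < n_tokens:
        if rng.random() < p_new:
            # Birth: introduce a word that has never appeared before.
            text.append(next_id)
            next_id += 1
        else:
            # Association: copy a uniformly chosen earlier token, so each
            # word is picked with probability proportional to its count.
            text.append(rng.choice(text))
    return text

text = simon_text(20000)
freqs = sorted(Counter(text).values(), reverse=True)
print("distinct words:", len(freqs))
print("top five frequencies:", freqs[:5])
```

Because earlier tokens are copied in proportion to how often they already occur, frequent words become more frequent, which is the same rich-get-richer dynamic as preferential attachment.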

Page 25: HCC class lecture 8 - University of California, Berkeley

Genetic Laws

We have given an explanation of Power Law behavior in texts via internalization/externalization:

Page 26: HCC class lecture 8 - University of California, Berkeley

Genetic Laws

Other similar phenomena may be explained in this way:

Sales of books, or many other items
Citations of scientific articles
Number of pages in web sites
Number of links into a web site
Number of links out of a web site
Number of users’ Facebook friends
The popularity of Facebook apps

Page 27: HCC class lecture 8 - University of California, Berkeley

Language as Action

What we have seen so far:

Many choice phenomena show the fingerprint of internalization/externalization and genetic origin. This includes language, both collective and individual.

Is there a more general link between language and action, as Vygotsky and others have suggested?

Page 28: HCC class lecture 8 - University of California, Berkeley

Georgia Tech Home

26 occupancy sensors
Data recorded over several weeks

Page 29: HCC class lecture 8 - University of California, Berkeley

N-grams

N-grams are sequences of n tokens, in this case n sensors.
The following is a 6-gram sequence of locations:
3-11-27-12-19-20
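Extracting n-grams from a sensor trace is a sliding-window operation, sketched below. The first six readings of `trace` are the 6-gram from the slide; the remaining readings are invented for illustration, and the helper name `ngrams` is hypothetical.

```python
from collections import Counter

def ngrams(seq, n):
    """Return every run of n consecutive tokens in seq, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

# First six sensor ids come from the slide; the last four are made up.
trace = [3, 11, 27, 12, 19, 20, 3, 11, 27, 12]

six_grams = ngrams(trace, 6)
counts = Counter(ngrams(trace, 3))   # how often each 3-gram occurs

print(six_grams[0])
print(counts.most_common(2))
```

Counting the tuples with `Counter` gives the occurrence counts needed for a rank-frequency or frequency-of-frequencies plot.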

Page 30: HCC class lecture 8 - University of California, Berkeley

N-gram statistics

Not only words in English, but also n-grams of words in English follow power laws*.

In the smart home data, n-grams are a more reasonable unit of analysis than individual sensor sites.

We might expect to see power law behavior if movement about the house is governed by “familiar habit” rather than optimal movement or planning.

* For small corpora, the n-gram statistics for n>1 are often closer to an exact power law than those for 1-grams (words).

Page 31: HCC class lecture 8 - University of California, Berkeley

N-gram statistics

Here is the data from the smart home experiment in Zipf’s original form. All plots show a β close to 2, which corresponds to α close to 1.

Slope β increases slightly as n increases (so α decreases).
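Zipf's original form can be tabulated directly from a list of observed n-grams, and a measured β can be converted back to α with the relation β = 1/α + 1 given earlier in the lecture. The sketch below uses a toy item list, not the smart home data, and the helper names `count_of_counts` and `alpha_from_beta` are hypothetical.

```python
from collections import Counter

def count_of_counts(items):
    """Map each frequency i to the number of distinct items seen i times."""
    occurrences = Counter(items)            # item -> number of occurrences
    return Counter(occurrences.values())    # frequency -> number of items

def alpha_from_beta(beta):
    """Invert beta = 1/alpha + 1 to recover the rank-frequency exponent."""
    return 1.0 / (beta - 1.0)

# Toy data for illustration only.
g = count_of_counts(["a", "a", "a", "b", "b", "c", "d"])
print(dict(g))

# A larger beta corresponds to a smaller alpha.
for beta in (2.0, 2.2):
    print(beta, alpha_from_beta(beta))
```

The conversion makes the remark on the slide concrete: as β rises above 2, the corresponding α falls below 1.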

Page 32: HCC class lecture 8 - University of California, Berkeley

Conclusions

There appears to be a genetic mechanism at play, even in simple physical movement about the house.

At least from one perspective (n-gram analysis), language and one type of action are remarkably similar.

Many other human phenomena show power law behavior, either through internalization/externalization or purely internal mechanisms.

Page 33: HCC class lecture 8 - University of California, Berkeley

Discussion questions

1. Suggest another measure of human behavior that might show genetic dynamics, and research whether it shows power law behavior (do a web search). Be prepared to explain the genetic mechanism.

2. Discuss the freedom of the author given the statistical similarities of new texts to old ones.