2002.09.03 - SLIDE 1 IS 202 - Fall 2002 Lecture 03: Categorization Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2002 SIMS 202: Information Organization and Retrieval Credits to Marti Hearst and Warren Sack for some of the slides in this lecture

2002.09.03 - SLIDE 1 IS 202 - Fall 2002

Lecture 03: Categorization

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2002

SIMS 202: Information Organization and Retrieval

Credits to Marti Hearst and Warren Sack for some of the slides in this lecture

2002.09.03 - SLIDE 2 IS 202 - Fall 2002

Today

• What Is Information?

• Cognition, Culture, and Categories

• Photo Project Assignment #2

– Photo Use Scenario

2002.09.03 - SLIDE 3 IS 202 - Fall 2002

Assignment 1 Discussion

• Sensory

– “Information for me is anything I can take in and process through any of my senses that somehow influences me in some conscious or unconscious manner”

• Context-dependent

– “Information is contextual, multi-faceted, and prone to interpretation […] the same information might mean completely different things to different people”

• Actionable

– Initiates, responds to, and guides action

• Process

– Not reducible to a set of objects

• Powerful and ubiquitous

2002.09.03 - SLIDE 4 IS 202 - Fall 2002

Human Communication Theory?

[Diagram: communication model with a Source, Encoding, a Message sent over a Channel subject to Noise, Decoding, and a Destination]

2002.09.03 - SLIDE 5 IS 202 - Fall 2002

The Conduit Metaphor

• Language functions like a conduit, transferring thoughts bodily from one person to another

• In writing and speaking, people insert their thoughts or feelings in the words

• Words accomplish the transfer by containing the thoughts or feelings and conveying them to others

• In listening or reading, people extract the thoughts and feelings once again from the words

2002.09.03 - SLIDE 6 IS 202 - Fall 2002

Toolmakers’ Paradigm

2002.09.03 - SLIDE 7 IS 202 - Fall 2002

Categorization

09/03/02 Cognition, Culture, and Categories

09/05/02 Artificial Intelligence, Ontologies, and Common Sense

09/10/02 Metadata Introduction

09/12/02 Controlled Vocabularies Introduction

2002.09.03 - SLIDE 8 IS 202 - Fall 2002

Foucault on Borges

• This passage quotes “a certain Chinese encyclopedia” in which it is written that ‘animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies.’

– Michel Foucault, The Order of Things, 1970

2002.09.03 - SLIDE 9 IS 202 - Fall 2002

Yahoo! Categorization

2002.09.03 - SLIDE 10 IS 202 - Fall 2002

Yahoo! Categorization Detail

2002.09.03 - SLIDE 11 IS 202 - Fall 2002

Why Study Categorization?

• Categorization is central to how we organize information and the world

• Categorization is a core cognitive process

• In recent years, centuries-old views of categorization have been revised

• Understanding how people categorize can help us design information systems that do a better job at organization and retrieval

2002.09.03 - SLIDE 12 IS 202 - Fall 2002

Why Read Lakoff?

• Very influential figure in recent thinking about human categorization, metaphor, and cognition

• Provides summary of historical work and develops syncretic model of cognition and categorization

• Clear explanations using examples

• Professor at UC Berkeley (Department of Linguistics)

2002.09.03 - SLIDE 13 IS 202 - Fall 2002

George Lakoff

• Lakoff’s research covers many areas of Conceptual Analysis within Cognitive Linguistics

– The nature of human conceptual systems, especially metaphor systems for concepts such as time, events, causation, emotions, morality, the self, politics, etc.

– The development of Cognitive Social Science, which applies ideas of Cognitive Semantics to the Social Sciences

– The implications of Cognitive Science for Philosophy, in collaboration with Mark Johnson, Chair of Philosophy at the University of Oregon

– Neural foundations of conceptual systems and language, in collaboration with Jerome Feldman, of the International Computer Science Institute, seeking to develop biologically-motivated structured connectionist systems to model both the learning of conceptual systems and their neural representations

– The cognitive structure, especially the metaphorical structure, of mathematics, in collaboration with Rafael Núñez

2002.09.03 - SLIDE 14 IS 202 - Fall 2002

George Lakoff

• Selected publications

– Metaphors We Live By (with Mark Johnson). University of Chicago Press. 1980.

– Women, Fire, and Dangerous Things. University of Chicago Press. 1987.

– More Than Cool Reason (with Mark Turner). University of Chicago Press. 1989.

– Moral Politics. University of Chicago Press. 1996.

– Philosophy in the Flesh. Basic Books. 1999.

– Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being (with Rafael Núñez). Basic Books. 2000.

2002.09.03 - SLIDE 15 IS 202 - Fall 2002

Objectivist Views

• Thought is mechanical manipulation of symbols

• The mind is an abstract machine

• Symbols get their meaning from correspondences to the external world

• Symbols are internal representations

• Abstract symbols stand in correspondence with the external world independent of the interpreting organism

• The human mind is a mirror of nature

• Human bodies play no role in characterizing concepts

• Thought is abstract and disembodied

• Exclusively symbolic machines are capable of thought

• Thought can be broken down into simple “building blocks”

• Thought is defined by mathematical logic

2002.09.03 - SLIDE 16 IS 202 - Fall 2002

Lakoff’s Views

• Thought is embodied

• Thought is imaginative

• Thought has gestalt properties

• Thought utilizes basic-level categorization and basic-level primacy

• Thought uses prototypes and family resemblances as organizing structures

• Conceptual structure can be described using cognitive models that have the above properties

• The theory of cognitive models incorporates what was right about the traditional view of categorization, meaning, and reason, while accounting for the empirical data on categorization and fitting the new view overall

2002.09.03 - SLIDE 17 IS 202 - Fall 2002

Categorization

• Classical categorization

– Necessary and sufficient conditions for membership

– Generic-to-specific monohierarchical structure

• Modern categorization

– Characteristic features (family resemblances)

– Centrality/typicality (prototypes)

– Basic-level categories

2002.09.03 - SLIDE 18 IS 202 - Fall 2002

Defining Category Membership

• Necessary and sufficient conditions

– Every condition must be met

– No other conditions can be required

• Example: A prime number

– An integer divisible only by itself and 1.

Source: Webster's Revised Unabridged Dictionary, © 1996, 1998 MICRA, Inc.

• Example: mother

– A woman who has given birth to a child.

2002.09.03 - SLIDE 19 IS 202 - Fall 2002

Defining Category Membership

• Necessary and sufficient conditions for Mother?

– mother(A,B) -> female(A), gave-birth-to(A,B), same-species(A,B)

• What about

– Birth mother vs. adoptive mother

– Rearing role vs. biological role

– Surrogate mother

– Cloning
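Sketch (not part of the original slides): the mother(A,B) rule above can be written as a predicate to see how a classical definition behaves on the problem cases. The `Individual` class, the names, and the `gave_birth_to` fact set are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    name: str
    female: bool
    species: str


def is_mother_classical(a, b, gave_birth_to):
    """Classical test: all three conditions from the slide must hold."""
    return (
        a.female
        and (a.name, b.name) in gave_birth_to
        and a.species == b.species
    )


# Hypothetical individuals and facts, invented for illustration.
ann = Individual("Ann", female=True, species="human")
beth = Individual("Beth", female=True, species="human")
chris = Individual("Chris", female=False, species="human")
gave_birth_to = {("Ann", "Chris")}  # Ann is the birth mother; Beth adopted Chris

print(is_mother_classical(ann, chris, gave_birth_to))   # True
print(is_mother_classical(beth, chris, gave_birth_to))  # False: the adoptive mother is excluded
```

The rule gives a crisp yes/no answer, but it has no way to express the adoptive, rearing, or surrogate senses of "mother" listed on the slide.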

2002.09.03 - SLIDE 20 IS 202 - Fall 2002

Can Category Membership Be Defined?

• What are the necessary and sufficient conditions for something to be a game?

• Famous example by Wittgenstein

– Classic categories assume clear boundaries defined by common properties (necessary and sufficient conditions)

• How do we categorize games?

2002.09.03 - SLIDE 21 IS 202 - Fall 2002

Definition of Game

• Counterexample: “Game”

– No common properties shared by all games

    • Card games, ball games, Olympic games, children’s games

– Not all games involve competition (ring-around-the-rosy)

– Not all games involve skill (dice games)

– Not all games involve luck (chess)

– No fixed boundary to category

    • Can be extended to new games (e.g., video games)

• Alternative notion of category membership

– Concepts related by Family Resemblances

2002.09.03 - SLIDE 22 IS 202 - Fall 2002

Properties of Categorization

• Family Resemblance

– Members of a category may be related to one another without all members having any property in common

    • Instead, they may share a large subset of traits

    • Some attributes are more likely given that others have been seen

– Example: feathers, wings, twittering, ...

    • Likely to be a bird, but not all features apply to “emu”

    • Unlikely to see an association with “barks”
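Sketch (not part of the original slides): family resemblance can be illustrated with sets of features. The feature sets below are hypothetical, loosely based on the game examples from the earlier slide. No single feature is shared by every member, yet each member overlaps with at least one other.

```python
# Hypothetical feature sets, invented for illustration.
games = {
    "chess": {"competition", "skill", "board"},
    "dice": {"competition", "luck"},
    "solitaire": {"skill", "luck", "cards"},
    "hopscotch": {"children", "skill"},
    "ring-around-the-rosy": {"amusement", "children"},
}


def common_features(members):
    """Features that every member of the category has."""
    return set.intersection(*members.values())


def overlaps_some_other(name, members):
    """True if `name` shares at least one feature with another member."""
    return any(
        members[name] & features
        for other, features in members.items()
        if other != name
    )


print(common_features(games))  # empty set: no defining property
print(all(overlaps_some_other(name, games) for name in games))  # True
```

A classical definition would need `common_features` to be non-empty. Here the category holds together only through overlapping pairs.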

2002.09.03 - SLIDE 23 IS 202 - Fall 2002

Properties of Categorization

• Example: Prime Numbers

– Definition: An integer divisible only by itself and 1

– Examples: 1, 2, 3, 5, 7, 11, 13, 17, …

• A very clear-cut category. Or is it?

– Can one number be “more prime” than another?

• Centrality

– Some members of a category may be “better examples” than others, i.e., “prototypical” members

    • Example: robins vs. chickens vs. emus
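Sketch (not part of the original slides): even the prime-number category depends on how its conditions are worded. Read literally, the dictionary definition quoted above admits 1, which matches the slide's list of examples. The standard mathematical convention additionally requires the number to be greater than 1. The function names here are hypothetical.

```python
def divisible_only_by_itself_and_one(n):
    """Literal reading of the dictionary definition quoted on the slide."""
    if n < 1:
        return False
    # No divisor strictly between 1 and n.
    return all(n % d != 0 for d in range(2, n))


def is_prime_modern(n):
    """Standard convention: same test, but 1 is excluded."""
    return n > 1 and divisible_only_by_itself_and_one(n)


literal = [n for n in range(1, 18) if divisible_only_by_itself_and_one(n)]
modern = [n for n in range(1, 18) if is_prime_modern(n)]

print(literal)  # [1, 2, 3, 5, 7, 11, 13, 17]
print(modern)   # [2, 3, 5, 7, 11, 13, 17]
```

Under either definition membership is all-or-nothing, so neither test can say that one number is "more prime" than another.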

2002.09.03 - SLIDE 24 IS 202 - Fall 2002

Properties of Categorization

• Characteristic features

– Perceived degree of category membership has to do with which features define the category

– Members usually do not have ALL the necessary features, but have some subset

– Those members that have more of the central features are seen as more central members

– People have conceptions of typical members

2002.09.03 - SLIDE 25 IS 202 - Fall 2002

Testing for Centrality/Typicality

• Ask a series of questions and compare how long it takes people to answer

– True or false:

    • An apple is a fruit

    • A plum is a fruit

    • A coconut is a fruit

    • An olive is a fruit

    • A tomato is a fruit

• Rosch and Mervis

– The more features a fruit shares with the other fruits, the more typical a member of the class it is
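Sketch (not part of the original slides): one simple way to turn "shares more features with the other members" into a number is to count, for each feature a member has, how many other members also have it. The feature lists below are hypothetical and are not Rosch and Mervis's data. The scoring function is an assumption made for illustration.

```python
# Hypothetical feature lists, invented for illustration.
fruits = {
    "apple": {"sweet", "round", "grows on trees", "eaten raw"},
    "plum": {"sweet", "round", "grows on trees", "eaten raw", "has pit"},
    "coconut": {"round", "grows on trees", "hard shell"},
    "olive": {"grows on trees", "has pit", "savory"},
    "tomato": {"round", "savory", "eaten raw"},
}


def resemblance_score(name, members):
    """Sum, over this member's features, of how many other members share each one."""
    return sum(
        sum(1 for other, feats in members.items() if other != name and feature in feats)
        for feature in members[name]
    )


scores = {name: resemblance_score(name, fruits) for name in fruits}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, scores[name])
```

With these made-up features, apple and plum score highest and olive lowest, which mirrors the intuition that some fruits are "better examples" than others.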

2002.09.03 - SLIDE 26 IS 202 - Fall 2002

Characteristic Features

• Is a cat on a mat a cat?

• Is a dead cat a cat?

• Is a photo of a cat a cat?

• Is a cat with three legs a cat?

• Is a cat that barks a cat?

• Is a cat with a dog’s brain a cat?

• Is a cat with every cell replaced by a dog’s cells a cat?

2002.09.03 - SLIDE 27 IS 202 - Fall 2002

Properties of Categorization

• Basic-level categories

– Categories are organized into a hierarchy from the most general to the most specific, but the level that is most cognitively basic is “in the middle” of the hierarchy

• Basic-level primacy

– Basic-level categories are functionally primary with respect to factors including ease of cognitive processing (learning, reasoning, recognition, etc.)

2002.09.03 - SLIDE 28 IS 202 - Fall 2002

Basic-Level Categories

• Brown 1958, 1965; Berlin et al. 1972, 1973

• Folk biology:

– Unique beginner: plant, animal

– Life form: tree, bush, flower

– Generic name: pine, oak, maple, elm

– Specific name: Ponderosa pine, white pine

– Varietal name: Western Ponderosa pine

• No overlap between levels

• Level 3 is basic

– Corresponds to genus

– Folk biological categories correspond accurately to scientific biological categories only at the basic level
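Sketch (not part of the original slides): the five folk-biology levels above can be stored as an ordered list, with a classification path holding one name per level. The variable and function names are hypothetical. The example path uses only names that appear on the slide.

```python
LEVELS = [
    "unique beginner",
    "life form",
    "generic name",
    "specific name",
    "varietal name",
]
BASIC_LEVEL_INDEX = 2  # "Level 3 is basic" (zero-based index 2)

# One path from most general to most specific, taken from the slide's examples.
path = ["plant", "tree", "pine", "Ponderosa pine", "Western Ponderosa pine"]


def basic_level_name(path):
    """Return the name at the basic level, or None if the path is too short."""
    return path[BASIC_LEVEL_INDEX] if len(path) > BASIC_LEVEL_INDEX else None


def level_of(name, path):
    """Return the level label for a name on the path."""
    return LEVELS[path.index(name)]


print(basic_level_name(path))            # pine
print(level_of("Ponderosa pine", path))  # specific name
```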

2002.09.03 - SLIDE 29 IS 202 - Fall 2002

Basic-Level Categories

• Language

– People name things more readily at basic level

– Name learned earliest in childhood

– Languages have simpler names at basic level

– Sounds like the “real name”

– Name used more frequently

    • Strange to call a dime a coin, a metal object

– Names used in neutral context

    • There’s a dog on the porch

    • There’s a terrier on the porch

2002.09.03 - SLIDE 30 IS 202 - Fall 2002

Basic-Level Categories

• Concepts

– Things perceived more holistically at the basic level (rather than by parts) as a gestalt (overall shape)

– People interact with basic and more specific levels similarly

– Things are remembered more readily at basic level

2002.09.03 - SLIDE 31 IS 202 - Fall 2002

Psychologically Primary Levels

SUPERORDINATE animal furniture

BASIC LEVEL dog chair

SUBORDINATE terrier rocker

• Children take longer to learn superordinate

• Superordinate not associated with mental images or motor actions

2002.09.03 - SLIDE 32 IS 202 - Fall 2002

Basic-Level Categorization

• Perception

– Overall perceived shape

– Single mental image

– Fast identification

• Function

– General motor program

• Communication

– Shortest, most commonly used and contextually neutral words

– First learned by children

• Knowledge Organization

– Most attributes of category members stored at this level

2002.09.03 - SLIDE 33 IS 202 - Fall 2002

Middle-Out Categorization

• Top down

– Object

    • Writing implement

        – Pen

• Bottom up

– Sanford Uniball Black Pen

    • Ink Pen

        – Pen

• Middle out

– Writing implement

    • Pen

        – Ink Pen
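Sketch (not part of the original slides): one possible reading of "middle out" is to start at the basic-level term and work outward, alternating between the next more general and the next more specific term. The `middle_out` function and this traversal order are assumptions made for illustration. The chain uses only the terms from the slide.

```python
def middle_out(chain, start):
    """Order terms starting at `start`, alternating more general / more specific."""
    i = chain.index(start)
    order = [chain[i]]
    up, down = i - 1, i + 1
    while up >= 0 or down < len(chain):
        if up >= 0:
            order.append(chain[up])  # next more general term
            up -= 1
        if down < len(chain):
            order.append(chain[down])  # next more specific term
            down += 1
    return order


# Most general to most specific, using the slide's terms.
chain = [
    "Object",
    "Writing implement",
    "Pen",
    "Ink Pen",
    "Sanford Uniball Black Pen",
]

print(middle_out(chain, "Pen"))
# ['Pen', 'Writing implement', 'Ink Pen', 'Object', 'Sanford Uniball Black Pen']
```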

2002.09.03 - SLIDE 34 IS 202 - Fall 2002

Summary

• Processes of categorization underlie many of the issues having to do with information organization

• Categorization is messier than our computer systems would like

• Human categories have graded membership, consisting of family resemblances

– Family resemblance is expressed in part by which subset of features is shared

– It is also determined by underlying understandings of the world that do not get represented in most systems

• Basic-level categories, as well as subordinate and superordinate categories, seem to be cognitively real and therefore important in the design of information organization and retrieval systems

2002.09.03 - SLIDE 35 IS 202 - Fall 2002

Next Time

• Artificial Intelligence, Ontologies, and Common Sense

2002.09.03 - SLIDE 36 IS 202 - Fall 2002

Homework (!)

• Read the handouts

– “The Vocabulary Problem in Human-System Communication” (G. W. Furnas, T. K. Landauer, L. M. Gomez, S. T. Dumais)

– “Commonsense-Based Interfaces” (M. Minsky)

– “CYC: A Large-Scale Investment in Knowledge Infrastructure” (D. B. Lenat)

• Assignment 2: Photo Use Scenario

– Due by Thursday, September 12

2002.09.03 - SLIDE 37 IS 202 - Fall 2002

Photo Project Goals

• Develop an ongoing resource for SIMS (an annotated photo database) that can be used for internal research and teaching, as well as for external promotional and informational purposes

• Experience the actual process of information organization and retrieval (especially as regards metadata creation and use)

• Work in small, focused teams performing a variety of tasks in image acquisition, cataloging, and application design

2002.09.03 - SLIDE 38 IS 202 - Fall 2002

Assignment 2: Photo Use Scenario

1) Brainstorm an application idea

2) Come up with personas and scenarios

3) Write a description of your application idea involving one persona and one scenario

4) Draw a storyboard with explanatory text depicting the user experience of your application idea

5) Take photos for your application idea

6) Upload photos

7) Create your group website

8) Put Assignment 2 (persona description, scenario description, and annotated storyboard) plus a work distribution table on your group website