Project CLiMB: Computational Linguistics for Metadata Building. Using Computational Linguistic Techniques to Harvest Image Descriptors. Columbia University. Funded by the Andrew W. Mellon Foundation, 2002-2004.


Page 1: Project CLiMB Computational Linguistics for Metadata Building

Project CLiMB

Computational Linguistics for Metadata Building

Using Computational Linguistic Techniques to Harvest Image Descriptors

Columbia University

Funded by the Andrew W. Mellon Foundation

2002-2004

Page 2

Photograph courtesy of the Council of Industrial Design's Design Archive.

Page 3

Page 4

CLiMB: Interdisciplinary Research at Columbia University

• Libraries

• Computer Science Department

• Center for Research on Information Access (CRIA)

Funded by the Andrew W. Mellon Foundation, 2002-2004

Page 5

CLiMB Project Members

Judith Klavans, PI

Stephen Davis

Angela Giral

Patricia Renfro

Bob Wolven

Roberta Blitz

Rebecca Passonneau

Veronika Horvath

David Elson

Page 6

Problems in Image Access

Traditional approach: labor intensive, expensive

Page 7

Project CLiMB

Help image catalogers provide subject access?

Harvest image descriptors from existing literature?

Page 8

Can we harvest image descriptors?

Page 9

CLiMB will identify and extract

• proper nouns

• terms and phrases

from text related to an image:

By September 14, 1908, the basis of the Greenes' final design had been worked out. It featured a radically informal, V-shaped plan (that maintained the original angled porch) and interior volumes of various heights, all under a constantly changing roofline that echoed the rise and fall of the mountains behind it. The chimneys and foundation would be constructed of the sandstone boulders that comprised the local geology, and the exterior of the house would be sheathed in stained split-redwood shakes.

— Edward R. Bosley. Greene & Greene. London: Phaidon, 2000. p.127.

CLiMB Technical Contribution
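The idea of pulling proper nouns out of text related to an image can be illustrated with a minimal sketch. It assumes a plain capitalization heuristic, not CLiMB's actual linguistic tools, which these slides do not specify. The function name `candidate_proper_nouns` and the pattern `CAP_RUN` are hypothetical, and the sample sentence is taken from the museum catalog text quoted later in the deck.

```python
import re

# Hypothetical heuristic (an assumption, not CLiMB's method): a run of
# capitalized words is treated as a candidate proper noun.
CAP_RUN = re.compile(r"\b[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*")


def candidate_proper_nouns(text):
    """Return capitalized word runs that do not start a sentence."""
    found = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for match in CAP_RUN.finditer(sentence):
            if match.start() == 0:
                continue  # sentence-initial capitals are unreliable evidence
            if match.group() not in found:
                found.append(match.group())
    return found


sample = (
    "Driving through the New Mexican highlands near her home, "
    "Georgia O'Keeffe would often pass through the village of Cebolla "
    "with its rude adobe Church of Santo Niño."
)
print(candidate_proper_nouns(sample))
```

A heuristic like this over-generates and misses lowercase descriptive phrases such as "rusted tin roof", which is one reason a human review step remains useful.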

Page 10

CLiMB Overall Goals

The essence of CLiMB:

• Use scholars themselves as “catalogers” by employing scholarly publications

• Enhance existing descriptive metadata

The CLiMB project:

• Research: Development of richer retrieval through increased numbers of descriptors

• Practice: Development of CLiMB ToolKit

Page 11

• Image collection

• Associated text

• Target object identification (TOI)

• CLiMB ToolKit

Squeezing Metadata out of Scholarly Texts

Page 12

Greene & Greene Architectural Records and Papers Collection

Drawings and Archives

Avery Architectural and Fine Arts Library

Columbia University Libraries

Page 13

NYDA.1960.001.00023

All Saints Episcopal Church (Pasadena, Calif.). Alterations

1902-1903

Page 14

Greene & Greene Catalog Record

Author: Greene & Greene.
Title: [Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.). Alterations.]
Residence of Mrs. Dudley P. Allen, 1188 Hillcrest Ave., Pasadena, Cal. [graphic] : Alteration / Greene & Greene, Architects.
Published: [1917]
Physical Details: 4 sheets : various media ; 87.8 x 57.3 cm. (34 5/8 x 22 5/8 in.)
Location: Columbia University, Avery Architectural Drawings
Other Authors: Greene, Charles Sumner, 1868-1957. Greene, Henry Mather, 1870-1954.
Subjects: Houses
Alterations
Architecture--Designs and plans--United States.
Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)
Component Item: [1] Item no. NYDA.1960.001.03224. [AVERYimage]. Electric lighting -- floor plan, part plan of basement : Sheet no.
Component Item: [2] Item no. NYDA.1960.001.00073. [AVERYimage]. [Electric lighting] floor plan, part plan of basement.

Page 15

• Bosley, Edward R. Greene & Greene. London : Phaidon, 2000.

• Current, William R. Greene & Greene: architects in the residential style. Fort Worth [Tex.] : Amon Carter Museum of Western Art, [1974]

• Makinson, Randell L. Greene & Greene: architecture as fine art. Salt Lake City : Peregrine Smith, c1977.

• Makinson, Randell L. Greene & Greene: the passion and the legacy. Salt Lake City : Gibbs and Smith, c1998.

• Smith, Bruce. Greene & Greene masterworks. San Francisco : Chronicle Books, c1998.

• Strand, Janann. A Greene & Greene guide [Pasadena, Calif. : G. Dahlstrom, 1974]

Greene & Greene Bibliography (associated texts)

Page 16

Page 17

• Image collection

• Associated text

• Target object identification (TOI)

• CLiMB ToolKit

Squeezing Metadata out of Scholarly Texts

Page 18

Target Object Identification (TOI)

• “Authority” list

• Varies from collection to collection

  – Greene & Greene: Project Names

  – North Carolina Museum: Creator/Title
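One way to picture target object identification is as matching an authority list of names against the associated text. The sketch below assumes simple case-insensitive substring matching at sentence level; the slides do not describe how CLiMB actually performs this step. The function name `link_text_to_targets` and the illustrative sentences are hypothetical.

```python
import re


def link_text_to_targets(text, authority_list):
    """Map each target object name to the sentences that mention it.

    Assumption: a mention is any case-insensitive occurrence of the
    authority-list name inside a sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    links = {name: [] for name in authority_list}
    for sentence in sentences:
        lowered = sentence.lower()
        for name in authority_list:
            if name.lower() in lowered:
                links[name].append(sentence)
    return links


# Illustrative text only; the names follow the two collections in the deck.
text = (
    "Cebolla Church was painted in 1945. "
    "The Allen house was altered in 1917. "
    "The roofline echoed the mountains."
)
links = link_text_to_targets(text, ["Cebolla Church", "Allen house"])
for name, sentences in links.items():
    print(name, "->", sentences)
```

Because the authority list varies by collection (project names for Greene & Greene, creator and title for the museum), the list is passed in as data instead of being built into the matching logic.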

Page 19

Page 20

North Carolina Museum of Art

Museum Catalog (Associated Text)

Images (Catalog Records)

North Carolina Museum of Art: Handbook of the Collections. Ed. Rebecca Martin Nagy. Raleigh, NC: North Carolina Museum of Art, Hudson Hills Press, 1998.

Page 21

Georgia O'Keeffe (American, 1887-1986)

Cebolla Church, 1945

Oil on canvas, 20 1/16 x 36 1/4 in. (51.1 x 92.0 cm.) Purchased with funds from the North Carolina Art Society (Robert F. Phifer Bequest), in honor of Joseph C. Sloane, 72.18.1

North Carolina Museum of Art <http://ncartmuseum.org/collections/highlights/20thcentury/20th/1910-1950/038_lrg.shtml>

Page 22

MARC format

100 O’Keeffe, Georgia, ≠d 1887-1986.
245 Cebolla church ≠h [slide] / ≠c Georgia O’Keeffe.
260 ≠c 2003
300 1 slide : ≠b col.
500 Object date: 1945.
500 Oil on canvas.
500 20 x 36 in.
535 North Carolina Museum of Art ≠b Raleigh, N.C.
650 Painting, American ≠y 20th century.
650 Women artist ≠z United States
650 Church buildings in art.

Page 23

Cebolla Church, 1945

Oil on canvas, 20 1/16 x 36 1/4 in. (51.1 x 92.0 cm.)

Purchased with funds from the North Carolina Art Society (Robert F. Phifer Bequest), in honor of Joseph C. Sloane, 72.18.1

Driving through the New Mexican highlands near her home, Georgia O'Keeffe would often pass through the village of Cebolla with its rude adobe Church of Santo Niño. The artist was moved by the poignancy of the little building: its sagging, sun-bleached walls and rusted tin roof seemed so typical of the difficult life of the people.

When O'Keeffe came to paint the church she addressed it directly, emphasizing its isolation and stark simplicity. Literally formed out of the earth, the building affirms the permanence and the hard, defiant patience of the people. For O’Keeffe, it symbolized human endurance and aspiration. "I have always thought it one of my very good pictures", she wrote, "though its message is not as pleasant as many others".

And the question remains: What is that in the window?

Page 24

MARC format with CLiMB subject terms

100 O’Keeffe, Georgia, ≠d 1887-1986.
245 Cebolla church ≠h [slide] / ≠c Georgia O’Keeffe.
260 ≠c 2003
300 1 slide : ≠b col.
500 Object date: 1945.
500 Oil on canvas.
500 20 x 36 in.
535 North Carolina Museum of Art ≠b Raleigh, N.C.
650 Painting, American ≠y 20th century.
650 Women artist ≠z United States
650 Church buildings in art.

CLiMB New Mexican highlands
CLiMB village of Cebolla
CLiMB adobe Church of Santo Niño
CLiMB sagging, sun-bleached walls
CLiMB rusted tin roof
CLiMB isolation
CLiMB human endurance
CLiMB window
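Appending harvested terms to an existing record can be sketched as follows. This is an illustration under stated assumptions, not the ToolKit's implementation: the record is modeled as a plain list of (tag, value) pairs instead of real MARC encoding, the "CLiMB" label mirrors the slide, and the function names `add_climb_terms` and `render` are hypothetical.

```python
def add_climb_terms(record, terms):
    """Return a new record with reviewed terms appended under a 'CLiMB' label.

    Assumption: duplicates are detected case-insensitively and skipped.
    The input record is left unmodified.
    """
    existing = {value.lower() for tag, value in record if tag == "CLiMB"}
    enriched = list(record)
    for term in terms:
        if term.lower() not in existing:
            enriched.append(("CLiMB", term))
            existing.add(term.lower())
    return enriched


def render(record):
    """Format the record one field per line, as on the slide."""
    return "\n".join(f"{tag} {value}" for tag, value in record)


record = [
    ("245", "Cebolla church"),
    ("650", "Church buildings in art."),
]
enriched = add_climb_terms(record, ["rusted tin roof", "isolation", "Isolation"])
print(render(enriched))
```

Keeping the harvested terms under their own label, separate from the 650 subject headings, leaves the cataloger's original fields intact and makes the added descriptors easy to identify or remove.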

Page 25

• Image collection

• Associated text

• Target object identification (TOI)

• CLiMB ToolKit

Squeezing Metadata out of Scholarly Texts

Page 26

The CLiMB ToolKit

• Software prototype

• For large image collections

• Semi-automated metadata

  – Subject access terms

  – Human intervention at all steps

• Iterative development cycle

Page 27

Page 28

The CLiMB ToolKit

• Web Browser

• Help Menus

• Projects

A Graphical User Interface (GUI)

Page 29

CLiMB TOOLKIT: Process Flow

1. Load Text

2. Load TOI List

3. Analyze Text

4. Select Subject Access Terms

5. Review
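The five steps of the process flow can be sketched as a small pipeline. Everything here is an assumption made for illustration: the analysis step is a naive placeholder (lowercase words of five or more letters from sentences that name a target), and `select` and `review` are caller-supplied callbacks standing in for the human intervention the ToolKit slides describe. The names `analyze_text` and `run_toolkit` are hypothetical.

```python
import re


def analyze_text(text, toi_list):
    """Step 3 (placeholder): gather candidate terms per target object."""
    candidates = {name: [] for name in toi_list}
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for name in toi_list:
            if name.lower() in sentence.lower():
                # Naive candidates: all-lowercase words of 5+ letters.
                candidates[name].extend(re.findall(r"\b[a-z]{5,}\b", sentence))
    return candidates


def run_toolkit(text, toi_list, select, review):
    """Steps 1 and 2 arrive as arguments; steps 3 to 5 run in order."""
    candidates = analyze_text(text, toi_list)                         # 3. Analyze Text
    selected = {name: select(terms) for name, terms in candidates.items()}  # 4. Select terms
    return {name: review(name, terms) for name, terms in selected.items()}  # 5. Review


result = run_toolkit(
    "Cebolla Church has a rusted roof and adobe walls.",  # 1. Load Text
    ["Cebolla Church"],                                    # 2. Load TOI List
    select=lambda terms: [t for t in terms if t != "walls"],
    review=lambda name, terms: sorted(terms),
)
print(result)
```

Passing selection and review in as functions keeps the automated analysis separate from the decisions a cataloger would make at each stage.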

Page 31

Thank you!

Any further questions?

www.columbia.edu/cu/cria/climb