Introduction to CHILDES and TalkBank Brian MacWhinney CMU - Psychology, Modern Languages, Language Technologies Institute

Page 1

Introduction to CHILDES and TalkBank

Brian MacWhinney

CMU - Psychology, Modern Languages, Language Technologies Institute

Page 2

The goal of TalkBank

Page 3

The core idea

Human communication is a single unified process.

However, patterns in communication are analyzed by 20 different fields.

The time scales of these processes vary from milliseconds to centuries.

But all of these processes must have their ultimate effect in the Moment.

We can capture the Moment on video.

Page 4

Principles

Data-sharing, Informed Consent

Multimedia

Open Access, Web Access, Commentary

Specified Format

Interoperability

Community integration

Page 5

Availability

http://childes.psy.cmu.edu

http://talkbank.org

programs, manuals, fonts, morphologies, CA conventions, video production guides, XML Schema, links to other programs

data can be either downloaded or played back over the web

Page 6

Current target areas

1. CHILDES

2. PhonBank

3. BilingualBank

4. AphasiaBank

5. CABank

6. ClassBank

Page 7

CHILDES

Child Language Data Exchange System

Founded in 1984 in Concord MA

Director: Brian MacWhinney [email protected]

Programmers: Leonid Spektor, Franklin Chen

3000 Members

130 corpora

Over 3200 published articles

Page 8

CHILDES and TalkBank

              CHILDES       TalkBank
Age           23 years      7 years
Words         44 million    8 + 55 million
Media         750 GB        450 GB
Languages     32            18
Publications  3200+         89
Users         3000+         500

Page 9

Practical Considerations

Learning CLAN takes about a week

Transcription is slow, perhaps a 15:1 ratio of transcription time to recording time. Blitzscribe, LENA, etc. probably will not work

Currently available data may not be perfect for a given issue

Corpora may need enhancement through MOR or Coder’s editor
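As a rough planning aid, the 15:1 estimate can be turned into simple arithmetic. The Python sketch below assumes the ratio means hours of transcription work per hour of recording; the function name and the sample value are illustrative only.

```python
def transcription_hours(recording_hours, ratio=15.0):
    """Estimate transcription effort, assuming `ratio` work hours per recorded hour."""
    if recording_hours < 0:
        raise ValueError("recording_hours must be non-negative")
    return recording_hours * ratio

# Example: ten hours of recordings at the assumed 15:1 ratio.
print(transcription_hours(10))  # prints 150.0
```

Changing the `ratio` argument shows how sensitive a project plan is to transcription speed.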

Page 10

Tools from the Web

Data: childes.psy.cmu.edu/data

CLAN: childes.psy.cmu.edu/clan

Manuals: childes.psy.cmu.edu/manuals

Morphosyntax: childes.psy.cmu.edu/morgrams

Phon: childes.psy.cmu.edu/phon

Tutorial videos: talkbank.org/training

Digital video: talkbank.org/dv

CA Methods: talkbank.org/CABank

Page 11

Why no handout?

“Overviews” link has this PPT presentation

CHILDES is now fully electronic. No more paper.

Page 12

Available Methods

Microanalysis - CA, phonetics, ethology

Microgenetic analysis - CA, code-switching (NEXT)

Group and treatment comparisons - Genesee

Error analysis - YipMatthews

Diffusion analysis - in preschools

Longitudinal studies - growth curves

Modeling - neural nets, dynamic systems, evolutionary models

Page 13

CLAN Tools

Transcribing

Editing

Counts -- FREQ, KWAL

Analyses: MOR, GRASP, PHON

Interoperability -- ELAN, Praat, SFS, EXMARaLDA, CLAPI, PHON

Page 14

CA marks in Unicode

Page 15

Transcripts linked to media

Page 16

Ground Rules

• Ethical use, informed consent

• Levels of permission

• Respect for dignity of participants

• Respect for contributors

• Requirement to cite sources

• Requirement to contribute data

Page 17

Info-CHILDES and Membership

[email protected]

• Archived at LinguistList

• Info-CHIBolts for nuts and bolts

• Membership list

• IASCL Membership

Page 18

Getting Set Up

• Download CLAN from Programs link

Page 19

Windows issues

• You can work in c:\childes

• But your administrator may have this locked, so you may need shortcuts.

• Windows IPA is difficult.

• Windows compression may produce .wmf

Page 20

Downloading Manuals

CHAT, CLAN

Page 21

Getting Started

• Open CLAN Manual to Chapter 2

• Double-click application

• Control-D to open Commands Window

• Set Working Directory to c:\childes\clan\lib\samples

Page 22

Should look like this:

On Windows, the working directory will be c:\childes\clan\lib\samples

Page 23

Run FREQ

• Freq sample.cha

• Hit RUN or carriage return

• In output, does “want” occur 3 times?
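To make the idea of a frequency count concrete, here is a minimal Python sketch of counting word forms on the main speaker lines of a transcript. It is not CLAN's FREQ implementation: the tokenization is deliberately crude, and the function name and the transcript fragment are hypothetical (the fragment is not taken from sample.cha).

```python
import re
from collections import Counter

def word_frequencies(chat_text, speaker=None):
    """Count word forms on main tiers (lines starting with '*').

    Simplified: ignores CHAT codes, continuation lines, and dependent tiers.
    """
    counts = Counter()
    for line in chat_text.splitlines():
        if not line.startswith("*"):
            continue  # skip headers (@...) and dependent tiers (%...)
        code, _, utterance = line.partition(":")
        if speaker is not None and code != "*" + speaker:
            continue
        counts.update(re.findall(r"[a-z']+", utterance.lower()))
    return counts

# Hypothetical transcript fragment, for illustration only.
demo = (
    "@Begin\n"
    "*CHI:\tI want cookie .\n"
    "*MOT:\tyou want a cookie ?\n"
    "%com:\tnot counted\n"
    "*CHI:\twant more .\n"
    "@End"
)
print(word_frequencies(demo)["want"])         # prints 3
print(word_frequencies(demo, "CHI")["want"])  # prints 2
```

Passing a speaker code restricts the count to that speaker's lines.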

Page 24

Interface Features

• Help

• CLAN

• Files In

• Recall

• Set MOR, Lib, Output directories

Page 25

Files In

Page 26

Building Commands

• mlu +t*CHI +f sample.cha

• mlu *.cha

• Wildcards

• File output

• *.cha
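As an illustration of what a mean-length-of-utterance measure computes for one speaker, the Python sketch below averages utterance length over the lines of a chosen speaker, mirroring the restriction to *CHI in the first command. It rests on a loud assumption: it counts words, whereas CLAN's MLU is normally computed over morphemes. The function name and the sample utterances are hypothetical.

```python
import re

def mlu_in_words(chat_text, speaker="CHI"):
    """Mean utterance length in words for one speaker's main tiers.

    Assumption: counts alphabetic words, not morphemes as CLAN normally does.
    """
    lengths = []
    prefix = "*" + speaker + ":"
    for line in chat_text.splitlines():
        if line.startswith(prefix):
            words = re.findall(r"[A-Za-z']+", line[len(prefix):])
            if words:
                lengths.append(len(words))
    return sum(lengths) / len(lengths) if lengths else 0.0

# Hypothetical utterances, for illustration only.
demo = "*CHI:\tmore juice .\n*MOT:\tyou want more juice ?\n*CHI:\tI want more juice .\n"
print(mlu_in_words(demo))  # prints 3.0
```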

Page 27

Changing Directories

• Set Working to: ne32

combo +t*MOT +s"is^*ing" *.cha

• Set Working to: samples

kwal +sbunny +w2 -w2 0042.cha

• Triple click on output line to go back to source file
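The context window in the kwal command can be pictured with a short Python sketch. This is not CLAN's implementation; it assumes that +w2 and -w2 request two utterances after and before each match, and the function name and sample utterances are made up for illustration.

```python
def keyword_in_context(utterances, keyword, before=2, after=2):
    """Return (index, context) pairs for utterances that contain `keyword`.

    `utterances` is a list of strings; each context is the slice holding up to
    `before` preceding and `after` following utterances around the match.
    """
    hits = []
    for i, utt in enumerate(utterances):
        if keyword in utt.lower().split():
            start = max(0, i - before)
            hits.append((i, utterances[start:i + after + 1]))
    return hits

# Hypothetical utterances, for illustration only.
lines = ["look at this", "what is it", "a bunny", "yes a bunny", "it hops", "bye"]
for index, context in keyword_in_context(lines, "bunny"):
    print(index, context)
```

Each hit keeps its index into the original list, which plays the same role as the link back to the source file.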

Page 28

GEM

• Set Working to: Workshop

• GEM +s* pau001.cha

• Open output, play audio

Page 29

Exercises - Chapter 8

• MLU50 – mlu +t*CHI +z50u +f *.cha

• MLU5 – maxwd +t*CHI +g1 +c5 +dl 68.cha | mlu > 68.ml5.cex

• TTR – freq +t*CHI +s"*-%%" +f *.cha
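TTR (type-token ratio) is the number of distinct word forms divided by the total number of words. The Python sketch below shows only that arithmetic; it does not reproduce how FREQ tokenizes or groups forms, and the function name and word list are hypothetical.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms divided by the total number of words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Hypothetical child words, already split and lowercased.
tokens = "want juice want more juice now".split()
print(type_token_ratio(tokens))  # 4 types over 6 tokens
```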

Page 30

Batch File

• maxwd +t*CHI +g1 +c5 +dl 14.cha | mlu > 14.ml5.cex

• maxwd +t*CHI +g1 +c5 +dl 55.cha | mlu > 55.ml5.cex

• maxwd +t*CHI +g1 +c5 +dl 66.cha | mlu > 66.ml5.cex

• maxwd +t*CHI +g1 +c5 +dl 68.cha | mlu > 68.ml5.cex

• maxwd +t*CHI +g1 +c5 +dl 98.cha | mlu > 98.ml5.cex

• Batch batch.cex

• Or just run by highlighting in Commands (Windows)
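Because the batch lines differ only in the file number, they can be generated rather than typed. The Python sketch below builds the same five command strings; the helper name is hypothetical, and writing the result to batch.cex is left commented out.

```python
def ml5_commands(file_ids):
    """Build one 'maxwd ... | mlu' command line per transcript id."""
    template = "maxwd +t*CHI +g1 +c5 +dl {id}.cha | mlu > {id}.ml5.cex"
    return [template.format(id=file_id) for file_id in file_ids]

commands = ml5_commands([14, 55, 66, 68, 98])
print("\n".join(commands))

# To save the lines as the batch file named on the slide:
# with open("batch.cex", "w") as handle:
#     handle.write("\n".join(commands) + "\n")
```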

Page 31

Tables

Child   MLU50   MLU5    TTR     MLT Ratio
14       0.10    0.12    1.84   -0.90
55      -0.70   -0.65   -0.15   -0.94
66      -0.25   -0.19   -0.68   -1.14
68       3.10    2.56   -0.67    1.60
98      -0.95   -1.11   -0.55    0.31

Page 32

The Editor

Page 33

Playing a linked file

• Esc-8

• Esc-A

• Control-Click

• F5

Page 34

Linking a File - F5

• Cursor on *FAT

• Find file

• F5

• Press space for each utterance

• Save

Page 35

F5 Tricks

• Go back to last good link

• Space quickly through contained overlap

• If a bullet is missing, cut and paste an old one

• For precision, try Sonic Mode

Page 36

Sonic Mode

• Esc-0 to start

• Highlight area

• Shift-click to move edge

• Have cursor on line in file

• S to insert time marks

• Triple click a linked sentence

Page 37

Transcribing

• Open new window (Command-N)

• Insert headers

– @Begin

– @Languages: en

– @Participants: CHI Target_Child, MOT Mother, FAT Father, ROS Brother

– @Date

• F5 with space at each utterance

• Go back and transcribe each bullet (c-click)

• Adjust time marks using Esc-A
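The header lines for a new transcript can also be produced programmatically. The Python sketch below builds the opening headers for a new file; it assumes a tab separates each header name from its value, the function name is hypothetical, and the date is left out unless supplied because no value is given for it here.

```python
def chat_header(languages, participants, date=None):
    """Build the opening header lines for a new transcript.

    `participants` is a list of (code, role) pairs, e.g. ("CHI", "Target_Child").
    Assumption: a tab separates each header name from its value.
    """
    lines = ["@Begin", "@Languages:\t" + ", ".join(languages)]
    roles = ", ".join(code + " " + role for code, role in participants)
    lines.append("@Participants:\t" + roles)
    if date is not None:
        lines.append("@Date:\t" + date)
    return "\n".join(lines) + "\n"

header = chat_header(
    ["en"],
    [("CHI", "Target_Child"), ("MOT", "Mother"), ("FAT", "Father"), ("ROS", "Brother")],
)
print(header)
```

A complete file also needs a closing @End line after the utterances, and the result should still be run through CHECK.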

Page 38

F5, locate sound, enter bullets

click on bullets, transcribe

Page 39

Or use SoundWalker

Page 40

Or use the Video Editor

Page 41

CHECK

• CHECK is CRUCIAL

• Internal: Esc-L

• External: check *.cha

• External CHECK provides fuller control

Page 42

Options

• Backup

• Wrapping

• Line Numbers

• CHECK

Page 43

More Options

Line numbers, F5 bullets, SoundAnalyzer

Page 44

Coder's Editor

• Open barry.cha

• Esc-0

• Cursor on first line

• Open codeshar.cut

• %spa

• Insert $NIA:AC:IN

Page 45

Coder's Editor Commands

• F1 finish current tier and go to the next

• Esc-c finish coding current tier

• Esc-t restrict coding to a particular speaker

• Esc-Esc go on to the next speaker

• Esc-s rotate subcodes

• Control-g cancel illegal command

Page 46

Send to Praat

Open Praat, Click before link, Send to Praat, Run Analysis

Page 47

Learning to Digitize

Page 48

Searching, Replacing

• Control-R, Control-F

• Space, No, !, control-G

Page 49

Fixing Things

• CHSTRING

• INSERT (inserts @ID headers)

• FIXIT

• LONGTIER

• FIXBULLETS

• REN

• COMBTIER

Page 50

Tour of English MOR Files

• Download a copy

• A-rules

• C-rules

• Sf.cut

• Lexicon

Page 51

Running MOR

• Set MOR directory

• mor +xi (dogs)

• mor +xl barry.cha

• Open barry.ulx.cex

• Fix problems using KWAL

• mor *.cha

Page 52

POST

• mor barry.cha +1 or else

• mor barry.cha and then

• ren *.mor.cex *.cha +f

• post *.cha +1

Page 53

Fixing POST

• POST is 95% accurate, but some projects need 100% accuracy

• Eve training set may need error checking

• More data will train a better POST

• POST training is mostly about bootstrapping, using regexp to find and correct subcases leading to error

• Need to remove some POS possibilities and add them back through post-POST rules (spell as N)

Page 54

CHAT

• What is an utterance?

• What is a word?

• Tour of the CHAT manual

Page 55

Web Browsing of Video

Page 56

Some examples

• Forrester

• Rollins

• Yasmin

• Paulo

• Brent, MacWhinney

• Classroom - JLS

Page 57

Rollins Coding

Page 58

Conclusions

• CHILDES and TalkBank provide solid tools for studying language learning and functioning

• Data-sharing has led to major advances in the field

• New approaches emphasize the use of multimedia analysis, computational linguistics, and speech technology
