74
Textual Analysis & Introduction to Python CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 1

Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Textual Analysis & Introduction to Python

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 1

Page 2: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Project Proposal Evaluations

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 2

Page 3: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Handling data-selection nicely

• FILTER(range, condition1, cond2, ...)

• Returns a filtered version of the source range, returning only rows or columns which meet the specified conditions.

• condition1: a column or row containing true/false values, or an array formula

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 3

Page 4: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Data-loading

• If you’re loading just a few columns from something with LOTS of data...

• ...the QUERY() function may be a big help.

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 4

Page 5: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Why averaging averages is bad

• In one year, UC Berkeley hired 30 faculty. 23 were male, 7 female, even though the applicant pool was 60% female.

• Sounds like bias! • [numbers made up for this example, but this

actually happened, with somewhat different numbers]

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 5

Page 6: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Which departments were biased?

• UCB looked at the data per department to see which ones hired unfairly

• NONE DID!

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 6

Page 7: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

A 3-hire example Dept 1 Dept 2 Dept 3 Total Percentage

Male 10 20 12 32 13.6% M

Female 200 1 2 203 86.4% F

Hired 1 F 1M 1M 235 67% male

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 7

• Every department’s hiring ratio was fair • The average ratio looked unfair! • Conclusion: averaging ratios can lead to

inconsistent results.

Page 8: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Today’s Class

• Intro to text analysis problems • Intro to Python

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 8

Page 9: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Get Full Credit on Project 1

• Use sheets for intermediate results, interactivity • Display your data in different ways (graph, table,

etc.) • Follow the rules: only data, parameters, labels,

and formulas. • Use “notes” on cells to explain anything out of

the ordinary • Try to avoid “hand work” and use formulas

instead! CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 9

Page 10: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Project 1 (cont.) • Remember to address your claim

– My hypothesis is correct/incorrect because … – X% of the time, my hypothesis was correct…

• Be sure to explain what obstacles you encountered • Have a “Discussion and Conclusion” section (or

something similarly named) where you reflect on things like “Is this data too unreliable for me to trust the conclusion?” and “Was a threshold of 80% really reasonable?”, and “Did eliminating countries that lacked data for any single year make the analysis worthless?”

• Look at the rubric!

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 10

Page 11: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 11

Intermediate Results

Put your raw data on its own sheet and refer to it using a formula when you do your analysis on other sheets.

Page 12: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 12

Intermediate Results

Use a new sheet when the current sheet already has a table with some meaningful data in it. Don’t lose that data.

Page 13: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 13

Interactivity

Use data validation, probably with a list (pull down), to add some interactive component. See ACT1-4 for an example using MATCH and OFFSET.

Page 14: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 14

Presentation

Try out different ways to present your data. Make a chart from your final results. If you’re not sure what to do, ask a TA or me.

Page 15: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 15

Address Your Hypothesis

Remember to relate your results back to the hypothesis. Was it true? Was it true in some cases? Why might it have been false?

Page 16: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 16

Obstacles and Reflection

What was difficult? What are the limitations of the analysis you did? You must reflect on your project and what you learned.

Page 17: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 17

Check the rubric before you submit

Page 18: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

A brief review of things you didn’t know you’d learned

• In a spreadsheet, there are many types of data

• Numbers (start with +/- or a digit) • Strings (nondigit-start, or start with ‘ ) • Formulas (start with = ) • Ranges (B2, B2:B4, B2:D5) • Errors (#N/A) • Blanks

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 18

Page 19: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

What shows up in a cell • If a formula evaluates to a number or string, that

number or string • If it evaluates to a range, the value in the first cell of

that range ...sometimes – If you write =A1:A6, you get A1 – If you write =OFFSET(A1:A6, 0, 0), Gsheets fills in adjacent

cells; excel just fills in one cell • If evaluation leads to an error, then #N/A • Mostly, we never notice any of this • In Python, the rules have greater consistency, and

because results aren’t instantly visible, knowing the rules matters more

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 19

Page 20: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Textual Analysis

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 20

Define Problem Find Data

Write a set of instructions

Solution

Computer

Page 21: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Textual Analysis

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 21

Define Problem Find Data

Write a set of instructions

Solution

Computer

ACTACGTCGACTACGATCACGATCGCGCGATCACGTATTTACGATCAGCTACGATCGATCTACGATCGTAGCTGTGATCG

Page 22: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Textual Analysis

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 22

Define Problem Find Data

Write a set of instructions

Solution

Computer

Build a Concordance of a text • Locations of words • Frequency of words

ACTACGTCGACTACGATCACGATCGCGCGATCACGTATTTACGATCAGCTACGATCGATCTACGATCGTAGCTGTGATCG

Page 23: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Concordances

Alphabetical index of all words in a text

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 23

Word Page Numbers

Apple 4,7,10,27

Banana 77,110,130

Carrot 50,101

Date 9

… …

Page 24: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Concordances

• Before computers, was a huge pain. • What texts might have had concordances?

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 24

http://en.wikipedia.org/wiki/Concordance_(publishing)

Page 25: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Concordances

• Before computers, was a huge pain. • What texts might have had concordances?

– The Bible – The Quran – The Vedas – Shakespeare

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 25

http://en.wikipedia.org/wiki/Concordance_(publishing)

Page 26: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Concordances

• Before computers, was a huge pain. • What texts might have had concordances?

– The Bible – The Quran – The Vedas – Shakespeare

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 26

Not a “New” Problem: First Bible Concordance

completed in 1230

http://en.wikipedia.org/wiki/Concordance_(publishing)

Page 27: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Concordances

• How long would the King James Bible take us? – 783,137 words

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 27

http://agards-bible-timeline.com/q10_bible-facts.html

Page 28: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Concordances

• How long would the King James Bible take us? – 783,137 words

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 28

http://agards-bible-timeline.com/q10_bible-facts.html

800,000 * (3 min. to look up word and put page #) = 2,400,000 minutes = 40,000 hours

= 1,667 days = 4.5 years

Page 29: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Concordances

• How long would the King James Bible take us? – 783,137 words

Takes 70 hours to read the King James Bible aloud

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 29

http://agards-bible-timeline.com/q10_bible-facts.html

800,000 * (3 min. to look up word and put page #) = 2,400,000 minutes = 40,000 hours

= 1,667 days = 4.5 years

Page 30: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Strong’s Concordance

• Concordance of the King James Bible • Published in 1890 by James Strong

Wikipedia

http://www.christianbook.com/reader/?item_no=563788 CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 30

Page 31: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

From Concordance to Word Frequency

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 31

Suppose our text has 1000 words total.

Word Page Numbers

# of Occurrences

Word Frequency

Apple 4,7,10,27 4 4/1000

Banana 77,110,130 3 3/1000

Carrot 50,101 2 2/1000

Date 9 1 1/1000

… … … …

Page 32: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Google Ngrams

• Google “Google n-grams” • ngram: a set of n words

– “hello” is a 1-gram – “hello there” is a 2-gram

• Click on “About Google Books Ngram Viewer” for more information

• Question: what is the data source here? CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 32

Page 33: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Textual Analysis

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 33

Define Problem Find Data

Write a set of instructions

Solution

Computer

Build a Concordance of a text • Locations of words • Frequency of words • Word frequencies across time

ACTACGTCGACTACGATCACGATCGCGCGATCACGTATTTACGATCAGCTACGATCGATCTACGATCGTAGCTGTGATCG

Page 34: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

The Wizard of OZ

• About 40 Books, written by 7 different authors

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 34

Lyman Frank Baum Ruth Plumly Thompson

http://www.ssc.wisc.edu/~zzeng/soc357/OZ.pdf

#1 #14 #15 #16 #33

… …

Page 35: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

The Wizard of OZ

• About 40 Books, written by 7 different authors

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 35

Lyman Frank Baum (1856-1919)

Ruth Plumly Thompson

http://www.ssc.wisc.edu/~zzeng/soc357/OZ.pdf

#1 #14 #15 #16 #33

… …

Published in 1921

Page 36: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

The Wizard of OZ

• About 40 Books, written by 7 different authors

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 36

Lyman Frank Baum (1856-1919)

Ruth Plumly Thompson

http://www.ssc.wisc.edu/~zzeng/soc357/OZ.pdf

#1 #14 #15 #16 #33

… …

Published in 1921

Page 37: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

The Federalist Papers

• 85 articles written in 1787 to promote the ratification of the US Constitution

• In 1944, Douglass Adair guessed authorship – Alexander Hamilton (51) – James Madison (26) – John Jay (5) – 3 were a collaboration

• Corroborated in 1964 by a computer analysis

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 37

Wikipedia

http://pages.cs.wisc.edu/~gfung/federalist.pdf

Page 38: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Textual Analysis

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 38

Define Problem Find Data

Write a set of instructions

Solution

Computer

Build a Concordance of a text • Locations of words • Frequency of words • Word frequencies across time

• Determine authorship ACTACGTCGACTACGATCACGATCGCGCGATCACGTATTTACGATCAGCTACGATCGATCTACGATCGTAGCTGTGATCG

Page 39: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Stoppard’s Arcadia Bernard: Yes. One of my colleagues believed he had found an unattributed short story by D. H. Lawrence, and he analysed it on his home computer, most interesting, perhaps you remember the paper? Valentine: Not really. But I often sit with my eyes closed and it doesn't necessarily mean I'm awake. Bernard: Well, by comparing sentence structures and so forth, this chap showed that there was a ninety per cent chance that the story had indeed been written by the same person as Women in Love. To my inexpressible joy, one of your maths mob was able to show that on the same statistical basis there was a ninety per cent chance that Lawrence also wrote the Just William books and much of the previous day's Brighton and Hove Argus.

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 39

Page 40: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Textual Analysis

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 40

Define Problem Find Data

Write a set of instructions

Solution

Computer

Build a Concordance of a text • Locations of words • Frequency of words • Word frequencies across time

• Determine authorship • Count labels to determine

liberal media bias

ACTACGTCGACTACGATCACGATCGCGCGATCACGTATTTACGATCAGCTACGATCGATCTACGATCGTAGCTGTGATCG

Page 41: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

How are we going to analyze texts?

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 41

firehow.com

Excel

Numerical Data

Page 42: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

How are we going to analyze texts?

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 42

firehow.com

Excel

Numerical Data

Textual Data

Page 43: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

How are we going to analyze texts?

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 43

Makita Cordless Chain Saw, $270

Textual Data

Page 44: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

How are we going to analyze texts?

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 44

Textual Data

9poundhammer.blogspot.com

Python: A Programming Language Free!

Page 45: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Textual Analysis

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 45

Define Problem Find Data

Write a set of instructions

Solution

Python

Build a Concordance of a text • Locations of words • Frequency of words • Word frequencies across time

• Determine authorship • Count labels to determine

liberal media bias

ACTACGTCGACTACGATCACGATCGCGCGATCACGTATTTACGATCAGCTACGATCGATCTACGATCGTAGCTGTGATCG

Page 46: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Break

• Everyone: Open IDLE (Python GUI)

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 46

Page 47: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

“Python”

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 47

Page 48: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

“Python”

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 48

• A language for giving the computer instructions. It has syntax and semantics.

Page 49: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

“Python”

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 49

• A language for giving the computer instructions. It has syntax and semantics.

• Might say “write a Python program”, meaning “write instructions in the Python language”

Page 50: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

“Python”

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 50

• A language for giving the computer instructions. It has syntax and semantics.

• Might say “write a Python program”, meaning “write instructions in the Python language”

• There is an interpreter (e.g., IDLE) that takes Python instructions and executes them with the CPU, etc.

Page 51: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python • Expressions are inputs that Python evaluates

– Expressions return an output – Like using a calculator

Type the expressions below after ‘>>>’ and hit Enter

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 51

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

Page 52: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python • Expressions are inputs that Python evaluates

– Expressions return an output – Like using a calculator

Type the expressions below after ‘>>>’ and hit Enter

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 52

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> 4+2 6 >>> 4-2 2 >>> 4*2 8 >>> 4/2 2

Page 53: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Assignments do not have an output, they are stored in memory.

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 53

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

Page 54: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Assignments do not have an output, they are stored in memory. – We’ve done this kind of thing in Excel

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 54

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

We have assigned the number 1 to cell A1.

Page 55: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Assignments do not have an output, they are stored in memory. – We’ve done this kind of thing in Spreadsheets

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 55

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

We have assigned the number 1 to cell A1.

Let’s rename cell A1 to x.

Page 56: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Assignments do not have an output, they are stored in memory. – We’ve done this kind of thing in Spreadsheets

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 56

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

Let’s rename cell A1 to x.

>>> x = 1

Page 57: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Assignments do not have an output, they are stored in memory. – We’ve done this kind of thing in Spreadsheets

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 57

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> x = 1

variable expression =

Memory

Variable Name Value

x 1

Page 58: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Assignments do not have an output, they are stored in memory. – We’ve done this kind of thing in Spreadsheets

– We can now use x in expressions!

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 58

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> x = 1

Memory

Variable Name Value

x 1

>>> x+1 2 >>> (x+2)*3 9

Page 59: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• You can name your variables anything

• Well, almost anything – No spaces, operators, punctuation,

number in the first position • Variables usually start with a

lowercase letter and, if useful, describe something about the value.

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 59

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> steve = 100 >>> myNumber = 12345 >>> noninteger = 4.75

Page 60: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Choices

• Why are those the rules for names? • Someone thought about it and made a choice • Usually based on years of experience • Many choices seem crazy...

– Until one day you see they’re obviously correct

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 60

Page 61: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Try this:

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 61

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> 3/2

Page 62: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Try this: • There are two types of numbers in Python.

The type()function is useful.

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 62

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> 3/2

>>> type(3/2) <type 'int'> >>> type(1.5) <type 'float'>

Page 63: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Try this: • There are two types of numbers in Python.

The type()function is useful.

• Floats are numbers that

display with decimal points.

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 63

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> 3/2

>>> type(3/2) <type 'int'> >>> type(1.5) <type 'float'>

>>> 3.0/2.0 1.5

Page 64: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Try this: • There are two types of numbers in Python.

The type()function is useful.

• Floats are decimals.

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 64

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> 3/2

>>> type(3/2) <type 'int'> >>> type(1.5) <type 'float'>

>>> 3.0/2.0 1.5

General Rule: Expressions for a particular type will output that same type!

Page 65: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Strings are sequences of characters, surrounded by single quotes.

• The + operator concatenates

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 65

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> 'hi' 'hi' >>> myString = 'hi there' >>> myString 'hi there'

Page 66: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Strings are sequences of characters, surrounded by single quotes.

• The + operator concatenates

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 66

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> 'hi' 'hi' >>> myString = 'hi there' >>> myString 'hi there'

General Rule: Expressions for a particular type will output that same type!

Page 67: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Strings are sequences of characters, surrounded by single quotes.

• The + operator concatenates

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 67

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> 'hi' 'hi' >>> myString = 'hi there' >>> myString 'hi there'

>>> endString = ' class!' >>> myString + endString 'hi there class!' >>> newString = myString + endString >>> newString 'hi there class!'

Page 68: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Lists are an ordered collection of items

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 68

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> [5,10,15] [5, 10, 15] >>> myList = [5,10,15] >>> myList [5, 10, 15] >>> stringList = ['hi','there','class'] >>> stringList ['hi', 'there', 'class']

Page 69: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python • Lists are an ordered

collection of items

• Individual items are elements • The + operator concatenates

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 69

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> [5,10,15] [5, 10, 15] >>> myList = [5,10,15] >>> myList [5, 10, 15] >>> stringList = ['hi','there','class'] >>> stringList ['hi', 'there', 'class']

>>> myList + stringList [5, 10, 15, 'hi', 'there', 'class']

Page 70: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• To get an element from a list, use the expression where i is the index. Often spoken: “myList sub i”

• List indices start at 0!

• What does do?

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 70

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> myList[i]

>>> myList[0] 5 >>> myList[1] 10 >>> myList[2] 15

>>> myList[1] = 4

Page 71: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• To get a range of elements from a list, use the expression where i is the start index (inclusive) and j is the end index (exclusive).

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 71

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> myList[i:j]

>>> myList [5, 4, 15] >>> myList[0:2] [5, 4] >>> myList[1:3] [4, 15] >>> newList = [2,5,29,1,9,59,3] >>> newList [2, 5, 29, 1, 9, 59, 3] >>> newList[2:6] [29, 1, 9, 59]

Page 72: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Indexing and ranges also work on Strings.

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 72

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

>>> myString 'hi there' >>> myString[0] 'h' >>> myString[5] 'e' >>> myString[6] 'r' >>> myString[0:6] 'hi the'

Page 73: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Introduction to Python

• Remember what assignments do

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 73

1. Expressions 2. Assignments

a) Variables 3. Types

a) Integers b) Floats c) Strings d) Lists

Memory

Variable Name Value x 1

steve 100

myNumber 12345

noninteger 4.75

myString 'hi there'

endString ' class!'

myList [5,4,15]

stringList ['hi','there','class']

newList [2,5,29,1,9,59,3]

Page 74: Textual Analysis & Introduction to Pythoncs.brown.edu/courses/csci0931/2014-fall/2-text_analysis/LEC2-1.pdf · Textual Analysis & Introduction to Python CSCI 0931 ... • FILTER(range,

Class Review

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 74

Python So Far (to be updated/refined!) 1. Expressions

• Evaluate input and returns some output (calculator) 2. Variable Assignments: <variable> = <expression>

• Store the value of the expression in the variable instead of outputting the value.

• There is always an equals sign in an assignment • Variables can be named many things • List assignments: <listvar>[<index>] = <expression>

3. Types • Integers vs. Floats (Decimals) • Strings in single quotes • Lists are sets of other types • We can index into Strings & Lists

• Indexed starting at 0!

General Rule: Expressions for a particular type will output that same type!