TEXT MINING

INTRO TO PYTHON

Mattias Villani

Statistics, Dept. of Computer and Information Science

Linköping University

MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 1 / 21


OVERVIEW

- What is Python? How is it special?
- Python’s objects
- If-else, loops and list comprehensions
- Functions
- Classes
- Modules


WHAT IS PYTHON?

- First version in 1991
- High-level language
- Emphasizes readability
- Interpreted: .py source files are compiled to bytecode, which is cached in .pyc files [can also be compiled via C/Java]
- Automatic memory management
- Strongly and dynamically typed
- Functional and/or object-oriented
- Glue to other programs (interfaces to C/C++, Java, etc.)


THE BENEVOLENT DICTATOR FOR LIFE (BDFL): GUIDO VAN ROSSUM


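PYTHON PECULIARITIES (COMPARED TO R/MATLAB)

- Counting begins at 0.
- myVector[0:2] returns the first and second elements, but not the third.
- In Python 2, 3/2 = 1 (integer division) unless you use from __future__ import division. In Python 3, 3/2 = 1.5 and 3//2 = 1.
- Indentation matters!
- Can import specific functions from a module.
- Assignment binds a name to an object and never copies it: after b = a, both names refer to the same object. This shows up with mutable objects (lists, dictionaries) and is harmless for immutable ones (numbers, strings, tuples).
- a = b = 1 assigns 1 to both a and b.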
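
The peculiarities listed above can be tried out in a few lines. This is a minimal sketch that assumes Python 3; the variable names are illustrative and not taken from the slides.

```python
from math import sqrt  # import one specific function from a module

# Counting begins at 0, and slices exclude the end position.
my_vector = [10, 20, 30]
first = my_vector[0]          # 10
first_two = my_vector[0:2]    # [10, 20], the third element is left out

# Division in Python 3: / is true division, // is floor division.
true_div = 3 / 2              # 1.5
floor_div = 3 // 2            # 1

# Assignment binds a second name to the same list; it does not copy.
a = [1, 2, 3]
b = a
b.append(4)                   # a is now [1, 2, 3, 4] as well

# An explicit copy is independent of the original.
c = list(a)
c.append(5)                   # a is unchanged by this

root = sqrt(16)               # 4.0
```

If a copy is what you want, ask for it explicitly (list(a) here); plain assignment only adds another name for the same object.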


PYTHON’S OBJECTS

- Built-in types: numbers, strings, lists, dictionaries, tuples and files.
- Vectors, arrays and matrices are available in the numpy/scipy modules.
- Python is a strongly typed language. 'mattias' + 3 gives an error.
- Python is a dynamically typed language. No need to declare a variable’s type before it is used. Python figures out the object’s type.
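
As a small illustration of the two typing properties, the sketch below rebinds one name to objects of different types and then triggers the error mentioned in the slide. It assumes Python 3; the helper variable names are made up for the example.

```python
# Dynamic typing: the same name can refer to objects of different types.
x = 3
type_before = type(x).__name__     # 'int'
x = 'mattias'
type_after = type(x).__name__      # 'str'

# Strong typing: Python does not silently convert the number to a string.
try:
    result = 'mattias' + 3
    mixed_ok = True
except TypeError:
    mixed_ok = False               # this branch runs

# The conversion has to be requested explicitly.
explicit = 'mattias' + str(3)      # 'mattias3'
```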


STRINGS

- s = 'Spam'
- s[0] returns the first letter, s[-2] returns the next-to-last letter, and s[0:2] returns the first two letters.
- len(s) returns the number of letters.
- s.lower(), s.upper(), s.count('m'), s.endswith('am'), ...
- Which methods are available for my object? Try in Spyder: type s. followed by TAB.
- The + operator concatenates strings.
- (behind the scenes: the string object has an __add__ method: s.__add__(anotherString))
- sentence = 'Guido is the benevolent dictator for life'. sentence.split() returns a list of the words.
- s*3 returns 'SpamSpamSpam'
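
The string operations above, collected into one runnable sketch. The string 'Eggs' is an arbitrary second string chosen for the example.

```python
s = 'Spam'

first_letter = s[0]        # 'S'
next_to_last = s[-2]       # 'a'
first_two = s[0:2]         # 'Sp'
length = len(s)            # 4

lowered = s.lower()        # 'spam'
m_count = s.count('m')     # 1
ends_am = s.endswith('am') # True

# + calls the string's __add__ method behind the scenes.
joined = s + 'Eggs'
same = s.__add__('Eggs')

sentence = 'Guido is the benevolent dictator for life'
words = sentence.split()   # list of the seven words
tripled = s * 3            # 'SpamSpamSpam'
```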


THE LIST OBJECT

- A list is a container of several variables, possibly of different types.
- myList = ['spam','spam','bacon',2]
- The list object has several associated methods
  - myList.append('egg')
  - myList.count('spam')
  - myList.sort() (in Python 3 the elements must be mutually comparable, so sorting this mixed string/number list raises a TypeError)
- The + operator concatenates lists: myList + myOtherList merges the two lists into one list.
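
A short sketch of the list methods above, assuming Python 3. The second list, words, is introduced only to show a sort that succeeds.

```python
my_list = ['spam', 'spam', 'bacon', 2]

my_list.append('egg')                 # adds 'egg' at the end, in place
spam_count = my_list.count('spam')    # 2

# Strings and numbers cannot be compared in Python 3, so sorting fails.
try:
    my_list.sort()
    sortable = True
except TypeError:
    sortable = False

# A list of strings sorts alphabetically, in place.
words = ['spam', 'bacon', 'egg']
words.sort()                          # ['bacon', 'egg', 'spam']

# + returns a new list and leaves both operands unchanged.
merged = words + ['monty', 'Python']
```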


THE LIST OBJECT

- Extract elements from a list: myList[1]
- Lists inside lists:
  - myOtherList = ['monty','Python']
  - myList[1] = myOtherList
  - myList[1] returns the list ['monty','Python']
  - myList[1][1] returns the string 'Python'
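
The nesting example can be run as below. The last two lines go one step beyond the slide and are only a sketch: they show that the outer list stores a reference to the inner list, so a later change to the inner list is visible through both names.

```python
my_list = ['spam', 'spam', 'bacon', 2]
my_other_list = ['monty', 'Python']

my_list[1] = my_other_list     # replace the second element by a list
inner = my_list[1]             # ['monty', 'Python']
word = my_list[1][1]           # 'Python'

# The nested list is shared, not copied.
my_other_list.append('flying')
shared = my_list[1]            # now has three elements
```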


PYTHON’S OBJECTS: VECTORS AND ARRAYS (AND MATRICES)

- from scipy import * (newer SciPy releases have deprecated these NumPy re-exports; from numpy import * provides the same array function)
- x = array([1,7,3])
- 2-dimensional array (matrix): X = array([[2,3],[4,5]])
- Indexing matrices
  - First row: X[0,:]
  - Second column: X[:,1]
  - Element in position 1,2: X[0,1]
- Array multiplication (*) is element-wise.
- There is also a matrix object: X = matrix([[2,3],[4,5]])
- For matrix objects multiplication (*) is matrix multiplication.
- Arrays are recommended (not matrices).
- Submodule scipy.linalg contains a lot of matrix functions (det(), inv(), eig() etc). I recommend: from scipy.linalg import *
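
A minimal sketch of the array operations above. It assumes NumPy is installed and imports it directly as np instead of going through scipy. The @ operator (Python 3.5 and later) is not mentioned in the slides; it is used here as one way to get a matrix product from plain arrays.

```python
import numpy as np

x = np.array([1, 7, 3])
X = np.array([[2, 3], [4, 5]])

first_row = X[0, :]       # array([2, 3])
second_col = X[:, 1]      # array([3, 5])
element = X[0, 1]         # 3, row 1 and column 2 in 1-based terms

# * multiplies arrays element by element.
elementwise = X * X       # [[4, 9], [16, 25]]

# @ gives the matrix product for arrays.
matprod = X @ X           # [[16, 21], [28, 37]]
```

Sticking to arrays and writing the matrix product explicitly avoids having * mean two different things depending on the object type.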


PYTHON’S OBJECTS: DICTIONARIES

- Collection of objects (elements) stored as key-value pairs. Historically unordered; since Python 3.7 a dictionary preserves insertion order.
- myDict = {'Leif':23, 'Dag':17, 'Lyam':12}
- Elements are accessed by keyword, not by index (offset): myDict['Dag'] returns 17.
- Can contain any object: myDict = {'Leif':[23,14], 'Dag':17, 'Lyam':[12,29]}. myDict['Leif'][1] returns 14.
- Numbers can also be used as keys: myDict = {2:'contents of box 2', 4:'contents of box 4', 'blackbox':10}
- myDict.keys()
- myDict.values()
- myDict.items()
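
The dictionary examples above as a runnable sketch, assuming Python 3, where keys(), values() and items() return view objects that can be turned into lists.

```python
my_dict = {'Leif': [23, 14], 'Dag': 17, 'Lyam': [12, 29]}

dag = my_dict['Dag']                 # 17, looked up by key
leif_second = my_dict['Leif'][1]     # 14, index into the stored list

keys = list(my_dict.keys())
values = list(my_dict.values())
items = list(my_dict.items())        # list of (key, value) tuples

# Keys can be numbers as well as strings.
boxes = {2: 'contents of box 2', 4: 'contents of box 4', 'blackbox': 10}
box2 = boxes[2]
```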


PYTHON’S OBJECTS: TUPLES

- myTuple = (3,4,'mattias')
- Like lists, but immutable (cannot change elements after creation)
- Why?
  - Faster than lists
  - Protected from change
  - Can be used as keys in dictionaries
  - Returning multiple objects from a function
  - Swapping variable content: (a, b) = (b, a) [a, b = b, a also works]
  - String formatting: name = "Mattias"; age = 39; "My name is %s and I am %d years old" % (name, age)
  - Sequence unpacking: a, b, c = myTuple
- list(myTuple) returns myTuple as a list. tuple(myList) does the opposite.
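
The uses of tuples listed above can be sketched as follows. The function min_and_max and the dictionary grid are hypothetical names invented for this example.

```python
my_tuple = (3, 4, 'mattias')

# Immutable: assigning to an element raises a TypeError.
try:
    my_tuple[0] = 99
    mutable = True
except TypeError:
    mutable = False

# Returning several objects from a function (hypothetical helper).
def min_and_max(values):
    return min(values), max(values)

lo, hi = min_and_max([4, 1, 7])

# Swapping variable content.
a, b = 1, 2
a, b = b, a

# String formatting with a tuple of values.
name = 'Mattias'
age = 39
text = 'My name is %s and I am %d years old' % (name, age)

# Sequence unpacking.
x, y, z = my_tuple

# Tuples as dictionary keys (hypothetical example data).
grid = {(0, 1): 'north'}

as_list = list(my_tuple)
back = tuple(as_list)
```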


PYTHON’S OBJECTS: SETS

- Set. Contains objects in no order with no identification.
  - With a sequence, elements are ordered and identified by position. myVector[2]
  - With a dictionary, elements are identified by some key rather than by position. myDict['myKey']
  - With a set, elements stand for themselves. No indexing, no key reference.
- Declaration: fib = set((1,1,2,3,5,8,13)) returns the set {1, 2, 3, 5, 8, 13}; the duplicate 1 is dropped (Python 2 prints it as set([1, 2, 3, 5, 8, 13])).
- Supported methods: len(s), x in s, set1 < set2, union, intersection, add, remove, pop ...
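
A brief sketch of the set operations named above. The sets small and evens are made up for the example.

```python
fib = set((1, 1, 2, 3, 5, 8, 13))   # the duplicate 1 is stored once

size = len(fib)                     # 6
has_five = 5 in fib                 # True

small = {1, 2, 3}
is_proper_subset = small < fib      # True: every element of small is in fib

evens = {2, 8, 34}
both = fib.intersection(evens)      # {2, 8}
either = fib.union(evens)           # fib plus 34

fib.add(21)                         # modifies fib in place
fib.remove(13)                      # raises KeyError if 13 were missing
```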


BOOLEAN OPERATORS

- True/False
- and
- or
- not
- a = True; b = False; a and b [returns False].


IF-ELSE CONSTRUCTS

IF-ELSE STATEMENT
    a = 1
    if a == 1:
        print('a is one')
    elif a == 2:
        print('a is two')
    else:
        print('a is not one or two')

- Switch statements via dictionaries (see Jackson’s Python book).
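
One common way to emulate a switch statement with a dictionary is sketched below. It mirrors the if-else example above; the function name describe is hypothetical, and this is not necessarily the construction used in the book the slide refers to.

```python
def describe(a):
    """Return a message for a, using a dictionary instead of if/elif/else."""
    cases = {
        1: 'a is one',
        2: 'a is two',
    }
    # get() returns the second argument when the key is missing,
    # which plays the role of the else branch.
    return cases.get(a, 'a is not one or two')


message_one = describe(1)
message_two = describe(2)
message_other = describe(7)
```

The dictionary maps each case to its result, so adding a case means adding one entry instead of another elif branch.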


WHILE LOOPS

WHILE LOOP
    a = 10
    while a > 1:
        print('bigger than one')
        a = a - 1
    else:
        print('no longer bigger than one')


LOOPS

- for loops can iterate over any iterable.
- Iterables: strings, lists, tuples

FOR LOOP
    word = 'mattias'
    for letter in word:
        print(letter)


LOOPS, CONT.

FOR LOOP 2
    myList = ['']*10
    for i in range(10):
        myList[i] = 'mattias' + str(i)


LIST COMPREHENSIONS

- Set definition in mathematics

      {x for x ∈ X}

  where X is some finite set, or more generally

      {f(x) for x ∈ X}

- List comprehension in Python:
  - myList = [x for x in range(10)]
  - myList = [sin(x) for x in range(10)] (don’t forget from math import sin)
  - myList = [x + y for x in linspace(0.1,1,10) for y in linspace(10,100,10)] (from scipy import linspace; with recent versions, from numpy import linspace)
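
The comprehensions above, in a runnable sketch. To keep it free of external packages, the two linspace grids are replaced by short hand-written lists (xs and ys, names chosen for this example), so the numbers differ from the slide. The explicit loop at the end shows what the double comprehension expands to.

```python
from math import sin

identity = [x for x in range(10)]       # 0, 1, ..., 9
sines = [sin(x) for x in range(10)]     # f(x) = sin(x) applied to each x

# Small stand-ins for the linspace grids in the slide.
xs = [0.1, 0.2, 0.3]
ys = [10, 20]

# Two for clauses: the first is the outer loop, the second the inner loop.
sums = [x + y for x in xs for y in ys]

# The same result written as nested for loops.
sums_loop = []
for x in xs:
    for y in ys:
        sums_loop.append(x + y)
```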


DEFINING FUNCTIONS AND CLASSES

DEFINING FUNCTIONS
    def mySquare(x):
        return x**2

- Calling the function: mySquare(x)
- Classes are defined similarly, with the class keyword; each method takes the object itself (self) as its first argument.
- Make your own module by putting several functions in a .py file. Then import what you need.
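
A minimal sketch of a function and a class side by side. The class Document and its method word_count are hypothetical and only serve to show how self is used; they do not come from the slides.

```python
def my_square(x):
    """Return the square of x."""
    return x ** 2


class Document:
    """A tiny text container (hypothetical example class)."""

    def __init__(self, text):
        # self is the object being created; attributes are stored on it.
        self.text = text

    def word_count(self):
        # Methods reach the object's data through self.
        return len(self.text.split())


squared = my_square(4)
doc = Document('Guido is the benevolent dictator for life')
n_words = doc.word_count()
```

If these definitions were saved in a file such as mytools.py (a made-up file name), another script in the same folder could load just the function with from mytools import my_square.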


MISC

- Comments on individual lines start with #
- Comments spanning multiple lines can be written as a triple-quoted string: """This is a looooong comment"""
