Intro to: Computers & Programming: File Input & Output in Python CSCI-UA.0002 Adam Meyers New York University Introduction to: Computers & Programming: File Input and Output (IO)


Intro to: Computers & Programming: File Input & Output in Python

CSCI-UA.0002

Adam Meyers

New York University

Introduction to: Computers & Programming: File Input and Output (IO)


Summary
• What is Input and Output?
• What kinds of Input and Output have we covered so far?
    – print (to the console)
    – input (from the keyboard)
• File handling
    – input from files
    – output to files
    – Text files vs. 'pickled' binary files
• URL handling


Input
• Input is any information provided to the program
    – Keyboard input
    – Mouse input
    – File input
    – Sensor input (microphone, camera, photo cell, etc.)
• Output is any information (or effect) that a program produces:
    – sounds, lights, pictures, text, motion, etc.
    – on a screen, in a file, on a disk or tape, etc.


Types of Input Covered in This Class

• So Far
    – Input: keyboard input only
    – Output: graphical and text output transmitted to the computer screen
• This Unit expands our repertoire to include:
    – File Input – Python can read in the contents of files
    – File Output – Python can write text to files


Files
• File = Named Data Collection stored on a memory device
    – Different types of data: text, binary, etc.
    – Accessible by name or address
    – Has start and end point
    – Programs can read, create, modify (and do other things to) files
• Text file can be treated like a (big) string
    – Human readable
    – ASCII/UTF-8/etc. encoding
    – Can be plain text or can contain markup (e.g., html)
• Binary files: not human readable, usually require specific programs to read


Use Text Editors for Text Files

• A text file (.txt) can be created or edited with a text editor

• Text Editors
    – Apple: TextEdit
    – Windows: Wordpad (preferred) and Notepad
    – Unix Systems: emacs (available for most systems), vi or ex


Folders/Directories and Paths

• A Folder or a Directory is a named stored item that contains other folders and/or Files

• The root directory of a storage device:
    – no other directory contains it
    – it contains all other directories/files on that storage device.

• A sequence from directory to directory to directory … ending in a directory or file is called a path.

– Each item n in the path (except the root) is contained by the n-1 item.

– There is at least one path from the root to every file & directory, i.e., paths can be used to identify/locate files

– Each path uniquely identifies a single directory or file (ignoring short cuts, aka, symbolic links)


Slash Notation for Representing Paths
• Unix operating systems (linux, Apple, Android, etc.) use the forward slash to connect directories in a path, e.g., the path for this file could be:
    /Users/Adam/Desktop/Class Talks/Input-Output.odp
• MSDOS, Windows and related systems use the backslash (\) instead
• The root directory in UNIX systems is labeled /
• In Windows it is a letter and a colon, e.g., C:


The Directory Tree Including this File


Filename Conventions
• Systems differ somewhat in which characters are legal in filenames
• Conservative assumptions:
    – Use letters, numbers and underscore
        • Dashes are OK, but can be problematic for Python
    – Use conventional file extensions = filename endings beginning with a period: .py, .jpg, .doc, etc.
    – Some file extensions we are likely to use with python:
        • Python = .py
        • Text = .txt
        • Comma separated values = .csv
        • Tab separated values = .tsv


Path Conventions
• The full pathname of a file is the path from the root to that file
• A relative pathname of a file is a path from some other point to that file
• Commonly, paths are described relative to some working directory, commonly called the “current working directory” or the “present working directory”
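The difference between a full and a relative pathname can be checked from Python itself. This is a minimal sketch using only the standard os module; the file name 'short_story.txt' is just an illustration and does not need to exist.

```python
import os

cwd = os.getcwd()                      # the current working directory
relative = 'short_story.txt'           # a relative pathname (hypothetical file)
full = os.path.join(cwd, relative)     # joined with the right separator for this system

print(os.path.isabs(relative))         # False: it does not start at the root
print(os.path.isabs(full))             # True: it starts at the root
print(os.path.abspath(relative))       # a relative path is interpreted from the cwd
```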


The os module
• Interface between python and Operating System
• http://docs.python.org/py3k/library/os.html
• For performing OS-dependent operations
    – handling files, checking for system features, getting root or administrative permission, etc.
• As with other modules, needs to be imported
    – import os, help(os), etc.
• Other system info is in platform module
    – e.g., platform.system() distinguishes Windows, Apple (Darwin), linux, etc.
    – Most of this info would not be needed for the types of programs we can reasonably expect to write this term


Global Variables from the os Module
• os.name
    – 'posix' (linux, current apple) or 'nt' (most Windows)
    – Others: 'os2', 'ce', 'java', 'mac' (old Apple, I think)

• os.environ – all environmental variables (as a dictionary)

• os.linesep – '\n' for most systems, '\r\n' for Windows

• os.sep – '/' for most systems, '\\' for Windows
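To see what these variables hold on a particular machine, they can simply be printed. This is a small sketch; repr is used so that the backslash and newline characters show up visibly, and the exact values depend on the system it runs on.

```python
import os

print(os.name)             # 'posix' or 'nt' on current systems
print(repr(os.sep))        # separator between directories in a path
print(repr(os.linesep))    # line ending used in this system's text files
print(len(os.environ))     # os.environ behaves like a dictionary of strings
```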


Listing, Renaming & Removing paths and Creating Directories

• os.getcwd() – gets current working directory

• os.listdir(directory) – gets children of directory

• os.chdir(path) – change current working directory to path

• os.mkdir(path) – make directory called path

• os.remove(path) – remove file (not directory) called path

• os.rmdir(path) – remove directory (not file) called path

• os.rename(oldpath,newpath) – rename (or move) oldpath to newpath

• os.path.isfile(path), os.path.isdir(path) – Boolean functions indicating if a particular pathname refers to a real file/directory

• os.system(command) – execute a terminal command as indicated by the string command
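The functions above can be tried out safely in a scratch directory. This is a sketch, not part of IO-examples.py: it assumes the standard tempfile module to create the scratch directory, and the names 'letters', 'note.txt' and 'note1.txt' are made up.

```python
import os
import tempfile

start = os.getcwd()
sandbox = tempfile.mkdtemp()               # scratch directory, so no real files are touched
os.chdir(sandbox)

os.mkdir('letters')                        # make a directory
open('note.txt', 'w').close()              # create an empty file
os.rename('note.txt', os.path.join('letters', 'note1.txt'))   # rename also moves

contents = os.listdir('letters')
is_file = os.path.isfile(os.path.join('letters', 'note1.txt'))
is_dir = os.path.isdir('letters')

os.remove(os.path.join('letters', 'note1.txt'))   # remove the file first
os.rmdir('letters')                        # rmdir only works on an empty directory
os.chdir(start)                            # go back before deleting the sandbox
os.rmdir(sandbox)
```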


File Permissions
• There are certain directories and files that require root or administrative permission to open or read.

• It is possible to view and change these permission properties.

• For simplicity, we are only going to deal with files which our user has permission to create, remove and/or change


Files and Streams
• A stream is a continuous block of data ending in EOF (end of file)
    – Python's “file objects” are instances of streams.
    – A computer program can read a stream
    – A computer program can write to a stream

– Other similar operations (e.g., append) are also possible

• An input stream can be created (opened) containing data found in a file. A program can then read data from this stream.

• A program can create (open) an output stream and add (write) data to it. When the stream is closed, the data in the stream is written to the file. This writing can either overwrite an existing file or create a new one.

• Other streams exist including Input/Output in a command terminal

– standard input (the words you type)

– standard output (what you see when words appear on the screen);

– Others.


Reading and Writing to Files in Python
• instream = open(path, 'r')
    – creates an input stream containing the contents of the file named path and makes it the value of the variable instream.
• outstream = open(path, 'w')
    – creates an output stream for writing data and sets the variable outstream to this stream. path names the file to write to: the file is created (or a previous file with that name is emptied) as soon as the stream is opened.
• When the program is finished with a stream, it should close it as follows:
    – stream.close()
    – If stream is an output stream, closing it guarantees that all the data written so far has reached the file
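The open, write, close and read steps can be put together as follows. This is a minimal sketch; the scratch file 'example.txt' is created in a temporary directory (an assumption made so the example leaves the working directory alone).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')   # hypothetical scratch file

outstream = open(path, 'w')      # the (empty) file exists from this point on
outstream.write('first line\n')
outstream.write('second line\n')
outstream.close()                # everything written is now in the file

instream = open(path, 'r')
text = instream.read()           # the whole file as one string
instream.close()
print(text, end='')
```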


Options for open(path, mode)

• Direction
    – r – read (previous slide)
    – w – write (previous slide)
    – a – append (add to the end)
    – + – open for read and write (combined with r, w or a, e.g., 'r+')
• File Type
    – b – binary mode
    – t – text mode (default)
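The effect of the mode letters is easiest to see by using several of them on the same file. This is a sketch with a made-up scratch file 'log.txt' in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'log.txt')   # hypothetical scratch file

stream = open(path, 'w')      # 'w' starts the file from scratch
stream.write('one\n')
stream.close()

stream = open(path, 'a')      # 'a' keeps the old contents and adds to the end
stream.write('two\n')
stream.close()

stream = open(path, 'r')      # text mode (the default) returns a str
text = stream.read()
stream.close()

stream = open(path, 'rb')     # adding 'b' returns raw bytes instead
raw = stream.read()
stream.close()
```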


Sample function that reads and prints a text file

• def read_story(file):
      story = open(file,'r')
      for line in story:
          print(line,end='')
      story.close()
    – IO-examples.py
    – read_story('/Users/adam/Documents/short_story.txt')
    – read_story('short_story.txt')
• The file 'short_story.txt'
    – is in the current working directory
    – os.getcwd() → '/Users/meyers/Desktop/Python-Class/Python-programs/'
    – os.listdir(os.getcwd()) → a big list of files
    – 'short_story.txt' in os.listdir(os.getcwd()) → True


More about reading file
• The for loop treats the input stream (story) as a sequence of lines, each line being a string.
      for line in story:
          print(line,end='')
• Here print is told not to add its own newline after each string (end='')
    – Each line is a string that already ends with a newline. In the file this is os.linesep ('\n' for all posix systems (Apple, Linux) and '\r\n' in Windows); in text mode Python hands it to the program as '\n'
    – Leaving out end='' results in an additional blank line being printed
• At the beginning of the read_story function, we open a stream which we call story as follows:
      story = open(file, 'r')
• At the end of the function we close the stream as follows
      story.close()


Reading from Streams: Details
• for loop – treats stream as a list of lines
• Method readline reads one line at a time, moving to the next position in the stream
• def read_story2(file):
      story = open(file,'r')
      line = '*start*'
      while line != '':
          line = story.readline()
          print(line,end='')
      story.close()
  ## equivalent to read_story


Reading from Streams—More Details

• stream.read() method – reads the whole stream as one big string

• def read_story3(file):
      story = open(file,'r')
      big_string = story.read()
      line_list = big_string.splitlines(keepends=True)   ## each line keeps its newline
      for line in line_list:
          print(line,end='')
      story.close()
  ## equivalent to read_story


Reading from Streams: More Details
• stream.read(1) method – reads one character at a time
• def read_story4(file):
      story = open(file,'r')
      char = '***'
      while char != '':
          char = story.read(1)
          print(char,end='')
      story.close()
  ## equivalent to read_story


Alternatives for Reading from Stream
• input_string = stream.read()
    – Creates one large string consisting of all characters in a file.
    – Flexible – the program can divide this up in any way, really easily, e.g., for some texts, splitting at tabs will make a list of paragraphs.
• next_line = stream.readline()
    – Get string starting at current stream position and ending with the end of the line (os.linesep in the file, read as '\n' in text mode). Advance position to just after this line ending.

• next_char = stream.read(1)

– Reads character at current stream position and advances stream position

– next_string = stream.read(N) – read N characters

• for line in stream: ## loop through stream and treat as list of lines

• Other stream methods listed under class IOBase in https://docs.python.org/3.1/library/io.html

– Testing for types of streams, changing stream position, returning portions of stream, etc.
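The reading methods above all share one stream position, which can be seen by mixing them on a single stream. This is a sketch; the two-line scratch file is made up for the illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'short_story.txt')   # hypothetical scratch file
stream = open(path, 'w')
stream.write('Once upon a time\nThe end\n')
stream.close()

stream = open(path, 'r')
first_char = stream.read(1)        # one character; position moves to 1
next_four = stream.read(4)         # four more characters; position moves to 5
rest_of_line = stream.readline()   # from the current position through the newline
remaining = []
for line in stream:                # the loop also starts at the current position
    remaining.append(line)
at_end = stream.read()             # nothing is left, so this is the empty string
stream.close()
```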


Reading txt files on your machine
• Put a plain text file in the current working directory and/or use an absolute path name
• I am basing the description so far on Posix (linux, Apple, Solaris, BSD, etc.) paths
• Windows paths are different
    – My windows cwd is 'C:\\Python33' by default
    – They use backslashes instead of slashes
    – By default the file system does not display the file type (.txt)
        • So we have to be extra careful that we have the right filename
• In a Python string, a backslash is written as 2 backslashes
    – For Windows, it may be convenient to use the notation for a 'raw' string:
        r'Z:\2015-class-websites\Python-programs\short_story.txt'
• Thus, on my Windows machine, the following commands work:
    – read_story('Z:\\2015-class-websites\\Python-programs\\short_story.txt')
    – read_story(r'Z:\2015-class-websites\Python-programs\short_story.txt')
• If I change directories first, I can just use the relative path (just the filename)
    – os.chdir(r'Z:\2015-class-websites\Python-programs')   ## no trailing backslash: a raw string cannot end in one
    – read_story('short_story.txt')
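The two ways of writing a Windows path can be compared directly, without needing a Windows machine, because only the strings themselves are examined. A small sketch:

```python
doubled = 'Z:\\2015-class-websites\\Python-programs\\short_story.txt'
raw = r'Z:\2015-class-websites\Python-programs\short_story.txt'

print(doubled == raw)     # True: two spellings of the same string
print(len('\\'))          # 1: a doubled backslash is a single character

# A raw string cannot end in a single backslash, so the trailing one is left off.
folder = r'Z:\2015-class-websites\Python-programs'
```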


Function that Writes User Input to a File
• def take_dictation(outfile):
      dictation = open(outfile,'w')
      line = 'Empty'
      while (not (line == '')):
          line = input('Please give next line or hit enter if you are done. ')
          dictation.write(line+os.linesep)
      dictation.close()

• take_dictation('ClassNotesTues11-16.txt')

• In IO-examples.py


Notes about Dictation Function
• Since I did not provide an absolute path, I know that the file will be in the current working directory.

• os.getcwd() → identifies the cwd

• The output file ('Dictation_011_Nov-16.txt') is located there.

• The function initializes an output stream using the 'w' (write) option of open

• The variable line is initialized as 'Empty'

• Then a while loop keeps going as long as line is not equal to the empty string.

• This kind of while loop is called a sentinel loop because we use a sentinel string (the empty string) to indicate when it is done.


More on the Dictation Function
• Other sentinel strings are possible.
    – The empty string is not ideal as it could be entered by accident.
    – Perhaps, an explicit **stop** would make sure the user only stops when they mean to do so.
• The while loop writes each user input on a new line using the function (method) called write which is specific to streams.
• os.linesep is added to the end of the string.
    – The global variable os.linesep is meant to let the same program run on any platform (Apple, Windows, Linux, …)
    – Caution: a stream opened in text mode already converts '\n' to the platform's line ending when writing, so adding os.linesep produces '\r\r\n' on Windows; adding a plain '\n' is the safer choice
• After the loop, we close the stream (and it writes to the file)
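A sentinel loop with an explicit **stop** might look like the sketch below. It is a variation for illustration, not the version in IO-examples.py: the name take_dictation2 and the get_line parameter are hypothetical, and get_line exists only so the loop can be tried with canned answers instead of the keyboard.

```python
import os
import tempfile

def take_dictation2(outfile, get_line=input, sentinel='**stop**'):
    """Write lines to outfile until the sentinel string is entered."""
    prompt = 'Please give next line or type ' + sentinel + ' when done. '
    dictation = open(outfile, 'w')
    line = get_line(prompt)
    while line != sentinel:
        dictation.write(line + '\n')   # text mode turns '\n' into the platform's line ending
        line = get_line(prompt)
    dictation.close()

# Usage with canned answers; an empty line no longer ends the dictation
answers = iter(['Dear class,', '', 'See you Tuesday.', '**stop**'])
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')   # hypothetical scratch file
take_dictation2(path, get_line=lambda prompt: next(answers))
```

Because the test for the sentinel comes before the write, the sentinel itself is never written to the file.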


Comparing stream.write & print

• The stream.write method takes one string as an argument
• It is sort of like a print function where a file is printed to instead of the screen
• Differences:
    – print takes more than one argument
        • Print out is moderated by using the sep and end keywords
    – print automatically converts most objects to strings
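The differences can be seen side by side by sending both to the same output stream. This sketch relies on the file keyword of print, a standard feature of print that the slide does not mention; the scratch file 'scores.txt' and its contents are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'scores.txt')   # hypothetical scratch file
stream = open(path, 'w')

# write: exactly one string, so conversion and joining are done by hand
stream.write('Ann' + '\t' + str(90) + '\n')

# print: several objects, converted to strings automatically, joined with sep
print('Bob', 85, sep='\t', end='\n', file=stream)
stream.close()

stream = open(path, 'r')
text = stream.read()
stream.close()
```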


A Shortcut for Opening and Closing Streams
• with opens a stream for a block of code and closes the stream automatically when the block ends
• with open(filename,'r') as instream:
      for line in instream:
          print(line,end='')
• with open(filename,'w') as outstream:
      for line in list_of_output_lines:
          outstream.write(line)
• with open(infile,'r') as instream, open(outfile,'w') as outstream:
      for line in instream:
          outstream.write(line)
• Equivalent to:
      instream = open(infile,'r')
      outstream = open(outfile,'w')
      for line in instream:
          outstream.write(line)
      instream.close()
      outstream.close()
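That the streams really are closed after a with block can be checked through the closed attribute of a file object. This is a sketch using made-up scratch files 'in.txt' and 'out.txt'.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
infile = os.path.join(folder, 'in.txt')      # hypothetical scratch files
outfile = os.path.join(folder, 'out.txt')

with open(infile, 'w') as outstream:
    outstream.write('line 1\nline 2\n')

with open(infile, 'r') as instream, open(outfile, 'w') as outstream:
    for line in instream:
        outstream.write(line)

# No close() calls were needed: both streams were closed at the end of the block
closed_after_block = instream.closed and outstream.closed

with open(outfile, 'r') as instream:
    copied = instream.read()
```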


A Simple Spam Filter

• There are a bunch of email messages stored as files in a directory

• One at a time, the program reads these files and checks to see which ones pass a spam test.

• If the test says a file is spam, the program moves it into the spam directory, otherwise the program moves it into the to-read directory.

• In IO-examples.py


The implemented version of filter_spam
• Function call: filter_spam('letters','spam','to-read')
    – Sorts through the files in 'letters' and distributes them to 'spam' and 'to-read'.
    – Function call assumes that all directories are subdirectories of cwd
• filter_spam uses objects from the os package
    – os.path.isdir() – checks to see if the output directories exist
    – os.mkdir() – makes the directories if they don't exist
    – os.rename(file,destination) – 2 equivalent interpretations
        • moves a file from one path to another path
        • renames a file from one pathname to another
    – os.sep – global variable – '/' for UNIX and '\\' for Windows
• The function is_spam determines which directory a file is moved to
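The overall shape of such a function might be sketched as follows. This is not the code in IO-examples.py: the name filter_spam_sketch is hypothetical, the spam test is passed in as a parameter so the sketch is self-contained, and the toy test used in the usage lines (anything over 20 bytes is spam) is made up.

```python
import os
import tempfile

def filter_spam_sketch(in_dir, spam_dir, read_dir, is_spam):
    """Move each file in in_dir to spam_dir or read_dir, as decided by is_spam."""
    for directory in (spam_dir, read_dir):
        if not os.path.isdir(directory):      # make the output directories if needed
            os.mkdir(directory)
    for name in os.listdir(in_dir):
        source = in_dir + os.sep + name
        if not os.path.isfile(source):        # skip anything that is not a file
            continue
        if is_spam(source):
            os.rename(source, spam_dir + os.sep + name)
        else:
            os.rename(source, read_dir + os.sep + name)

# Usage in a scratch directory
base = tempfile.mkdtemp()
letters = os.path.join(base, 'letters')
spam = os.path.join(base, 'spam')
to_read = os.path.join(base, 'to-read')
os.mkdir(letters)
for name, text in [('a.txt', 'hi'), ('b.txt', 'BUY NOW ' * 10)]:
    with open(os.path.join(letters, name), 'w') as stream:
        stream.write(text)

filter_spam_sketch(letters, spam, to_read,
                   lambda path: os.stat(path).st_size > 20)
```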


The function: is_spam
• A function that returns True or False
• The current version returns True if:
    – The file is too big (more than 25K bytes)
        • It uses the .st_size attribute of the object os.stat(file)
    – Or the subject line has no lowercase letter
        • The subject line begins with subject: (ignoring case)
    – Or the subject line includes a word from a list of spam words
    – Or the subject line is over 15 characters with no spaces
• There are some errors
    – Some of the mail classified as SPAM is really NOT SPAM
    – Some of the mail classified as NOT SPAM is really SPAM
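The four tests listed above could be written roughly as below. This is only a sketch under stated assumptions, not the version in IO-examples.py: the names is_spam_sketch and get_subject, the word list, and the reading of "25K bytes" as 25,000 are all made up for the illustration.

```python
import os
import tempfile

SPAM_WORDS = ['winner', 'free', 'prize']     # hypothetical word list
MAX_BYTES = 25000                            # "25K bytes", read here as 25,000

def get_subject(path):
    """Return the text after 'subject:' on the first such line, or ''."""
    with open(path, 'r', encoding='utf-8', errors='ignore') as stream:
        for line in stream:
            if line.lower().startswith('subject:'):
                return line[len('subject:'):].strip()
    return ''

def is_spam_sketch(path):
    if os.stat(path).st_size > MAX_BYTES:                  # file is too big
        return True
    subject = get_subject(path)
    if not any(ch.islower() for ch in subject):            # no lowercase letter
        return True
    if any(word in SPAM_WORDS for word in subject.lower().split()):
        return True                                        # contains a spam word
    if len(subject) > 15 and ' ' not in subject:           # long with no spaces
        return True
    return False

# Usage on three made-up letters
folder = tempfile.mkdtemp()
samples = ['Subject: Lunch on Tuesday?\n\nSee you there.\n',
           'SUBJECT: YOU ARE A WINNER\n',
           'Subject: Claim your free prize\n']
results = []
for number, text in enumerate(samples):
    path = os.path.join(folder, 'letter' + str(number) + '.txt')
    with open(path, 'w') as stream:
        stream.write(text)
    results.append(is_spam_sketch(path))
```

Note that in this sketch a letter with no subject line at all counts as spam, since an empty subject has no lowercase letter.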


A State-of-the-Art Version of is_spam Might have the Following Features

• It might pay attention to more of the letter than just the subject line (the email address, the body of the letter)

• It might look for (characteristics of) images and weird character sets

• It would probably incorporate large statistics on words that are more likely to be found in spam than in normal emails (it would not use a simple list)

• It would combine statistics, rather than basing the determination on the presence/absence of items in a list

• It would include user feedback


More about Spam Program
• For simplicity, we treated letters as files
    – Actually the important issue is that they are streams, a more general concept that includes files, letters, transmissions of different kinds, etc.
• This program needed to use the os package
    – In order to be platform independent
    – The specifics of file handling largely depend on the computer, operating system, etc. that you are using
• Weird Characters were also a factor
    – open(file,'r', encoding='utf-8', errors='ignore')
        • encoding='utf-8' ensures that most characters are accepted
        • errors='ignore' makes it so the program does not bomb on bad characters (important for email which mixes text and binary)
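The effect of errors='ignore' can be shown with a file that contains one byte that is not legal UTF-8. This is a sketch; the scratch file and its contents are made up, and the byte \xff is chosen because it never occurs in valid UTF-8.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'letter.txt')   # hypothetical scratch file
with open(path, 'wb') as stream:
    stream.write('Subject: caf\u00e9 '.encode('utf-8') + b'\xff' + b' menu\n')

with open(path, 'r', encoding='utf-8', errors='ignore') as stream:
    text = stream.read()          # the bad byte is silently dropped

try:
    with open(path, 'r', encoding='utf-8') as stream:
        stream.read()             # without errors='ignore' the same read fails
    failed = False
except UnicodeDecodeError:
    failed = True
```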


Binary Files
• For our purposes, a binary file is any non-text file:
    – exe, jpg, gif, mp3, etc.
• The open function can read them using mode 'br' and write to them using mode 'bw'
• Pickling is a Python process for saving python data in binary form and retrieving it
    – import pickle ## loads the pickle module
    – pickle.dump(python_object,outstream)
        ## sends python_object to the binary output stream outstream
    – python_object = pickle.load(instream)
        ## reads a python object back from the binary input stream instream
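A round trip through pickle might look like this. It is a minimal sketch; the file name 'ratings.pickle' and the sample list are made up.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'ratings.pickle')   # hypothetical scratch file
ratings = [[1, 0, 0, 4], [2, 2, 1, 0]]

outstream = open(path, 'wb')          # binary mode for writing
pickle.dump(ratings, outstream)       # save the list in binary form
outstream.close()

instream = open(path, 'rb')           # binary mode for reading
restored = pickle.load(instream)      # rebuild an equal list from the file
instream.close()
```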


Processing Webpages
• import urllib.request
    – Loads module for creating streams by reading in webpages
• Note that these streams will have many of the same properties as file streams
    – E.g., they can be treated as lists of lines (each line arrives as bytes and can be decoded to a string)
• http://docs.python.org/py3k/library/urllib.request
• For example, I have recently used this package to process Yahoo (Bing) Searches.
    – Additional work to “process” the html output of search:
        • separate out the top 10 search results
        • Identify URL link, title and abstract
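Reading a page with urllib.request.urlopen could be sketched as follows. So that the sketch runs without a network connection, it opens a small made-up html file through a file:// URL; the assumption is that an http:// URL would be read in the same way.

```python
import os
import pathlib
import tempfile
import urllib.request

# Stand-in for a real webpage (made-up contents)
path = os.path.join(tempfile.mkdtemp(), 'page.html')
with open(path, 'w', encoding='utf-8') as stream:
    stream.write('<html>\n<title>Search results</title>\n</html>\n')
url = pathlib.Path(path).as_uri()          # a file:// URL for the scratch file

lines = []
with urllib.request.urlopen(url) as instream:
    for raw_line in instream:              # each line arrives as bytes
        lines.append(raw_line.decode('utf-8'))

titles = [line.strip() for line in lines if '<title>' in line]
```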


Reading Data from Files
• Files can store structured information that can be output or input by programs
• Ex 1: phone_list.txt
    – Each Record = consecutive lines with no blanks
    – Each line contains a feature and a value (split at ':')
• Ex 2 and 3: comma or tab separated files (.csv & .tsv)
    – Each Line represents 1 record
    – Each line = values separated by commas or tabs
    – Position (or column) determines feature
    – Column labels can be used as a 1st line
    – Can be read by standard spreadsheet programs
        • LibreOffice Calc, Google Docs, Microsoft Excel, etc.
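Reading a tab separated file with a line of column labels might be sketched like this. The file contents are made-up sample data, and the plain split at tabs assumes that no value contains a tab itself.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'phone_list.tsv')   # made-up sample data
with open(path, 'w') as stream:
    stream.write('name\tphone\n')
    stream.write('Ann\t555-0100\n')
    stream.write('Bob\t555-0101\n')

records = []
with open(path, 'r') as stream:
    labels = stream.readline().rstrip('\n').split('\t')   # 1st line = column labels
    for line in stream:
        values = line.rstrip('\n').split('\t')            # position determines the feature
        records.append(values)
```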


IO-examples.py programs for reading/writing out records

• Programs for same records in different formats
    – read_in_phone_records('address_list.txt')
    – read_in_phone_database_file(inputfile)
        ## inputfile is 'phone_list.tsv' or 'phone_list.csv'
• Program for adding manually entered records to a file in the .csv or .tsv format
    – add_phone_record_from_user_input()
• Some problems with this program:
    – Difficult to change entries & prevent duplicate/conflicting entries
    – Dictionaries better in this way – data structure covered next week


Summary
• Python can use files for input and output (I/O)
    – This class will only deal with text file I/O
• It is possible to read in unstructured text (e.g., a story) and it is possible to output unstructured text

• It is also possible to use text files to store different sorts of structured input, .csv and .tsv files are standard examples

• We covered the use of Python stream objects including reading from and writing to streams

• We also discussed the os package including variables and functions that deal with files as expected by different operating systems.


Homework Part 1 – Due Dec 2

• Read Chapter 9
• Module 10, Quiz 10


Homework – Part 2 Due Dec 2, 2015
Question 1 (repeated)

• Write a function that solicits a yes or no answer from a user using the function input.

• If the user inputs: 'Yes', 'yes', 'Y' or 'y', the function should return True

• If the user inputs: 'No','no', 'N', or 'n', the function should return False

• Otherwise, it should raise an exception. The error message can be anything that makes sense, e.g., 'Yes-No Error: only yes or no answers are permitted'


Question 2 (repeated)
• Write a function that prints out the ratio of registered democrat to republican voters for a particular voting district.
• For districts with all republican or all democrat voters, the function should print out “All Republicans” or “All Democrats” instead of a ratio

• Use simple (float) division to calculate the ratio, but use try, except and the ZeroDivisionError to catch the “All Democrat” cases.


Homework 8, Part 3 – Due Nov 30

Question 1

• Write a set of functions that record and average user responses to the Oscar-nominated movies from 2015.

• Each record includes:

– A user number (different for each user, starting at 1)

– An opinion from 0 to 5 stars for each of the following 8 movies: Birdman, Whiplash, The Imitation Game, The Theory of Everything, Boyhood, The Grand Budapest Hotel, American Sniper, and Selma

– Opinion numbers have the following interpretation: 0: I did not see the movie, 1: Terrible, 2: Bad, 3: Average, 4: Good, 5: Fantastic

• Some example entries as a list of lists:

– user_ratings = [[1,0,0,4,0,0,5,0,0],[2,2,1,0,5,3,5,1,4],[3,5,0,2,2,2,4,3,3]]

Homework – Question 1 continued

• Write a function add_oscar_opinions that adds data to a file in .tsv format

– similar to add_phone_record_from_user_input in IO-examples.py, except that it should solicit only one response per user (a single user should not provide multiple reviews)

– entries should be sorted by the user number, before being recorded in a file

– Warning: While debugging the program, use separate input and output files. Otherwise you will write over your data and have to keep recreating your test file.

• Write a function that reads in the data from a file and computes the following

– The total number of users who saw the movie (non-zeros)

– The average review of the users who saw the movie (ignore zeros)

– Test this on 10 entries or more.
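
The file-handling parts of this question could be sketched roughly as follows. This is not the code from IO-examples.py: the function names write_ratings and summarize_ratings and the returned dictionary layout are assumptions, and the step that solicits ratings from the user is left out. Each record is a list whose first item is the user number, followed by the 8 ratings.

```python
MOVIES = ['Birdman', 'Whiplash', 'The Imitation Game',
          'The Theory of Everything', 'Boyhood',
          'The Grand Budapest Hotel', 'American Sniper', 'Selma']


def write_ratings(records, filename):
    """Write records to a .tsv file, sorted by user number (first field)."""
    with open(filename, 'w') as outstream:
        for record in sorted(records, key=lambda rec: rec[0]):
            fields = [str(value) for value in record]
            outstream.write('\t'.join(fields) + '\n')


def summarize_ratings(filename):
    """Return {movie: (number_of_viewers, average_rating)}.

    Ratings of 0 mean "did not see the movie" and are ignored.
    The average is None for a movie that nobody saw.
    """
    totals = [0] * len(MOVIES)
    viewers = [0] * len(MOVIES)
    with open(filename) as instream:
        for line in instream:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip('\n').split('\t')
            ratings = [int(field) for field in fields[1:]]  # drop user number
            for index, rating in enumerate(ratings):
                if rating != 0:
                    viewers[index] += 1
                    totals[index] += rating
    summary = {}
    for index, movie in enumerate(MOVIES):
        if viewers[index]:
            average = totals[index] / viewers[index]
        else:
            average = None
        summary[movie] = (viewers[index], average)
    return summary
```

Writing to one file name and reading from another while debugging follows the warning above about overwriting test data.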

Homework Question 2

• Modify homework 7, question 1 to work with input and output files

• The main function of the program should do the following:

– Read in an input file of team data

– Ask the user to optionally add additional teams

– Sort all the team data by the same score used in homework 7, question 1: (wins + ties/2) / total_number_of_games

– Write the new sorted list of teams into an output file

• The input/output files should be .tsv files (tsv = tab-separated values)

– The fields are: team_name, wins, losses, ties

– Example (fields separated by tabs): Mets	10	5	5

• Use the same method of sorting by scores as in HW 7 and shown in class

– http://cs.nyu.edu/courses/fall15/CSCI-UA.0002-007/homework_sort.py
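
A rough sketch of the read, sort and write steps, leaving out the step that asks the user for additional teams. It does not reproduce homework_sort.py; it uses the built-in sorted with a key function instead, and it assumes the highest score should come first. The function names are illustrative.

```python
def team_score(team):
    """Score used for sorting: (wins + ties/2) / total number of games."""
    name, wins, losses, ties = team
    total_games = wins + losses + ties
    if total_games == 0:
        return 0.0  # assumption: a team with no games gets the lowest score
    return (wins + ties / 2) / total_games


def read_teams(filename):
    """Read team_name, wins, losses, ties records from a .tsv file."""
    teams = []
    with open(filename) as instream:
        for line in instream:
            if not line.strip():
                continue  # skip blank lines
            name, wins, losses, ties = line.rstrip('\n').split('\t')
            teams.append([name, int(wins), int(losses), int(ties)])
    return teams


def write_sorted_teams(teams, filename):
    """Write the teams to a .tsv file, highest score first (an assumption)."""
    with open(filename, 'w') as outstream:
        for team in sorted(teams, key=team_score, reverse=True):
            fields = [str(field) for field in team]
            outstream.write('\t'.join(fields) + '\n')
```

For the example record, team_score(['Mets', 10, 5, 5]) is (10 + 2.5) / 20, which is 0.625.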

Question 3

• Write a program that does the following:

1. Reads each line from a .txt file and converts it to lowercase

2. Counts the number of instances of the characters 'a', 'e', 'i', 'o' and 'u' in the file

3. Creates a new file of file type .vowel_profile

4. Prints out lines in the new file indicating the frequencies of each of these vowels

• Example input/output files:

– http://cs.nyu.edu/courses/fall15/CSCI-UA.0002-007/paragraph_from_wikipedia.txt

– http://cs.nyu.edu/courses/fall15/CSCI-UA.0002-007/paragraph_from_wikipedia.vowel_profile
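
The four steps might be sketched as below. The layout of the output lines (a vowel, a tab, then its count) is an assumption and may differ from the example .vowel_profile file linked above; the function name write_vowel_profile is also illustrative.

```python
import os

VOWELS = 'aeiou'


def write_vowel_profile(txt_filename):
    """Count vowels in a text file and write them to a .vowel_profile file.

    Returns the name of the new file.
    """
    counts = {vowel: 0 for vowel in VOWELS}
    with open(txt_filename) as instream:
        for line in instream:
            for character in line.lower():  # step 1: lowercase each line
                if character in counts:     # step 2: count the vowels
                    counts[character] += 1
    # Step 3: same base name as the input, with the new file type.
    base, extension = os.path.splitext(txt_filename)
    profile_filename = base + '.vowel_profile'
    # Step 4: one line per vowel (hypothetical format: vowel, tab, count).
    with open(profile_filename, 'w') as outstream:
        for vowel in VOWELS:
            outstream.write(vowel + '\t' + str(counts[vowel]) + '\n')
    return profile_filename
```

For instance, write_vowel_profile('paragraph_from_wikipedia.txt') would create paragraph_from_wikipedia.vowel_profile in the same folder.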

Grading Criteria

1. Does it work?

2. Does it do what is asked for?

3. Is the code easy to understand?

4. Is the code elegant?

5. Will your code work equally well on all operating systems? Did you use os.linesep?
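
On the last criterion, a small sketch of how line endings behave in Python 3. A file opened in text mode with the default settings already translates each '\n' that is written into os.linesep, so writing os.linesep explicitly is safe only when that translation is switched off with newline=''. The function name write_lines is illustrative.

```python
import os


def write_lines(lines, filename):
    """Write each line followed by the platform's line separator."""
    # newline='' turns off automatic newline translation, so os.linesep is
    # written exactly once per line. With the default setting, '\r\n' would
    # be written as '\r\r\n' on Windows.
    with open(filename, 'w', newline='') as outstream:
        for line in lines:
            outstream.write(line + os.linesep)
```

Opening the file with the default settings and writing '\n' at the end of each line produces the same bytes on every platform.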