Outline General Introduction Basic Types in Python Programming Exercises Python course in Bioinformatics Xiaohui Xie March 31, 2009 Xiaohui Xie Python course in Bioinformatics


    Python course in Bioinformatics

    Xiaohui Xie

    March 31, 2009

    General Introduction

    Basic Types in Python

    Programming

    Exercises

    Why Python?

    - Scripting language, well suited to rapid application development
    - Minimalistic syntax
    - Powerful
    - Flexible data structures
    - Widely used in bioinformatics and many other domains

    Where to get Python and learn more?

    - Main source of information: http://docs.python.org/
    - Tutorial: http://docs.python.org/tutorial/index.html
    - Biopython: http://biopython.org/wiki/Main_Page

    Invoking Python

    - To start: type python at the command line
    - It will look like

    Python 2.5.2 (r252:60911, Mar 25 2009, 00:12:33)
    [GCC 4.1.2 (Gentoo 4.1.2 p1.0.2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

    - You can now type commands on the line marked by >>>
    - To leave: type the end-of-file character, Ctrl-D on Unix, Ctrl-Z (then Enter) on Windows
    - This is called interactive mode

    Appetizer Example

    - Task: Print all numbers in a given file
    - File: numbers.txt

    2.1
    3.2
    4.3

    - Code: print.py

    # Note: the code lines begin in the first column of the file. In
    # Python, code indentation *is* syntactically relevant. Thus, the
    # hash # (which is a comment symbol; everything past a hash is
    # ignored on the current line) marks the first column of the code
    data = open("numbers.txt", "r")
    for d in data:
        print d
    data.close()

    Appetizer Example cont’d

    - Task: Print the sum of all the data in the file
    - Code: sum.py

    data = open("numbers.txt", "r")
    s = 0
    for d in data:
        s = s + float(d)
    print s
    data.close()
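The slides use Python 2.5. As a hedged sketch only, the same summation could look like this in Python 3, where print is a function and a with block closes the file automatically. The helper name sum_numbers and the temporary copy of numbers.txt are assumptions made for illustration, not part of the course material.

```python
import os
import tempfile

def sum_numbers(path):
    """Return the sum of the numbers in a file with one number per line."""
    total = 0.0
    with open(path, "r") as data:      # the file is closed when the block ends
        for line in data:
            line = line.strip()
            if line:                   # skip blank lines
                total += float(line)
    return total

# Recreate the numbers.txt example in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")
    with open(path, "w") as f:
        f.write("2.1\n3.2\n4.3\n")
    total = sum_numbers(path)

print(round(total, 1))
```

Returning the total instead of printing it inside the loop keeps the function reusable from other scripts.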

    Interactive Mode

    - the >>> prompt lets you enter a command
    - a command is ended by a newline
    - variables need not be declared; they are created on first assignment
    - a colon ":" opens a block
    - the ... prompt means that a block (or a continuation line) is expected
    - a line without a prompt is Python output
    - a block is indented
    - ending the indentation ends the block

    Differences to Java or C

    - can be used interactively, which makes it much easier to test and debug programs
    - no declaration of variables
    - no braces to denote a block, just indentation (Emacs supports this style)
    - a comment begins with a "#"; everything after it on that line is ignored

    Numbers

    - Example

    >>> 2+2
    4
    >>> # This is a comment
    ... 2+2
    4
    >>> 2+2 # and a comment on the same line as code
    4
    >>> (50-5*6)/4
    5
    >>> # Integer division returns the floor:
    ... 7/3
    2
    >>> 7/-3
    -3
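The session above shows Python 2 behaviour, where / between two integers returns the floor. A minimal sketch of the Python 3 equivalents, assuming a Python 3 interpreter: / always performs true division, and // is the floor-division operator. The variable names are illustrative only.

```python
# Python 3: "/" is true division, "//" is floor division.
true_quotient = 7 / 3        # a float, not 2
floor_quotient = 7 // 3      # floor of 2.33...
negative_floor = 7 // -3     # floor of -2.33... rounds towards minus infinity
slide_example = (50 - 5 * 6) // 4

print(true_quotient, floor_quotient, negative_floor, slide_example)
```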

    Numbers cont’d

    - Example

    >>> width = 20
    >>> height = 5*9
    >>> width * height
    900
    >>> # Variables must be defined (assigned a value) before they can be
    >>> # used, or an error will occur:
    >>> # try to access an undefined variable
    ... n
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'n' is not defined

    Strings

    - Strings can be enclosed in single quotes or double quotes
    - Example

    >>> 'spam eggs'
    'spam eggs'
    >>> 'doesn\'t'
    "doesn't"
    >>> "doesn't"
    "doesn't"
    >>> '"Yes," he said.'
    '"Yes," he said.'
    >>> "\"Yes,\" he said."
    '"Yes," he said.'
    >>> '"Isn\'t," she said.'
    '"Isn\'t," she said.'

    Strings cont’d

    - Strings can be surrounded by a pair of matching triple quotes: """ or '''. Ends of lines do not need to be escaped when using triple quotes, but they will be included in the string.
    - Example

    print """
    Usage: thingy [OPTIONS]
         -h                        Display this usage message
         -H hostname               Hostname to connect to
    """

    Strings cont’d

    - Strings can be concatenated (glued together) with the + operator, and repeated with *:
    - Example

    >>> word = 'Help' + 'A'
    >>> word
    'HelpA'
    >>> '<' + word*5 + '>'
    '<HelpAHelpAHelpAHelpAHelpA>'
    >>> 'str' 'ing' # <- This is ok
    'string'
    >>> 'str'.strip() + 'ing' # <- This is ok
    'string'

    Strings cont’d

    - Strings can be subscripted (indexed); as in C, the first character of a string has subscript (index) 0.
    - There is no separate character type; a character is simply a string of size one.
    - Substrings can be specified with slice notation: two indices separated by a colon.
    - Example

    >>> word = 'Help' + 'A'
    >>> word[4]
    'A'
    >>> word[0:2]
    'He'
    >>> word[:2] # The first two characters
    'He'
    >>> word[2:] # Everything except the first two characters
    'lpA'
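Slicing is the natural tool for cutting a sequence into codons. The following is a small Python 3 sketch under that assumption; the example sequence and the variable names are made up for illustration and do not come from the slides.

```python
dna = "atggcctaa"

first_codon = dna[0:3]     # characters 0, 1 and 2
last_codon = dna[-3:]      # negative indices count from the end

# Step through the string three characters at a time.
codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]

print(first_codon, last_codon, codons)
```

A slice that runs past the end of the string is simply truncated, so a sequence whose length is not a multiple of three yields a shorter final piece rather than an error.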

    Strings cont’d

    - Unlike a C string, a Python string cannot be changed. Assigning to an indexed position in the string results in an error.
    - Example

    >>> word[0] = 'x'
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: object doesn't support item assignment
    >>> 'x' + word[1:]
    'xelpA'
    >>> 'Splat' + word[4]
    'SplatA'

    Strings cont’d

    - Example

    >>> from string import *
    >>> dna = 'gcatgacgttattacgactctg'
    >>> len(dna)
    22
    >>> 'n' in dna
    False
    >>> count(dna,'a')
    5
    >>> replace(dna, 'a', 'A')
    'gcAtgAcgttAttAcgActctg'

    - Exercise: Calculate GC percent of dna

    Strings cont’d

    - Solution: Calculate GC percent

    >>> gc = (count(dna, 'c') + count(dna, 'g')) / float(len(dna)) * 100
    >>> "%.2f" % gc
    '45.45'
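The functions count and replace imported from the string module exist only in Python 2. A hedged Python 3 sketch of the same calculation uses the equivalent string methods instead; the function name gc_percent and the handling of an empty sequence are assumptions.

```python
def gc_percent(dna):
    """Return the percentage of G and C bases in a DNA string."""
    dna = dna.lower()
    if not dna:
        return 0.0   # assumption: report 0 for an empty sequence
    return (dna.count("g") + dna.count("c")) / len(dna) * 100

dna = "gcatgacgttattacgactctg"
print("%.2f" % gc_percent(dna))   # 10 of the 22 bases are g or c
```

In Python 3 the explicit float() conversion is no longer needed, because / already performs true division.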

    Strings cont’d

    - Exercise: Calculate the complement of DNA

    A - T
    C - G
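One possible approach to this exercise, sketched in Python 3 with a dictionary lookup per base. The names COMPLEMENT and complement are hypothetical, and leaving unknown characters such as N unchanged is an assumption rather than a requirement of the exercise.

```python
# Pairing rules: A with T, C with G (both upper and lower case).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C",
              "a": "t", "t": "a", "c": "g", "g": "c"}

def complement(dna):
    """Return the complementary strand, base by base (not reversed)."""
    # Characters without an entry (for example 'N') are kept unchanged.
    return "".join(COMPLEMENT.get(base, base) for base in dna)

print(complement("ACCGGTTAATT"))
```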

    Lists

    - A list is written as comma-separated values (items) between square brackets.
    - List items need not all have the same type (lists are a compound data type).

    >>> a = ['spam', 'eggs', 100, 1234]
    >>> a[0]
    'spam'
    >>> a[3]
    1234
    >>> a[-2]
    100
    >>> a[1:-1]
    ['eggs', 100]
    >>> a[:2] + ['bacon', 2*2]
    ['spam', 'eggs', 'bacon', 4]
    >>> 3*a[:3] + ['Boo!']
    ['spam', 'eggs', 100, 'spam', 'eggs', 100, 'spam', 'eggs', 100, 'Boo!']

    Lists cont’d

    - Unlike strings, which are immutable, it is possible to change individual elements of a list
    - Assignment to slices is also possible, and this can even change the size of the list or clear it entirely
    - Example

    >>> a
    ['spam', 'eggs', 100, 1234]
    >>> a[2] = a[2] + 23
    >>> a[0:2] = [1, 12] # Replace some items:
    >>> a[0:2] = [] # Remove some:
    >>> a
    [123, 1234]
    >>> a[1:1] = ['bletch', 'xyzzy'] # Insert some:
    >>> a
    [123, 'bletch', 'xyzzy', 1234]
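The session above does not show the "clear it entirely" case. As a sketch (valid in both Python 2 and 3, written here for Python 3), the same steps can be run as a script, ending with a slice assignment that empties the list. The variable snapshot is introduced only for illustration.

```python
a = ["spam", "eggs", 100, 1234]

a[2] = a[2] + 23                 # change one element in place
a[0:2] = [1, 12]                 # replace the first two items
a[0:2] = []                      # remove them again
a[1:1] = ["bletch", "xyzzy"]     # insert without replacing anything

snapshot = list(a)               # copy, so the next step does not affect it
a[:] = []                        # assigning to the full slice clears the list

print(snapshot, a)
```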

    Lists cont’d

    - Functions returning a list

    >>> range(3)
    [0, 1, 2]
    >>> range(10,20,2)
    [10, 12, 14, 16, 18]
    >>> range(5,2,-1)
    [5, 4, 3]
    >>> aas = "ALA TYR TRP SER GLY".split()
    >>> aas
    ['ALA', 'TYR', 'TRP', 'SER', 'GLY']
    >>> " ".join(aas)
    'ALA TYR TRP SER GLY'
    >>> l = list('atgatgcgcccacgtacga')
    >>> l
    ['a', 't', 'g', 'a', 't', 'g', 'c', 'g', 'c', 'c', 'c', 'a',
    'c', 'g', 't', 'a', 'c', 'g', 'a']
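In Python 2, range returns a list directly, as shown above. In Python 3 it returns a lazy range object, so a list has to be requested explicitly. A minimal Python 3 sketch of the same calls, with illustrative variable names:

```python
# range() is lazy in Python 3; list() materialises the values.
evens = list(range(10, 20, 2))
countdown = list(range(5, 2, -1))

# split() and join() behave the same in Python 2 and Python 3.
aas = "ALA TYR TRP SER GLY".split()
rejoined = " ".join(aas)

# list() on a string yields one item per character.
bases = list("atgatgcgcccacgtacga")

print(evens, countdown, aas, rejoined, len(bases))
```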

    Dictionaries

    - A dictionary is an unordered set of key: value pairs, with the requirement that the keys are unique
    - A pair of braces creates an empty dictionary: {}
    - Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary
    - The main operations on a dictionary are storing a value with some key and extracting the value given the key
    - Example

    >>> tel = {'jack': 4098, 'sape': 4139}
    >>> tel['guido'] = 4127
    >>> tel
    {'sape': 4139, 'guido': 4127, 'jack': 4098}
    >>> tel['jack']
    4098

    Dictionaries cont’d

    - Example

    >>> tel = {'jack': 4098, 'sape': 4139, 'guido': 4127}
    >>> del tel['sape']
    >>> tel['irv'] = 4127
    >>> tel
    {'guido': 4127, 'irv': 4127, 'jack': 4098}
    >>> tel.keys()
    ['guido', 'irv', 'jack']
    >>> 'guido' in tel
    True
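Storing and looking up values by key is exactly what is needed to count bases in a sequence. The following Python 3 sketch illustrates that use; the function name count_bases is hypothetical. Note also that in Python 3 keys() returns a view rather than a list, so sorted() is used here to obtain the keys in a fixed order.

```python
def count_bases(dna):
    """Return a dictionary mapping each character of dna to its count."""
    counts = {}
    for base in dna:
        # get() returns 0 for a base that has not been seen yet.
        counts[base] = counts.get(base, 0) + 1
    return counts

counts = count_bases("gcatgacgttattacgactctg")
for base in sorted(counts):
    print(base, counts[base])
```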

    Programming

    - Example

    a, b = 3, 4
    if a > b:
        print a + b
    else:
        print a - b

    Programming cont’d

    - Example

    >>> # Fibonacci series:
    ... # the sum of two elements defines the next
    ... a, b = 0, 1
    >>> while b < 10:
    ...     print b
    ...     a, b = b, a+b
    ...
    1
    1
    2
    3
    5
    8

    Programming features

    - multiple assignment: the right-hand side is evaluated before anything on the left is assigned, and its expressions are evaluated from left to right
    - the while loop executes as long as the condition is true (any non-zero number or non-empty string counts as true; zero, the empty string and None count as false)
    - block indentation must be the same for each line of the block
    - in interactive mode, an empty line is needed to indicate the end of a block (not required in a script file)
    - use of print
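The first two points can be checked with a short Python 3 sketch. The variable names are illustrative, and the lists of example values are an assumption chosen to match the cases named above.

```python
# Multiple assignment: the right-hand side (4, 7) is built first,
# then unpacked into a and b, so the old value of a is still used in a + b.
a, b = 3, 4
a, b = b, a + b

# Values that count as false and as true in a condition.
falsy = [0, "", None]
truthy = [1, "x", -2.5]

# A while loop ends as soon as its condition becomes false (here: 0).
remaining = 3
steps = []
while remaining:
    steps.append(remaining)
    remaining -= 1

print(a, b, steps)
```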

    Printing

    - Example

    >>> i = 256*256
    >>> print 'The value of i is', i
    The value of i is 65536

    Flow control

    - Example

    x = 35
    if x < 0:
        x = 0
        print 'Negative changed to zero'
    elif x == 0:
        print 'Zero'
    elif x == 1:
        print 'Single'
    else:
        print 'More'

    Iteration

    - Python's for statement iterates over the items of a sequence (string, list, generated sequence)
    - Example

    a = ['cat', 'window', 'defenestrate']
    for x in a:
        print x, len(x)

    Defining functions

    - Example

    def fib(n): # write Fibonacci series up to n
        """Print a Fibonacci series up to n."""
        a, b = 0, 1
        while b < n:
            print b,
            a, b = b, a+b

    # Now call the function we just defined:
    fib(2000)
    # will print:
    # 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
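The function above prints its results and returns nothing. A variant that returns the numbers as a list is often easier to reuse; the following is a Python 3 sketch of that idea, and the name fib_list is an assumption for illustration.

```python
def fib_list(n):
    """Return the Fibonacci numbers smaller than n as a list."""
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)     # collect the value instead of printing it
        a, b = b, a + b
    return result

series = fib_list(2000)
print(series)
```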

    Reverse Complement of DNA

    - Exercise: Find the reverse complement of a DNA sequence
    - Example

    5' - ACCGGTTAATT - 3' : forward strand
    3' - TGGCCAATTAA - 5' : reverse strand

    So the reverse complement of ACCGGTTAATT is AATTAACCGGT

    Reverse Complement of DNA

    - Solution: Find the reverse complement of a DNA sequence

    from string import *

    def revcomp(dna):
        """ reverse complement of a DNA sequence """
        comp = dna.translate(maketrans("AGCTagct", "TCGAtcga"))
        lcomp = list(comp)
        lcomp.reverse()
        return join(lcomp, "")
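The solution above relies on maketrans and join from the Python 2 string module. A hedged Python 3 sketch of the same idea uses str.maketrans for the translation table and a slice with step -1 for the reversal; the module-level name _COMPLEMENT is an assumption.

```python
# Translation table: each base maps to its complement, case preserved.
_COMPLEMENT = str.maketrans("AGCTagct", "TCGAtcga")

def revcomp(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]

print(revcomp("ACCGGTTAATT"))
```

Applying the function twice gives back the original sequence, which makes a convenient sanity check.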

    Translate a DNA sequence

    - Exercise: Translate a DNA sequence to an amino acid sequence
    - Genetic code

    standard = { 'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
                 'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
                 'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
                 'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',
                 'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
                 'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
                 'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
                 'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',
                 'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
                 'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
                 'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
                 'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',
                 'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
                 'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
                 'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
                 'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G' }
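One way to approach this exercise, sketched in Python 3. To stay self-contained, the sketch builds its own codon table from the compact 64-letter form of the standard genetic code (bases in t, c, a, g order) rather than reusing the dictionary above. The names CODON_TABLE and translate, the choice of "X" for unrecognised codons, and the decision to ignore an incomplete trailing codon are all assumptions.

```python
BASES = "tcag"
# Standard genetic code, one letter per codon in ttt, ttc, tta, ttg, tct, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {}
index = 0
for first in BASES:
    for second in BASES:
        for third in BASES:
            CODON_TABLE[first + second + third] = AMINO_ACIDS[index]
            index += 1

def translate(dna):
    """Translate a DNA string codon by codon, starting at position 0."""
    dna = dna.lower()
    protein = []
    # Stop before an incomplete trailing codon.
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        protein.append(CODON_TABLE.get(codon, "X"))   # assumption: "X" marks unknown codons
    return "".join(protein)

print(translate("atggcctaa"))
```

Stop codons are reported as "*", following the convention of the table on the slide.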
