Assignment 6: Motif Finding Bio5488 2/24/17 Slide Credits: Nicole Rockweiler

Source: genetics.wustl.edu/bio5488/files/2017/02/Assignment-6-Slides-.pdf




Assignment 6: Motif finding

• Input
  • Promoter sequences
  • PWMs of DNA-binding proteins
• Goal
  • Find putative binding sites in the sequences by scanning the sequences for matches to the PWM
• Output
  • List of the locations and scores of putative binding sites

[Figure: a PWM aligned to a putative binding sequence within a promoter]
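The scanning step described above can be sketched roughly as follows. This is only an illustration, not the starter script: the names `score_window` and `scan`, the dictionary layout of the PWM, and the toy 3-position PWM values are all assumptions made for this example. Start positions are 0-based here.

```python
# Hypothetical toy PWM: one list of per-position scores for each base.
# The values are made up for illustration only.
TOY_PWM = {
    "A": [2, -1, -3],
    "C": [-2, 3, -1],
    "G": [-1, -2, 4],
    "T": [0, -1, -2],
}


def score_window(window, pwm):
    """Sum the PWM score of each base in a window as wide as the PWM."""
    return sum(pwm[base][pos] for pos, base in enumerate(window))


def scan(sequence, pwm):
    """Return a (start, score) tuple for every full-width window."""
    width = len(pwm["A"])
    return [
        (start, score_window(sequence[start:start + width], pwm))
        for start in range(len(sequence) - width + 1)
    ]


hits = scan("TACGA", TOY_PWM)
for start, score in hits:
    print(start, score)
```

Each window is scored independently, so the output has one entry per possible start position, including poor matches.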


Input files

• Promoter sequences
  • Just the sequence, i.e., not a FASTA file
• PWMs of DNA-binding proteins
  • Whitespace-delimited
  • a_ij = score for base i at position j
  • Rows correspond to A, C, G, & T
  • Columns correspond to positions
  • The higher the score, the better the match

Example PWM

-5-945-326-510-1010-10-14310-460-110-31

Example PWM file
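One possible way to read a PWM in the format described above (four whitespace-delimited rows in the order A, C, G, T) is sketched below. The function name `parse_pwm` and the example values are hypothetical, and the sketch assumes the file has no header line or row labels.

```python
def parse_pwm(text):
    """Parse whitespace-delimited PWM text into a dict of base -> scores.

    Assumes exactly four non-empty rows, in the order A, C, G, T.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != 4:
        raise ValueError("expected 4 rows (A, C, G, T), got {}".format(len(rows)))
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same number of columns")
    return {base: [float(v) for v in row] for base, row in zip("ACGT", rows)}


# Made-up example text, not the PWM from the slide.
example_text = """
 1  -2   3
-1   4  -3
 0  -4   2
-2   1  -1
"""
pwm = parse_pwm(example_text)
print(pwm["C"])
```

To read from disk, the same function could be given the contents of the file, e.g. `parse_pwm(open(path).read())`, where `path` is whatever PWM file you were given.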


Assignment TODOs

• Determine the highest affinity binding site for each PWM
  • Calculate by hand or write a script :)
• Comment the starter script scan_sequence.py
  • Comment the existing code blocks
  • Comment the user-defined functions with function docstrings
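If a site's score is the sum of its per-position scores, the highest-scoring site can be found by picking the best base at each position. The sketch below illustrates that idea on a made-up PWM; `best_site` and `TOY_PWM` are hypothetical names, and ties are broken arbitrarily in favor of the first base in A, C, G, T order.

```python
# Hypothetical toy PWM with made-up values.
TOY_PWM = {
    "A": [2, -1, -3],
    "C": [-2, 3, -1],
    "G": [-1, -2, 4],
    "T": [0, -1, -2],
}


def best_site(pwm):
    """Return (sequence, score) of the highest-scoring site for a PWM."""
    width = len(pwm["A"])
    site = []
    total = 0
    for pos in range(width):
        # Choose the base with the largest score in this column.
        base = max("ACGT", key=lambda b: pwm[b][pos])
        site.append(base)
        total += pwm[base][pos]
    return "".join(site), total


site, score = best_site(TOY_PWM)
print(site, score)
```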


Function docstrings

• Purpose: tells the reader how to use the function
• Guidelines for what to include
  • Describe what the function does
  • Describe the input argument(s)
  • Describe the output value(s)
• Where to learn more:
  • PEP 257: https://www.python.org/dev/peps/pep-0257/
  • Google's Python style guide: http://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Comments#Comments


Example of a function docstring

[Figure: a function docstring annotated with its three parts]
• Summary line
• Description of arguments
• Description of return value
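The original slide's example is not recoverable here, so the following is a stand-in written in the Google docstring style. The function `gc_content` is hypothetical and unrelated to scan_sequence.py; it only shows where the summary line, the argument description, and the return description go.

```python
def gc_content(sequence):
    """Calculate the fraction of G and C bases in a DNA sequence.

    Args:
        sequence: A string of DNA bases (A, C, G, T), upper or lower case.

    Returns:
        A float between 0 and 1 giving the fraction of bases that are
        G or C. Returns 0.0 for an empty sequence.
    """
    if not sequence:
        return 0.0
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


print(gc_content("ACGT"))
```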


Retrieving a function's docstring

• Call help on the function
• The function's docstring is displayed
• Docstrings are also used by third-party programs to create user-friendly documentation for your project
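A small sketch of the ways a docstring can be retrieved, using a hypothetical function `reverse`. The `help` call is left as a comment because it is meant for interactive sessions; `__doc__` and `inspect.getdoc` give programmatic access to the same text.

```python
import inspect


def reverse(sequence):
    """Return the sequence with its characters in reverse order."""
    return sequence[::-1]


# In an interactive session, this displays the signature and docstring:
# help(reverse)

# The raw docstring is stored on the function object itself.
print(reverse.__doc__)

# inspect.getdoc returns the docstring with its indentation cleaned up.
doc = inspect.getdoc(reverse)
print(doc)
```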


Assignment TODOs (cont.)

• Determine the highest affinity binding site for each PWM
  • Calculate by hand or write a script :)
• Comment the existing code
  • Comment the user-defined functions with function docstrings
• Modify the script to scan the reverse complement of the input sequence
• Modify the script to report only hits that have scores above a given threshold
• Scan promoters (n=2) to find putative binding sites for each DNA-binding protein (n=2)
• Answer follow-up questions
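The two modifications above could look something like the sketch below. The helper names `reverse_complement` and `filter_hits` are hypothetical, the input is assumed to be uppercase A/C/G/T only, and "above a given threshold" is interpreted here as strictly greater than the threshold.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def filter_hits(hits, threshold):
    """Keep only the (position, score) hits whose score exceeds threshold."""
    return [(pos, score) for pos, score in hits if score > threshold]


print(reverse_complement("AACGT"))
print(filter_hits([(0, -2), (1, 9), (2, -7)], threshold=0))
```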


Indexing

• Indexing is somewhat arbitrary; however, it's important to follow conventions:
  • The start position of a feature is smaller than the stop position
  • The coordinates are relative to the forward strand
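To follow these conventions when a hit is found on the reverse complement, its position has to be mapped back to the forward strand. The sketch below assumes 0-based, half-open coordinates (the slides do not state which system the assignment expects), and `to_forward_coords` is a hypothetical helper.

```python
def to_forward_coords(rc_start, width, seq_length):
    """Map a hit on the reverse complement to forward-strand coordinates.

    Uses 0-based, half-open coordinates: the hit covers [start, stop),
    so start is always smaller than stop.
    """
    start = seq_length - rc_start - width
    stop = seq_length - rc_start
    return start, stop


# A 2-base hit at offset 0 of the reverse complement of a 5-base
# sequence covers the last two bases of the forward strand.
print(to_forward_coords(rc_start=0, width=2, seq_length=5))
```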


Python list comprehensions

• Purpose: create lists in 1 line of code
• There are also dictionary comprehensions that work similarly

As a for loop
  Code template:
    for <item> in <list>:
        <expression>
  Example:
    x = []
    for i in range(5):
        x.append(i**2)

List comprehension
  Code template:
    [<expression> for <item> in <list>]
  Example:
    x = [i**2 for i in range(5)]
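As a quick check that the two forms are equivalent, and to show the dictionary comprehension mentioned above, here is a short sketch. The variable names are arbitrary.

```python
# For-loop version.
loop_squares = []
for i in range(5):
    loop_squares.append(i**2)

# List comprehension version builds the same list in one line.
comp_squares = [i**2 for i in range(5)]
print(loop_squares == comp_squares)

# A dictionary comprehension uses braces and a key: value expression.
square_of = {i: i**2 for i in range(5)}
print(square_of)
```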


Python list comprehensions with filtering

As a for loop
  Code template:
    for <item> in <list>:
        if <conditional>:
            <expression>
  Example:
    x = []
    for i in range(5):
        if i % 2 == 0:  # if i is even
            x.append(i**2)

List comprehension
  Code template:
    [<expression> for <item> in <list> if <conditional>]
  Example:
    x = [i**2 for i in range(5) if i % 2 == 0]

• Where to learn more:
  • List comprehension PEP: https://www.python.org/dev/peps/pep-0202/
  • Dict comprehension PEP: https://www.python.org/dev/peps/pep-0274/


Python's zip function

• Purpose: "zip" together lists
• Returns a list* of tuples where the ith tuple contains the ith element from each of the input lists

*It's really an iterator, one of list's close cousins

Code template:
    <zipped_list> = list(zip(<list1>, <list2>, ...))

Example:
    >>> x = [0, 1, 2]
    >>> y = [0, 1, 4]
    >>> coords = list(zip(x, y))
    >>> coords
    [(0, 0), (1, 1), (2, 4)]

• Zipped lists can be unzipped (zip(*coords))
• Where to learn more
  • Python.org documentation: https://docs.python.org/3.4/library/functions.html#zip
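The unzipping trick mentioned above is shown in the sketch below. Note that unzipping yields tuples rather than lists, and that zip stops at the end of its shortest input.

```python
coords = [(0, 0), (1, 1), (2, 4)]

# The * operator passes each tuple as a separate argument to zip,
# which regroups the first elements together and the second elements
# together.
x, y = zip(*coords)
print(x)
print(y)

# zip stops as soon as the shortest input is exhausted.
pairs = list(zip("ACGT", [1, 2]))
print(pairs)
```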


Printing formatted strings in Python with format

• Purpose: make your print statements print "pretty" output, e.g., tables
• format transforms a "template string" by substituting placeholders with formatted values
• Placeholders are enclosed in {} and specify how the value should be formatted

Not so pretty
    >>> score = 1/300
    >>> print("The score was " + str(score))
    The score was 0.0033333333333333335

Pretty
    >>> print("The score was {s:.3f}".format(s=score))
    The score was 0.003
    >>> print("The score was {s:.3E}".format(s=score))
    The score was 3.333E-03

• Where to learn more:
  • Python.org tutorial: https://docs.python.org/3.4/tutorial/inputoutput.html#fancier-output-formatting
  • Python.org documentation: https://docs.python.org/3.4/library/string.html#formatstrings
  • Python Course tutorial: http://www.python-course.eu/python3_formatted_output.php
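Since tables are given as a use case, here is a small sketch of printing aligned columns with format. The hit values are made up, and the column width of 8 is an arbitrary choice.

```python
# Made-up (start, score) hits for illustration.
hits = [(12, 9.5), (140, 3.25)]

# {:>8} right-aligns a value in a field 8 characters wide;
# d formats an integer and .2f a float with two decimal places.
lines = ["{:>8} {:>8}".format("start", "score")]
for start, score in hits:
    lines.append("{:>8d} {:>8.2f}".format(start, score))

print("\n".join(lines))
```

Using the same field widths in the header and in each row is what keeps the columns lined up.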


Assignment 6: requirements

• Due in 1 week (3/3/17) at 10 AM
• Your submission directory should contain
  • A modified scan_sequence.py that is well commented and contains a docstring for each user-defined function
  • A README.txt with the answers to the questions and the commands/work you used to arrive at the answer