BioPython Workshop

Gershon Celniker

Tel Aviv University


Introduction

• The Biopython Project is an international association of developers of freely available Python (http://www.python.org) tools for computational molecular biology.
• Python is an object-oriented, interpreted, flexible language that is becoming increasingly popular for scientific computing.
• Python is easy to learn, has a very clear syntax and can easily be extended with modules.
• The Biopython web site (http://www.biopython.org) provides an online resource for modules, scripts, and web links for developers of Python-based software for bioinformatics use and research.
• The goal of Biopython is to make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and classes.
• Biopython features include parsers for various bioinformatics file formats (BLAST, Clustalw, FASTA, GenBank, ...) and access to online services (NCBI, ExPASy, Clustalw, DSSP, MSMS, ...).


Introduction

• The full tutorial is located here: http://biopython.org/DIST/docs/tutorial/Tutorial.html
• Example files are located here: https://github.com/biopython/biopython/tree/master/Doc/examples


BioPython, let's try it!


FASTA format

http://en.wikipedia.org/wiki/FASTA_format

FASTA is pronounced "fast A" and stands for "FAST-All", because it works with any alphabet; it is an extension of "FAST-P" (protein) and "FAST-N" (nucleotide) alignment.


Let's write our first parsing script

Parsing sequence file formats

The example data come from a search for Cypripedioideae (the subfamily of lady slipper orchids). This search gave me only 94 hits, which I saved as a FASTA file, ls_orchid.fasta:

>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAACGATCGAGTGAATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGTGACCCTGATTTGTTGTTGGG

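Notice that the FASTA format does not specify the alphabet, so Bio.SeqIO has defaulted to the generic SingleLetterAlphabet() instead of a DNA-specific alphabet. (Bio.Alphabet was removed in Biopython 1.78, so newer versions print the sequence without any alphabet.)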


Let's write our first parsing script

Output:
gi|2765658|emb|Z78533.1|CIZ78533
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGG...CGC', SingleLetterAlphabet())
740
...
gi|2765564|emb|Z78439.1|PBZ78439
Seq('CATTGTTGAGATCACATAATAATTGATCGAGTTAATCTGGAGGATCTGTTTACT...GCC', SingleLetterAlphabet())
592
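The output above lists each record's identifier, sequence and length, produced by a script built on Bio.SeqIO. The following is a minimal plain-Python sketch of what such a FASTA parser does. It is not Biopython's implementation. It assumes well-formed input and takes the identifier to be the first word of the header line, which matches the gi|... ids above. The names parse_fasta, _make_record and EXAMPLE are hypothetical. The input is a shortened in-memory fragment of the first record rather than the real ls_orchid.fasta file.

```python
from io import StringIO
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (identifier, description, sequence)


def _make_record(header: str, chunks: List[str]) -> Record:
    # Assumption: the identifier is the first whitespace-delimited word of the header
    words = header.split(None, 1)
    identifier = words[0] if words else ""
    return identifier, header, "".join(chunks)


def parse_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (identifier, description, sequence) for each FASTA record (sketch)."""
    header = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        yield _make_record(header, chunks)


# Hypothetical input: a shortened fragment of the first ls_orchid.fasta record
EXAMPLE = (
    ">gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA\n"
    "CGTAACAAGGTTTCCGTAGGTGAACCTGCGG\n"
    "AAGGATCATTGATGAGACCGTGG\n"
)

for identifier, description, seq in parse_fasta(StringIO(EXAMPLE)):
    print(identifier)
    print(seq[:10] + "...")
    print(len(seq))
```

Joining the wrapped lines before measuring the length is what makes `len(seq)` count bases rather than lines.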


Sequence slicing

Output:

gi|2765658|emb|Z78533.1|CIZ78533
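Biopython Seq objects are sliced with the same rules as Python strings. Indices start at 0, the end index is excluded, and an optional step picks every n-th letter. The sketch below uses a plain str as a stand-in for a Seq. The example sequence is an assumption, since the slide's own code is not shown.

```python
# A plain str stands in for a Bio.Seq.Seq here; slicing rules are the same.
seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"  # assumed example sequence

print(seq[4:12])  # bases at 0-based positions 4..11: GATGGGCC
print(seq[0::3])  # every third base from the first: GCTGTAGTAAG
print(seq[::-1])  # the sequence reversed (reversed only, not complemented)
```

A step of 3 is a quick way to pull out the first, second or third codon positions of a reading frame.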


GC content exercise

Output:
My seq length: 32
G: 9
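The exercise counts bases and derives the GC content, which is the fraction of bases that are G or C. Below is a plain-Python sketch with a hypothetical helper, gc_content. The example sequence is assumed: it is consistent with the 32 bases and 9 G's printed above, but the slide's own sequence is not shown. Biopython 1.80 and later provide Bio.SeqUtils.gc_fraction for this. Older releases offered Bio.SeqUtils.GC, which returned a percentage.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence).

    Sketch only: ambiguity codes such as S or N are not treated specially;
    they count towards the length but never as G or C.
    """
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    return (counts["G"] + counts["C"]) / len(seq)


my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"  # assumed example sequence
print("My seq length:", len(my_seq))
print("G:", my_seq.count("G"))
print("GC content:", gc_content(my_seq))
```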


Transcription

Output:
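Transcription turns DNA into mRNA. If the sequence is the coding strand, this amounts to replacing T with U, which is what Biopython's Seq.transcribe() does. Starting from the template strand, you take the reverse complement first. The sketch below shows both steps on plain strings. The helper names (reverse_complement, transcribe) and the example sequence are illustrative assumptions, not the slide's code, and only unambiguous bases are handled.

```python
# Watson-Crick complements for unambiguous DNA bases (both cases)
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def transcribe(coding_dna: str) -> str:
    """mRNA from the coding strand: every T becomes U."""
    return coding_dna.replace("T", "U").replace("t", "u")


coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # illustrative sequence
template_dna = reverse_complement(coding_dna)
messenger_rna = transcribe(coding_dna)

# From the template strand: reverse-complement back to the coding strand, then transcribe
from_template = transcribe(reverse_complement(template_dna))

print(template_dna)
print(messenger_rna)
print(from_template == messenger_rna)
```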


Translation

Output:
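Translation reads the coding sequence three bases at a time and maps each codon to an amino acid. Below is a plain-Python sketch of that lookup using the standard genetic code (NCBI translation table 1). The table is built from NCBI's 64-letter amino-acid string, with codons ordered TTT, TTC, TTA, TTG, ... so that the first base varies slowest. The helper names are hypothetical. Biopython's Seq.translate() has a to_stop argument with the same meaning as here, but it also handles cases this sketch ignores, such as ambiguous bases and partial codons.

```python
from itertools import product

BASES = "TCAG"
# NCBI table 1 amino acids in codon order TTT, TTC, TTA, TTG, TCT, ...; '*' = stop
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}


def translate(seq: str, to_stop: bool = False) -> str:
    """Translate DNA or RNA codon by codon with the standard code (sketch).

    A trailing partial codon is ignored, and any codon that is not in the
    table (e.g. one containing N) becomes 'X'.
    """
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = STANDARD_TABLE.get(dna[i:i + 3], "X")
        if to_stop and aa == "*":
            break  # stop at the first stop codon
        protein.append(aa)
    return "".join(protein)


coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # illustrative sequence
print(translate(coding_dna))                # MAIVMGR*KGAR*
print(translate(coding_dna, to_stop=True))  # MAIVMGR
```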


Translation tables
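Some organisms and organelles use slightly different genetic codes, which NCBI numbers as translation tables. Biopython exposes these tables in Bio.Data.CodonTable (for example CodonTable.unambiguous_dna_by_id[2]) and through the table argument of Seq.translate(). The plain-Python sketch below builds table 1 and derives table 2 (vertebrate mitochondrial) from it by changing the four codons where the two differ. Alternative start codons are not modelled, and the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}

# NCBI table 2 (vertebrate mitochondrial): AGA/AGG are stops, ATA is Met, TGA is Trp
VERTEBRATE_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}

TABLES = {1: STANDARD, 2: VERTEBRATE_MITO}


def translate(dna: str, table_id: int = 1) -> str:
    """Translate with the chosen NCBI table id (sketch: partial codons ignored)."""
    table = TABLES[table_id]
    dna = dna.upper()
    return "".join(
        table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - len(dna) % 3, 3)
    )


seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # illustrative sequence
print(translate(seq, 1))  # MAIVMGR*KGAR*
print(translate(seq, 2))  # MAIVMGRWKGAR*
```

Only the TGA codon in this sequence is read differently, so the internal stop in table 1 becomes a tryptophan in table 2.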


Translation – continued


Retrieving data from the net

Output:
O23729
CHS3_BROFI
RecName: Full=Chalcone synthase 3; EC=2.3.1.74; AltName: Full=Naringenin-chalcone synthase 3;
Seq('MAPAMEEIRQAQRAEGPAAVLAIGTSTPPNALYQADYPDYYFRITKSEHLTELK...GAE', ProteinAlphabet())
Length 394
['Acyltransferase', 'Flavonoid biosynthesis', 'Transferase']


Parsing data from FASTA – part B


Alignment


BLAST


Plots


Plots - result


Going 3D: The PDB module

Bio.


Going 3D: The PDB module

Bio.