Trinity College Dublin, The University of Dublin

A Brief Introduction to Scientific Programming with Python

Karsten Hokamp, PhD
TCD Bioinformatics Support Team

TCD, 26/08/2015

Overview

• Programming

• First Python script/program

• Why Python?

• Bioinformatics examples

• Additional resources

• Outlook

What is programming and why bother?

Data processing

Automation

Combination of programs for analysis pipelines

More control and flexibility

Better understanding of how programs work

Programming Concepts

Turn into a very meticulous problem solver

Break problems into small details

Keep it variable

Give very precise instructions
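The difference between a loose instruction and a precise, variable-driven one can be sketched in Python. The ingredient amounts and the function name `scale_recipe` below are made up for illustration and do not come from the slides.

```python
# Hypothetical recipe: amounts per person (illustrative values only).
INGREDIENTS_PER_PERSON = {
    "flour_g": 50,
    "milk_ml": 100,
    "eggs": 0.5,
}


def scale_recipe(people):
    """Return the amount of each ingredient needed for `people` diners."""
    return {name: amount * people
            for name, amount in INGREDIENTS_PER_PERSON.items()}


# Keep it variable: changing this one number adapts the whole recipe.
people = 4
for name, amount in scale_recipe(people).items():
    print(name, amount)
```

Each step is small and exact, and nothing is left to interpretation, which is what a computer needs.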

Programming Concepts

"human" recipe

Programming Concepts

"computerised" recipe

Mac for Windows users

The main differences:

cmd instead of ctrl (e.g. cmd-C for copying)

right-click mouse: ctrl-click

# character: alt-3

switch between applications: cmd-tab

Spotlight (top right) for finding files/programs

Apple symbol (top left) for logging out

IDLE: Integrated DeveLopment Environment

open through Spotlight

IDLE: Integrated DeveLopment Environment

Alternatively: open through Finder

IDLE: Integrated DeveLopment Environment

interactive Python console

IDLE: Integrated DeveLopment Environment

simple Python statement

IDLE: Integrated DeveLopment Environment

user input

output
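The screenshot for this slide is not preserved. A typical first statement, assumed here rather than taken from the slide, is a call to `print`. In the console, the line typed after the `>>>` prompt is the user input and the line shown below it is the output.

```python
# Typed at the >>> prompt in the IDLE console (assumed example).
greeting = "Hello, world!"
print(greeting)
```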

IDLE: Integrated DeveLopment Environment

try a few simple numeric operations

user input

output
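The operations shown in the original screenshot are not preserved. The following sketch lists a few that could be tried, assuming Python 3 (where `/` always returns a float):

```python
# Basic arithmetic, as one might type it at the console (Python 3 assumed).
total = 2 + 3          # addition
product = 6 * 7        # multiplication
quotient = 7 / 2       # true division gives a float
whole = 7 // 2         # floor division drops the remainder
remainder = 7 % 2      # modulo
power = 2 ** 10        # exponentiation

print(total, product, quotient, whole, remainder, power)
```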

IDLE: Integrated DeveLopment Environment

repeat/combine previous commands by clicking into them and hitting return (use left/right arrows and delete to edit them)

IDLE: Integrated DeveLopment Environment

Console vs Editor

Console | Editor
interactive | requires extra click for running
great for trying out code | additional IDLE functionality
not suited for long scripts | suited for long scripts
no saving of code | allows saving code

IDLE: Writing Python Scripts

open a new file

IDLE: Writing Python Scripts

write some code

IDLE: Writing Python Scripts

run your code (shortcut: F5)

IDLE: Writing Python Scripts

save file first

IDLE: Writing Python Scripts

specify a file name

IDLE: Writing Python Scripts

write more code (IDLE provides help)

IDLE: Writing Python Scripts

save and run: cmd-S then F5

IDLE: Writing Python Scripts

make it personal

IDLE: Writing Python Scripts

keep going
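The code from these screenshots is not preserved. As a rough sketch of the kind of script one might build up in the editor, the example below stores a name in a variable and reuses it. The file name, the placeholder name and the function `make_greeting` are all assumptions.

```python
# my_first_script.py (hypothetical file name)

def make_greeting(name):
    """Build a personalised greeting for `name`."""
    return "Hello, " + name + "! Welcome to Python."


# Make it personal: replace the placeholder with your own name.
name = "Ada"
message = make_greeting(name)
print(message)
print("Your name has", len(name), "letters.")
```

Saved with cmd-S and run with F5, the script prints its output in the console window. To ask for the name at run time, the built-in `input` function could replace the fixed string.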

Python vs Perl

the equivalent in Perl

Python vs Perl

Python compared with Perl:

• fewer special characters

• indentation enforced

• more user-friendly functions
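The side-by-side code from the slide is not preserved, and the Perl version is not reproduced here. As an assumed illustration of enforced indentation, the block structure in the following Python function is defined purely by indentation, with no braces or semicolons:

```python
# Label each base; the nesting is expressed by indentation alone.
def classify(sequence):
    labels = []
    for base in sequence:
        if base in "GC":
            labels.append("strong")   # G or C
        else:
            labels.append("weak")     # anything else, e.g. A or T
    return labels


print(classify("GATC"))
```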

Why Python?

easy to learn: great for beginners

enforces clean coding: great for teachers

comes with IDE: avoids command-line usage

object-orientated: code reuse and recycling

very popular: many peers

BioPython: many bioinformatics modules

Simple Bioinformatics Example

built-in function 'len'
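The slide's code is not preserved. A minimal sketch with a made-up DNA string shows how `len` reports the sequence length:

```python
# Hypothetical DNA sequence used for illustration.
dna = "ATGGCGTACGCTTGA"
length = len(dna)  # number of characters, i.e. bases
print(length, "bp")
```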

Simple Bioinformatics Example

built-in function 'set'
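As an assumed example (the original code is not preserved), `set` reduces a sequence to its unique characters, which is a quick way to spot unexpected symbols such as an ambiguous base:

```python
# Hypothetical sequence containing an ambiguous base (N).
dna = "ATGGCNTACG"
letters = set(dna)  # unique characters, in no particular order
print(letters)
```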

Simple Bioinformatics Example

built-in functions 'sorted' and 'set'
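Because a set has no defined order, wrapping it in `sorted` gives a reproducible alphabet. The sequence below is again a made-up example:

```python
# Hypothetical sequence containing an ambiguous base (N).
dna = "ATGGCNTACG"
alphabet = sorted(set(dna))  # unique characters as an alphabetically ordered list
print(alphabet)
```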

Simple Bioinformatics Example

string method 'count'
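A sketch of base counting with `str.count`, using a hypothetical sequence (the slide's own example is not preserved):

```python
# Hypothetical DNA sequence used for illustration.
dna = "ATGGCGTACGCTTGA"

# Count each of the four bases and collect the results in a dictionary.
counts = {base: dna.count(base) for base in "ACGT"}
print(counts)
```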

Simple Bioinformatics Example

string method 'upper'
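Counting is case-sensitive, so mixed-case input is usually normalised first. The following assumed example shows the effect of `upper` on a made-up sequence:

```python
# Hypothetical mixed-case sequence.
mixed = "atgGCGtacGCTtga"
normalised = mixed.upper()
print(normalised)

# Lower-case g is not counted as "G" until the string is normalised.
print(mixed.count("G"), normalised.count("G"))
```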

BioPython

• Basic sequence manipulation

• Fetch records from databases

• Multiple sequence alignment (Clustal, Muscle)

• Sequence similarity search (Blast)

• Working with motifs: MEME, Jaspar, Transfac

• Phylogenetics

• Clustering

• Visualisation

Parsing GenBank records:

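>>> from Bio import SeqIO
>>> # read the single record stored in the GenBank file
>>> record = SeqIO.read("AE014613.1.gb", "genbank")
>>> record.description
'Salmonella enterica subsp. enterica serovar Typhi Ty2, complete genome.'
>>> # number of annotated features in the record
>>> len(record.features)
9086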

Parsing sequence records:

from Bio import SeqIO

# loop over all records in the FASTA file
for entry in SeqIO.parse("tlr4_protein.fa", "fasta"):
    print(entry.description)
    print(len(entry), 'bp')

Output:

gi|765368240|gb|AJR32867.1| TLR4 [Gallus gallus]
843 bp
gi|111414439|gb|ABH09759.1| toll-like receptor 4 [Bos taurus]
841 bp
gi|6175873|gb|AAF05316.1|AF177765_1 toll-like receptor 4 [Homo sapiens]
839 bp
…
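To give a feel for what a FASTA parser does, here is a minimal plain-Python sketch that needs no extra modules. It is not Biopython's implementation; the function name `parse_fasta` and the two records are invented for illustration.

```python
import io

# Two made-up FASTA records; an in-memory stand-in for a file on disk.
FASTA_TEXT = """>seq1 hypothetical protein A
MKTAYIAK
QRQISFVK
>seq2 hypothetical protein B
MSHHWGYG
"""


def parse_fasta(handle):
    """Yield (description, sequence) pairs from a FASTA-formatted handle."""
    description, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if description is not None:
        yield description, "".join(chunks)


records = list(parse_fasta(io.StringIO(FASTA_TEXT)))
for description, sequence in records:
    print(description)
    print(len(sequence), "aa")
```

For real work a tested library parser is the safer choice, since it handles many formats and edge cases that this sketch ignores.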

Graphics:

Chromosomes colour-coded by GC content (Bioinformatics with Python Cookbook)
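The figure itself is not preserved, and the cookbook's code is not reproduced here. As a rough sketch of the quantity such a plot colour-codes, GC content can be computed per window in plain Python. The function names and the example sequence are assumptions.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in `sequence` (0.0 if empty)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def windowed_gc(sequence, window):
    """GC fraction for consecutive, non-overlapping windows of size `window`."""
    return [gc_content(sequence[start:start + window])
            for start in range(0, len(sequence), window)]


# Hypothetical sequence; a real chromosome would be read from a file.
dna = "GGCCATATGCGCATAT"
print(windowed_gc(dna, 4))
```

Each value in the resulting list could then be mapped to a colour along the chromosome.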

Graphics:

Coloured phylogenetic tree from Ebola sequences (Bioinformatics with Python Cookbook)

Additional Resources

https://store.continuum.io/cshop/anaconda/

Visualisations with Matplotlib

http://matplotlib.org/gallery.html

Examples

http://scikit-learn.org

Scikit-learn – Machine Learning in Python

• Machine Learning: PCA of Iris data set

http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html

Python Help

Online courses

http://biopython.org/DIST/docs/tutorial/Tutorial.html

http://dowell.colorado.edu/education-python.html

http://www.pasteur.fr/formation/infobio/python

https://www.codecademy.com/tracks/python

http://anh.cs.luc.edu/python/hands-on/

https://www.coursera.org

Books

Conclusions

• You have been briefly introduced to Python and IDLE.

• You have learnt about programming concepts.

• You have seen examples of what can be accomplished through Python.

• Topics of an extensive Python course:

    • Coding in Python – variables, scope, functions…

    • Bioinformatics with BioPython

    • Automated biological data analysis – your interests!

Thank You!

http://bioinf.gen.tcd.ie/workshops/python

Don't forget to log out!