Nimble Perl Programming Using Scriptome
Yannick Pouliot, PhD, Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
http://lane.stanford.edu
1/22/2009


Objectives

Determining whether Scriptome can …

1. Enable you to perform operations otherwise difficult/time-consuming/error-prone?

2. Help you learn Perl?

And don’t worry: This experiment won’t hurt a bit!

Also, we’ll be using anonymous polling to determine whether you’re happy with the material and speed of delivery …


So What Is Scriptome?

Scriptome is a resident Perl program that performs various data manipulation tasks useful to biologists

Originally developed by Harvard’s FAS Center for Systems Biology

Maintained and extended by many more volunteers not associated with Harvard


Why Bother With Scriptome?

Code is visible, so you can learn how to do things in Perl … or not
Can handle arbitrarily large files
  No size limitations, unlike e.g. Excel
Free; runs on everything: PC, Mac, Linux
It’s programmatic!
  Much faster than manual operations
  You can string operations together and save these in e.g. a .bat file


How Do You Use Scriptome?

You tell Scriptome which function you want it to perform (more later)
You can also string Scriptome functions into a protocol
Input: Scriptome operates on text files
  No binary files, but you could add that capability yourself
  E.g., process Excel files in native form using Perl modules such as Spreadsheet::ParseExcel
Output: printed to the command line, or written into another file


Scriptome: Pick Your Flavor

http://sysbio.harvard.edu/csb/resources/computational/scriptome/

http://lane.stanford.edu/howto/index.html?id=_1257


Installing Scriptome - Windows

1. Download Scriptome_exe.tar.gz using this link:
   http://sysbio.harvard.edu/csb/resources/computational/scriptome/bin/Scriptome_exe.tar.gz
   → Final location: I suggest C:/Program Files/Scriptome
2. Create a directory named “Scriptome”
3. Decompress Scriptome_exe.tar.gz by double-clicking
   → Notice the four files inside
4. Update the PATH variable: add this string at the END of the contents of the PATH variable:
   ;C:\Program Files\Scriptome\Scriptome;C:\Program Files\Scriptome\ScriptPack;C:\Program Files\Scriptome\Scriptome.bat;C:\Program Files\Scriptome\ScriptPack.bat


Scriptome Usage

1. Using a specific tool:
   Scriptome flags toolname [input_filenames] [> output_filename]
   Example: Scriptome -t change_fasta_to_tab LONGhmcad.fst
2. Finding a tool by type:
   Scriptome -t tooltype
   where tooltype = Calc, Choose, Sort, Fetch, Merge, or Change
   Example: Scriptome -t Calc

Let’s examine each area briefly before going over specifics…


Polling Time: How’s the speed?

1. Too fast

2. Too slow

3. More or less OK

4. I feel nauseous


Examples and noteworthy tools


Calc Tool Examples - 1

Compute column sums:

Scriptome -t calc_col_sum SubjectData1.tab

→ select columns to add

IMPORTANT: column numbers start at 0, not 1

Note the visible Perl code → easy to modify, expand

perl -e "$col=1; while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum += $F[$col]; } warn qq~\nSum of column $col for $. lines\n\n~; print qq~$sum\n~" file.tab


Calc Tool Examples - 2

Compute row sums:

Scriptome -t calc_row_sum SubjectData1.tab

→ enter the columns to add; note that the code below indexes @F from 0, so 1 refers to the second column, 2 to the third, etc.

perl -e "@cols=(1, 2, 3); while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum = 0; foreach $col (@cols) { $sum += $F[$col] }; print qq~$_\t$sum\n~; } warn qq~\nSum of columns @cols for each line ($. lines)\n\n~" in.tab


Change Tool Examples - 1

Create tab-delimited file from FASTA file:

Scriptome -t change_fasta_to_tab LONGhmcad.fst > LONGhmcad.fst.tab

→ change_fasta_to_tab is an important tool because many Scriptome tools use tab-delimited files

perl -e "$count=0; $len=0; while(<>) { s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) { print qq~\n~ } s/ |$/\t/; $count++; $_ .= qq~\t~; } else { s/ //g; $len += length($_) } print $_; } print qq~\n~; warn qq~\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n~;" seqs.fna


Change Tool Examples - 2

Change rows to columns or vice versa:

Scriptome -t change_transpose_table SubjectData1.tab

Note: change_transpose_table operates on tab-delimited files


Change Tool Examples - 3

Convert a sequence file from one format to another:

Scriptome -t change_bio_format_to_bio_format LONGhmcad.fst

enter ‘fasta’ as input format (no quotes)
enter ‘genbank’ as output format (no quotes)

change_bio_format_to_bio_format addresses the common problem of converting formats

Important: requires Bioperl to be installed

perl -MBio::SeqIO -e "$informat= qq~genbank~; $outformat= qq~fasta~; $count = 0; for $infile (@ARGV) { $in = Bio::SeqIO->newFh(-file => $infile , -format => $informat); $out = Bio::SeqIO->newFh(-format => $outformat); while (<$in>) { print $out $_; $count++; } } warn qq~Translated $count sequences from $informat to $outformat format\n~" myseqs.genbank > myseqs.fasta

* Notice anything interesting? *


Conclusions

Scriptome is …
A good solution for manipulating medium to large data files quickly and reliably
A way to learn Perl in a “real” context (no toy problems)
Able to perform a wide range of tasks, from simple, generic file manipulations to bio-specific complex tasks


Resources

For Perl help, see the resources in the workshop description for Lane’s Perl Programming for Biologists

Some recommended titles:


Polling Time: Do you think Scriptome will be useful to your research?

1. Definitely

2. Likely

3. Not likely

4. No way

5. What’s the question again?
