Introductory Unix


  • 7/31/2019 Introductory Unix


    Unix Introduction (with a few bioinformatics examples)

    Philippe Gautier

    MRC - Human Genetics Unit, Bioinformatics Service


    Introductory UNIX for sequence analysis

    http://wikilocal.hgu.mrc.ac.uk/Bioinformatics/wiki/index.php/UNIX

    http://unixhelp.ed.ac.uk/

    http://www.ee.surrey.ac.uk/Teaching/Unix/index.html

    [email protected],

    19 October 2001


    Unix access (MRC):

    Unix workstation (Solaris)

    PC: telnet/Exceed/PuTTY. Connect to a Sun host/server (glengoyne, coleburn, ladyburn, strathisla) using the ssh protocol (see Computing for help)

    Same username and password as for PC access (MRC)

    Environment: command line, NOT point and click.

    Mouse: 3-button functions / other variants


    Operating systems, shells, ...

    The operating system is the flavor of Unix you use: Sun Solaris, GNU/Linux, Mac OS X.

    A shell is a program that waits for you to type a command and then executes it.

    There are 2 big families of shells, with slight differences: the C shells (csh, tcsh) and the Bourne shells (bash, ksh).

    To check which shell is your default (login) shell: echo $SHELL


    File systems

    To find out where you are: pwd

    Your root (home) directory = your H: drive.

    File hierarchy/tree:

    Root: /net/homehost/export/home/username or ~username

    Subdirectories: /net/homehost/export/data/username/ or ~username/data

    net
      homehost
        export
          home
            user1
            user2
          data
            user1
            user2


    Entering commands

    Type the command, then press Return.

    Tips:

    to go back to previously entered commands, use the up and down arrows

    to auto-complete file names, use the Tab key

    if you are stuck within a command/process/program, try ctrl-c to terminate it


    Unix commands:

    Help/manuals: man

    Moving around:

      cd, cd ., cd ..

      pwd

      ls, ls -ltr

      more, less, cat

    Manipulating files and directories:

      mv, rm, cp, mkdir, rmdir

    Running scripts:

      scriptname (-parameters), with or without the full path

    Stopping running scripts: control-c


    Common commands: ls

    gogo$ ls

    Pax6 all-hr.aln RepeatMasker

    ensembl2008.ppt ian_repeat WT1

    interactions.txt

    gogo$ ls -l

    total 89520

    drwxr-xr-x 4 gogo staff 136 1 Dec 19:07 Pax6

    drwxr-xr-x@ 43 gogo staff 1462 3 Oct 19:18 RepeatMasker

    drwxr-xr-x 2 gogo staff 68 1 Dec 19:06 WT1

    -rwx------ 1 gogo staff 82593 4 Nov 15:35 all-hr.aln

    -rw-r--r-- 1 gogo staff 6077952 21 Nov 14:07 ensembl2008.ppt

    drwxr-xr-x 31 gogo staff 1054 31 Oct 15:11 ian_repeat

    -rw-r--r--@ 1 gogo staff 4624 21 Oct 14:47 interactions.txt

    gogo$ ls -ltr

    total 89520

    drwxr-xr-x 5 gogo staff 170 3 Oct 18:21 testdir

    drwxr-xr-x@ 43 gogo staff 1462 3 Oct 19:18 RepeatMasker

    -rw-r--r--@ 1 gogo staff 4624 21 Oct 14:47 interactions.txt

    drwxr-xr-x 31 gogo staff 1054 31 Oct 15:11 ian_repeat

    -rwx------ 1 gogo staff 82593 4 Nov 15:35 all-hr.aln

    -rw-r--r-- 1 gogo staff 6077952 21 Nov 14:07 ensembl2008.ppt

    drwxr-xr-x 2 gogo staff 68 1 Dec 19:06 WT1

    drwxr-xr-x 4 gogo staff 136 1 Dec 19:07 Pax6


    Common commands: pwd/cd

    gogo$ pwd

    /Users/gogo

    gogo$ cd Bioinfo/

    gogo$ pwd

    /Users/gogo/Bioinfo

    gogo$ ls

    Pax6 all-hr.aln snp_summary.pages

    RepeatMasker ensembl2008.ppt testdir

    RepeatMasker-open-3-2-6.tar ian_repeat

    WT1 interactions.txt

    gogo$ cd Pax6/

    gogo$ pwd

    /Users/gogo/Bioinfo/Pax6

    gogo$ cd

    gogo$ pwd

    /Users/gogo

    gogo$ cd Bioinfo/Pax6          (or, with the full path: cd /Users/gogo/Bioinfo/Pax6)

    gogo$ pwd

    /Users/gogo/Bioinfo/Pax6

    gogo$ cd ../

    gogo$ pwd

    /Users/gogo/Bioinfo


    Common commands: more, less, cat

    glengoyne% ls

    file2 file3 temp

    glengoyne% more file2

    file2 content

    2222222222222222222

    glengoyne% more file3

    file3 content

    3333333333333333

    glengoyne% cat file2 file3

    file2 content

    2222222222222222222

    file3 content

    3333333333333333

    glengoyne% cat file2 file3 > file4

    glengoyne% more file4

    file2 content

    2222222222222222222

    file3 content

    3333333333333333

    glengoyne%


    Common commands: mv, cp, mkdir

    glengoyne% ls

    ex3list.txt file1 file2 file3 file4

    glengoyne% cp file1 file5

    glengoyne% ls

    ex3list.txt file1 file2 file3 file4

    file5

    glengoyne% mv file1 file6

    glengoyne% ls

    ex3list.txt file2 file3 file4 file5

    file6

    glengoyne% mkdir ../testdir3

    glengoyne% ls ../testdir3

    glengoyne% mv file3 ../testdir3

    glengoyne% cd ../testdir3

    glengoyne% ls

    file3


    Editing / creating files

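    Unix text editors: nano, emacs, vi

    Microsoft Word: save as plain text (.txt)!

    grep: search for lines matching a pattern

    Batch actions:

      use of wildcards

      foreach/for loop

    Pipes: | (send the output of one command to the next command)

    Redirection: > (write output to a file, overwriting it), >> (append output to a file)

    Other:

      & (run a command in the background), top, lpr, acroread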
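As a minimal sketch of grep, pipes and the two redirections working together, the commands below run in a throwaway directory. The file names (demo.fasta, notes.txt) and the two short sequences are made up for illustration only.

```shell
# Work in a temporary directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small two-sequence FASTA file (made-up sequences).
printf '>seq1\nACGT\n>seq2\nGGCC\n' > demo.fasta

# grep: print only the header lines (those starting with ">").
# The quotes matter: an unquoted > would be taken as a redirection.
grep '^>' demo.fasta

# Pipe: send grep's output to wc -l to count the header lines.
n_headers=$(grep '^>' demo.fasta | wc -l | tr -d ' ')

# ">" creates (or overwrites) a file, ">>" appends to it.
echo "first line" > notes.txt
echo "second line" >> notes.txt
n_lines=$(wc -l < notes.txt | tr -d ' ')

echo "headers: $n_headers, lines in notes.txt: $n_lines"
```

Using > a second time on notes.txt instead of >> would have replaced the first line rather than adding to it.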


    Example: use of wildcard


    Want to copy all ab1 files for exon 3.


    $ ls
    A12_R14B12_SOX2_ex3_R_002.ab1  C12_R14B12_SOX2_ex5_R_004.ab1  E12_R14B12_SOX2_ex3_R_002.ab1
    A12_R14B12_SOX2_ex3_R_002.seq  C12_R14B12_SOX2_ex5_R_004.seq  E12_R14B12_SOX2_ex3_R_002.seq
    B12_R14B12_SOX2_ex5_R_004.ab1  D12_R14B12_SOX2_ex3_R_002.ab1  F12_R14B12_SOX2_ex3_R_002.ab1
    B12_R14B12_SOX2_ex5_R_004.seq  D12_R14B12_SOX2_ex3_R_002.seq  F12_R14B12_SOX2_ex3_R_002.seq
    C12_R14B12_SOX2_ex3_R_002.ab1  D12_R14B12_SOX2_ex5_R_006.ab1
    C12_R14B12_SOX2_ex3_R_002.seq  D12_R14B12_SOX2_ex5_R_006.seq

    $ ls *ex3*ab1
    A12_R14B12_SOX2_ex3_R_002.ab1  D12_R14B12_SOX2_ex3_R_002.ab1  F12_R14B12_SOX2_ex3_R_002.ab1
    C12_R14B12_SOX2_ex3_R_002.ab1  E12_R14B12_SOX2_ex3_R_002.ab1

    Be very careful when using wildcards! E.g. rm * will erase all the files in the current directory!
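The copy itself is not shown on the slide; a minimal sketch of it follows. It assumes a destination directory called exon3_traces (a hypothetical name) and uses empty stand-in files named after a few of the traces listed above.

```shell
# Work in a temporary directory with empty stand-in files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch A12_R14B12_SOX2_ex3_R_002.ab1 A12_R14B12_SOX2_ex3_R_002.seq \
      B12_R14B12_SOX2_ex5_R_004.ab1 C12_R14B12_SOX2_ex3_R_002.ab1

# Hypothetical destination directory for the exon 3 trace files.
mkdir exon3_traces

# Preview what the wildcard matches before acting on it.
ls *ex3*ab1

# Copy every file whose name contains "ex3" and ends in "ab1".
# -i asks for confirmation before overwriting an existing file.
cp -i *ex3*ab1 exon3_traces/

n_copied=$(ls exon3_traces | wc -l | tr -d ' ')
echo "copied $n_copied files"
```

Running ls with the same pattern first is a cheap way to check a wildcard before handing it to cp, mv or rm.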


    Other example: use of redirect

    Create a text file containing a list of file names

    gogo$ ls
    A12_R14B12_SOX2_ex3_R_002.ab1  C12_R14B12_SOX2_ex5_R_004.ab1  E12_R14B12_SOX2_ex3_R_002.ab1
    A12_R14B12_SOX2_ex3_R_002.seq  C12_R14B12_SOX2_ex5_R_004.seq  E12_R14B12_SOX2_ex3_R_002.seq
    B12_R14B12_SOX2_ex5_R_004.ab1  D12_R14B12_SOX2_ex3_R_002.ab1  F12_R14B12_SOX2_ex3_R_002.ab1
    B12_R14B12_SOX2_ex5_R_004.seq  D12_R14B12_SOX2_ex3_R_002.seq  F12_R14B12_SOX2_ex3_R_002.seq
    C12_R14B12_SOX2_ex3_R_002.ab1  D12_R14B12_SOX2_ex5_R_006.ab1
    C12_R14B12_SOX2_ex3_R_002.seq  D12_R14B12_SOX2_ex5_R_006.seq
    gogo$ ls *ex3*ab1
    A12_R14B12_SOX2_ex3_R_002.ab1  D12_R14B12_SOX2_ex3_R_002.ab1  F12_R14B12_SOX2_ex3_R_002.ab1
    C12_R14B12_SOX2_ex3_R_002.ab1  E12_R14B12_SOX2_ex3_R_002.ab1
    gogo$ ls *ex5*seq
    B12_R14B12_SOX2_ex5_R_004.seq  C12_R14B12_SOX2_ex5_R_004.seq  D12_R14B12_SOX2_ex5_R_006.seq
    gogo$ ls *ex5*seq > ex5_sequences_list
    gogo$ more ex5_sequences_list
    B12_R14B12_SOX2_ex5_R_004.seq
    C12_R14B12_SOX2_ex5_R_004.seq
    D12_R14B12_SOX2_ex5_R_006.seq


    Another example: concatenate files

    Concatenate files into a single one (e.g. to put sequences in a single file for use with an alignment program like Clustal)

    Note: the output name used here, all_pax6.pep, itself matches the wildcard *pax6.pep. If that file already exists (as in the listing below), remove it first or pick an output name that the wildcard does not match.

    gogo$ ls *pax6.pep
    all_pax6.pep                lineus_pax6.pep       saccoglossum_pax6.pep
    branchiostoma_bel_pax6.pep  ljap_pax6.pep         strongylocentrotus_pax6.pep
    branchiostoma_flo_pax6.pep  loligo_pax6.pep
    euprymna_pax6.pep           platynereis_pax6.pep
    gogo$ cat *pax6.pep > all_pax6.pep
    gogo$ more all_pax6.pep
    >gi|117650666|gb|ABK54278.1| Pax6 [Branchiostoma belcheri]

    MGRGHSGVNQLGGVFVNGRPLPDSTRQKIVELAHQGARPCDISRLLQVSNGCVSKILGRYYETGSIRPRA

    IGGSKPRVATPEVVAKIAQFKRECPSIFAWEIRDRLLSEGICTNENIPSVSSINRVLRNLASGEKNTLQS

    LQSADPQMLEKLRLLNGNAWPHPGPWPYPPATAGAPPPQTNGNVTTKKEGDGKLASQILTLHGYQDQGDGSNDDSDEAQARLRLKRKLQRNRTSFTQEQIEALEKEFERTHYPDVFARERLAAKIDLPEARIQVWFSNRR

    AKWRREEKLRNQRRSQDSDSSSPSRIPISSSFSTATMYQPIAPPSAPVMSRSSHAGLTDSYSSLPPVPSF

    SVPGNMAPMPSMQQSREQTSYSCMIPHSTAMTPRGYDSLALGSYNPTHAGHHVTTTHPSHMQAPSMTGHS

    HMTHANGGSAGLISPGVSVPVQVPGAVTEEMTSQPYWPRIQ

    >gi|3204110|emb|CAA11364.1| Pax6 [Branchiostoma floridae]

    MPHKAWTLQRPADEHAQYSPVQADPGHSGVNQLGGVFVGGRPLPDSTRRKIVELAHQGARPCDISRLLQV

    SNGCVSKILGRYYETGSIRPRAIGGSKPRVATPEVVAKIAQFKRECPSIFAWEIRDRLLSEGICTNENIP

    SVSSINRVLRNLASGEKNTLQSLQSADPQMLEKLRLLNGNAWPHPGPWPYPPSTAGAPPPQTNGNVTTKK

    EGDGKLASQILTLHGYQDQGDGSNDDSDEAQARLRLKRKLQRNRTSFTQEQIEALEKEFERTHYPDVFARERLAAKIDLPEARIQVWFSNRRAKWRREEKLRNQRRSQDSDSSSPSRIPISSSFSTATMYQPIAPPSAPV

    MSRSSHAGLTDSYSSLPPVPSFSVPGNMAPMPSMQQSRDQTSYSCMIPHSTAMTPRGYDSLALGSYNPTH

    AGHHVTTTHPSHMQAPSMPGHSHMSHANGGSAGLISPGVSVPVQVPGAVTEEMTSQPYWPRIQ

    >gi|21667881|gb|AAM74161.1|AF513712_1 Pax-6 protein [Euprymna scolopes]

    MKNTTENHQHSVSHDTNSTSLNSSGASPNEQSPTAWKWSTNPVITEESPRDKPTGHSGVNQLGGVFVNGR

    PLPDSTRQRIVELAHSGARPCDISRILQVSNGCVSKILGRYYETGSIRPRAIGGSKPRVATPEVVQKIAQ


    Example of foreach/for loop

    C shells:

    glengoyne% echo $SHELL
    /bin/tcsh
    glengoyne% ls
    blue1 blue2 blue3 blue4 blue5 blue6 green1 green2 green3 green4 green5 green6
    glengoyne% foreach colour (blue*)
    foreach? cp $colour $colour.copy
    foreach? end
    glengoyne% ls
    blue1       blue2.copy  blue4       blue5.copy  green1  green4
    blue1.copy  blue3       blue4.copy  blue6       green2  green5
    blue2       blue3.copy  blue5       blue6.copy  green3  green6

    bash:

    gogo$ echo $SHELL
    /bin/bash
    gogo$ ls
    blue1  blue3  blue5  green1  green3  green5  testdir3
    blue2  blue4  blue6  green2  green4  green6
    gogo$ for colour in blue*; do
    > cp $colour $colour.copy
    > done
    gogo$ ls
    blue1       blue2.copy  blue4       blue5.copy  green1  green4  testdir3
    blue1.copy  blue3       blue4.copy  blue6       green2  green5
    blue2       blue3.copy  blue5       blue6.copy  green3  green6
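A slightly more defensive version of the bash loop is sketched below. The only change is the double quotes around the variable, which keep a file name containing a space as a single argument; the file "blue 3" is a made-up name added to show this.

```shell
# Work in a temporary directory with a few empty test files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch blue1 blue2 "blue 3" green1   # "blue 3" contains a space on purpose

# Loop over every name matching blue* and copy it to <name>.copy.
for colour in blue*; do
    # The quotes keep a name with spaces together as one argument.
    cp "$colour" "$colour.copy"
done

# Count the copies that were made.
n_copies=$(ls | grep -c '\.copy$')
echo "made $n_copies copies"
```

Without the quotes, cp would be handed "blue" and "3" as two separate names and fail on that file. Running the loop a second time would also copy the .copy files, because they match blue* too.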


    Next step: scripting

    using the shell or a programming language like perl

    example: read an Excel/tab-delimited text probe list and make a FASTA file from it


    Simple perl script

    #!/usr/bin/perl -w
    # (the line above tells the system to run this script with perl)

    # open the input file, create an output file and associate them with filehandles
    open (LIST, "illumina_probes_table.txt") or die "Cannot open input file: $!";
    open (SEQ, ">illumina_probes_seq.fasta") or die "Cannot create output file: $!";

    # for each line in the input file, do the following
    while ($line = <LIST>) {
        chomp $line;
        # split the line on a separator (here a tab, \t) and assign the values to an
        # array called fields: $fields[0] = column 1, $fields[9] = column 10, etc.
        @fields = split(/\t/,$line);
        $id=$fields[1];
        $locus{$id}=$fields[0];
        $seq{$id}=$fields[9];
        # print the probe id, locus name and sequence to the output file
        print SEQ ">$id $locus{$id}\n$seq{$id}\n\n";
    }

    close (LIST);
    close (SEQ);
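For a one-off conversion, the same idea can also be sketched directly at the shell with awk. This is an alternative to the perl script, not part of it. It assumes the same column layout (locus in column 1, probe id in column 2, sequence in column 10); the input rows and file names below are made up, and unlike the perl script it writes no blank line between records.

```shell
# Work in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two made-up rows with 10 tab-separated columns; columns 3 to 9 are placeholders.
printf 'LOCUS_A\tID1\t.\t.\t.\t.\t.\t.\t.\tACGTACGT\n'  > probes_table.txt
printf 'LOCUS_B\tID2\t.\t.\t.\t.\t.\t.\t.\tGGCCTTAA\n' >> probes_table.txt

# -F'\t' splits each line on tabs; $1, $2 and $10 are columns 1, 2 and 10.
# For each row, print a header line (">id locus") followed by the sequence.
awk -F'\t' '{ print ">" $2 " " $1; print $10 }' probes_table.txt > probes_seq.fasta

cat probes_seq.fasta
n_records=$(grep -c '^>' probes_seq.fasta)
first_header=$(head -n 1 probes_seq.fasta)
```

A script is still the better choice once the conversion needs checks, such as skipping a header row or rows with missing columns.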


    Perl script output:

    >106940692 0610005C13RIK
    TTTGCAGTTCCACCCCTTACCTAGGGTGTGCGGAAGCTGGGGCGCCCCGT

    >580022 0610005I04
    GCTTCTGGATAGGAGAGGGGATATTGTATTGATTCACCCCATCTTGTGTC

    >2940601 0610006I08RIK
    GACCTCCCTGGCTGTTGCAGCCTTGTCCAGACCTCTGAGCCGAGTACCTG

    >103440070 0610006L08RIK
    GCCCCGCTTCTTCAGCATAACACACGAGCTCTCAGATCTTCCAATGGAGT

    >102260551 0610007C21RIK
    CACCACCTCGGGGGTCTTGTGGACACTTGGTTCAGGAGTGGACTCGTACT

    >102370333 0610007C21RIK
    GCGCACCCTCATCTATCAAGTCCCGGCCATGTCGCAGCCAAGAAGATTTA


    Common bioinformatics programs

    EMBOSS suite

    Clustalw

    Blast

    RepeatMasker

    Genewise

    etc.


    Sequence formats

    - different formats, but the most universal one is FASTA

    >seq1 other comments if necessary
    cagtcatgctagctagctagctagctagctagctagtgtggtgtgggggtagctgatcgat

    Or

    >seq1

    CGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTGACGTACGTAGCTAGCTGAGCTAGCATGC

    >SEQ22

    ATCGATCGTAGCTACGACGGTGGGTGATGTCGTAGTTATATAAAATGCTGATGCTAGCTAGCTAGTCGA
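Because every FASTA record starts with a ">" line, ordinary shell tools are enough to inspect a FASTA file. The sketch below counts and names the sequences in a small made-up file (example.fasta, with shortened sequences).

```shell
# Work in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small made-up FASTA file: two records, the second spread over two lines.
printf '>seq1 other comments if necessary\ncagtcatgctagc\n>SEQ22\nATCGATCG\nTTAA\n' > example.fasta

# Count the sequences: one header line (starting with ">") per sequence.
n_seqs=$(grep -c '^>' example.fasta)

# List the sequence names: the header lines without the leading ">"
# and without any comment that follows the first space.
names=$(grep '^>' example.fasta | sed 's/^>//' | cut -d' ' -f1)

echo "$n_seqs sequences:"
echo "$names"
```

Counting lines with wc -l would not give the number of sequences here, since a sequence can be wrapped over several lines.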


    Emboss suite

    Help:

    online: http://emboss.sourceforge.net/

    In Unix:

    - wossname + keyword

    gogo$ wossname align
    Finds programs by keywords in their short description
    SEARCH FOR 'ALIGN'
    aligncopy      Reads and writes alignments
    aligncopypair  Reads and writes pairs from alignments
    cons           Create a consensus sequence from a multiple alignment
    consambig      Create an ambiguous consensus sequence from a multiple alignment
    diffseq        Compare and report features of two similar sequences
    distmat        Create a distance matrix from a multiple sequence alignment
    dotmatcher     Draw a threshold dotplot of two sequences
    dotpath        Draw a non-overlapping wordmatch dotplot of two sequences
    dottup         Displays a wordmatch dotplot of two sequences
    edialign       Local multiple alignment of sequences
    emma           Multiple sequence alignment (ClustalW wrapper)
    est2genome     Align EST sequences to genomic DNA sequence
    extractalign   Extract regions from a sequence alignment


    Help on a specific program:

    program_name -help -verbose

    philippe2-lt:ian_repeat gogo$ seqret -help -verbose
       Standard (Mandatory) qualifiers:
      [-sequence]      seqall     (Gapped) sequence(s) filename and optional
                                  format, or reference (input USA)
      [-outseq]        seqoutall  [.] Sequence set(s) filename and
                                  optional format (output USA)

       Additional (Optional) qualifiers: (none)
       Advanced (Unprompted) qualifiers:
       -feature        boolean    Use feature information
       -firstonly      boolean    [N] Read one sequence and stop

       Associated qualifiers:

       "-sequence" associated qualifiers
       -sbegin1        integer    Start of each sequence to be used
       -send1          integer    End of each sequence to be used
       -sreverse1      boolean    Reverse (if DNA)
       -sask1          boolean    Ask for begin/end/reverse
       -snucleotide1   boolean    Sequence is nucleotide
       -sprotein1      boolean    Sequence is protein
       -slower1        boolean    Make lower case
       -supper1        boolean    Make upper case
       -sformat1       string     Input sequence format
       -sdbname1       string     Database name
       -sid1           string     Entryname
       -ufo1           string     UFO features
       -fformat1       string     Features format
       -fopenfile1     string     Features file name

       "-outseq" associated qualifiers
       -osformat2      string     Output seq format
       -osextension2   string     File name extension
       -osname2        string     Base file name
       -osdirectory2   string     Output directory


    EMBOSS examples:

    seqret

    remap

    transeq

    revseq


    EMBOSS examples

    seqret

    glengoyne% more testseq.fasta

    >testseq

    NNNNNNNNNNNNNNNGNNNNCTCCNTTTNNTCGTTTTTCTTTGAAATTTCTCCCCCCTCCAGTTCGCTGTCCGGCCCTCACAT

    TGTGAGAGGGGCAGTGTGCCGTTAATGGCCGTGCCGGGCACCGGGCCGCTCTGGTAGTGCTGGGACATGTGAAGTCTGCTGGG

    GCGGCGGGTTCCGGCACCTCGGCGCCGGGGAGATACATGCTGATCATGTCCCGGAGGTCCCCGGCCTGGCAGGGCGCCCTGGATGGGAGGAAGAGGTAACCACAGGGGGGCTGGAGCTGGCCTCGGACTTGACCACCGAACCCATGGAGCCAAGAGCCATGCCAGG

    GTGCCCTGCTGCGAGTAGGACATGCTGTAGGTGGGCGAGCCGTTCATGTAGGTCTGCGAGCTGGTCATGGAGTTGTACTGCAG

    GCGCTCACGTCGTANCGGTGCATGGGCTGCATCTGCGCTGCGCCGTGCGCATTGAGGCCCGGGTGCTGCGGGTAGCCCAGCTG

    TCCTGCATCATGCTGTAGCTGCCGTTGCTCCAGCCGTTCATGTGCGCGTAACTGTCCATGCNNNN

    glengoyne% seqret -sbegin 50 -send 100 testseq.fasta testseq_50_100.fasta

    Reads and writes (returns) sequences

    glengoyne% more testseq_50_100.fasta

    >testseq

    CTCCCCCCTCCAGTTCGCTGTCCGGCCCTCACATGTGTGAGAGGGGCAGTG


    EMBOSS examples

    remap

    glengoyne% remap -sbegin 50 -send 1000 testseq.fasta

    Display a sequence with restriction cut sites, translation etc..

    Comma separated enzyme list [all]:

    Minimum recognition site length [4]: 8

    Output file [testseq.remap]:

    glengoyne% more testseq.remap

    testseq

    OliI

    MslI

    \

    CTCCCCCCTCCAGTTCGCTGTCCGGCCCTCACATGTGTGAGAGGGGCAGTGTGCCGTTAA

    50 60 70 80 90 100

    |----:----|----:----|----:----|----:----|----:----|----:----

    GAGGGGGGAGGTCAAGCGACAGGCCGGGAGTGTACACACTCTCCCCGTCACACGGCAATT

    /

    MslI

    OliI

    P P S S S L S G P H M C E R G S V P L M

    L P P P V R C P A L T C V R G A V C R *

    S P L Q F A V R P S H V * E G Q C A V N

    |----:----|----:----|----:----|----:----|----:----|----:----

    E G G E L E S D P G * M H S L P L T G N

    E G R W N A T R G E C T H S P C H A T L

    R G G G T R Q G A R V H T L P A T H R *

    EMBOSS examples

    transeq

    glengoyne% transeq -sbegin 50 -send 100 -frame 3 testseq.fasta
    Translate nucleic acid sequences
    Output sequence [testseq.pep]:
    glengoyne% more testseq.pep
    >testseq_3
    PPSSSLSGPHMCERGS

    revseq

    glengoyne% revseq -sbegin 50 -send 100 testseq.fasta testseq.rev
    Reverse and complement a sequence


    clustalw

      Input format: multiple-sequence FASTA

      clustalw seqfilename.fasta

      Creates a .aln file

    muscle

      Input format: multiple-sequence FASTA

      muscle -in seqfilename.fasta -out outputname -clw

      Creates a clustal-like output file

    Other: T-coffee, dialign, etc.


    clustalw

    muscle

    t_coffee


    Blast - local

    How to create your own blast database?

    - formatdb

    - e.g.: formatdb -i allseq.fasta -p F     (-p F: the input is nucleotide sequence)

    glengoyne% formatdb h
    formatdb 2.2.11 arguments:
      -t  Title for database file [String]  Optional
      -i  Input file(s) for formatting [File In]  Optional
      -l  Logfile name: [File Out]  Optional
            default = formatdb.log
      -p  Type of file
            T - protein
            F - nucleotide [T/F]  Optional
            default = T
      -o  Parse options
            T - True: Parse SeqId and create indexes.
            F - False: Do not parse SeqId. Do not create indexes.
            [T/F]  Optional
            default = F


    Blast - local

    blastall

    compulsory parameters:

      -p  blastn/blastp/tblastn/blastx/tblastx

      -d  database

      -i  input file

      -o  output file

    optional parameters:

      -e  Expectation value (E) [Real], default = 10.0

      -m  alignment view options:
            0 = pairwise,
            8 = tabular,
            9 = tabular with comment lines

    e.g.: blastall -p blastn -d allseq.fasta -i mycandidate.fasta -o blastoutput.txt


    RepeatMasker

    RepeatMasker seq.fasta

    RepeatMasker -species mouse seq.fasta

    Output:

    seq.fasta.masked (N where repeat)

    seq.fasta.out:

       SW  perc  perc  perc  query       position in query          matching  repeat         position in repeat
    score  div.  del.  ins.  sequence    begin   end    (left)      repeat    class/family   begin   end  (left)  ID

      226  15.8   7.3  11.   Chromosome     38   119  (389882)  C   MLT1L     LTR/MaLR         (4)   611    533   1
      376   6.7  21.7   0.0  Chromosome    200   259  (389742)  C   MADE1     DNA/Mariner      (7)    73      1   2
      416  23.4   8.4   8.4  Chromosome    310   536  (389465)  C   MLT1L     LTR/MaLR       (348)   267     41   1
      268  17.7   0.0   3.4  Chromosome    545   797  (389204)  +   AluJo     SINE/Alu           6   302    (0)   4
     2118  12.8   1.6   0.3  Chromosome    882  1187  (388814)  +   AluSg     SINE/Alu           1   310    (0)   6
      309  28.3   5.9   7.7  Chromosome   1202  1487  (388514)  +   L1ME4a    LINE/L1         5841  6121    (0)   7
     2047  22.2  11.0   4.2  Chromosome   1659  1837  (388164)  +   MLT1G     LTR/MaLR          10   175  (415)   8
     2002  13.1   0.3   0.3  Chromosome   1838  2127  (387874)  C   AluSx     SINE/Alu        (22)   290      1   9
     2047  22.2  11.0   4.2  Chromosome   2128  2491  (387510)  +   MLT1G     LTR/MaLR         175   590    (0)   8
      208  21.4   0.0   0.0  Chromosome   2853  2894  (387107)  +   MIR       SINE/MIR          70   111  (151)  10
      218   6.9   0.0   6.9  Chromosome   3083  3140  (386861)  +   (TA)n     Simple_repeat      1    54    (0)  11


    R package

    open source scripting platform for statistical computing

    versions for Windows, OS X, Linux

    graphical, terminal and programmatic interfaces

    wealth of documentation online, as well as local support


    Bioconductor

    collection of packages providing tools for the analysis and comprehension of genomic data

    a great many algorithms/techniques, peer-reviewed (5,670 Google Scholar hits)

    developed in response to the large data volumes and non-trivial analysis involved with microarrays

    http://www.bioconductor.org

    [Figure: number of packages within the Bioconductor project by year]

    Selection of packages:

    package     description

    biomaRt     interface to BioMart databases (incl. Ensembl)
    affy        methods for analysing Affymetrix oligonucleotide microarrays
    ShortRead   methods for analysing high-throughput short-read sequencing data (incl. Solexa)
    ppiStats    protein-protein interaction statistical methods
    beadarray   quality assessment and low-level analysis of Illumina BeadArrays
    fbat        family-based association tests for genetic data
    limma       linear models for detecting differential expression in microarray data

    See the introduction course in January; contact Vicky Sabine for more information.