7/31/2019 Introductory Unix
Unix Introduction (with a few bioinformatics examples)
Philippe Gautier
MRC - Human Genetics Unit
Bioinformatics Service
Introductory UNIX for sequence analysis
http://wikilocal.hgu.mrc.ac.uk/Bioinformatics/wiki/index.php/UNIX
http://unixhelp.ed.ac.uk/
http://www.ee.surrey.ac.uk/Teaching/Unix/index.html
19 October 2001
Unix access (MRC):
Unix workstation (Solaris)
PC: telnet/Exceed/PuTTY. Connect to a Sun host/server (glengoyne, coleburn, ladyburn, strathisla) using the ssh protocol (see Computing for help)
Same username and password as for PC access (MRC)
Environment: command line, NOT point and click.
Mouse: 3-button functions / other variants
Operating systems, shells, ...
The operating system is the flavor of Unix you use: Sun Solaris, GNU/Linux, Mac OS X
A shell is a program that waits for you to type a command and then executes it.
Two big families of shells with slight differences: the C shells (csh, tcsh) and the Bourne shells (bash, ksh)
To see which shell you are using: echo $SHELL
File systems
To know where you are: pwd
Home directory (the top of your own file tree) = your H: drive.
File hierarchy/tree:
Home: /net/homehost/export/home/username or ~username
Subdirectories: /net/homehost/export/data/username/ or ~username/data
/net
  homehost
    export
      home
        user1
        user2
      data
        user1
        user2
Entering commands
type the command, then return
tips:
to go back to previously entered commands, use the up and down arrows
to auto-complete file names, use the tab key
if you are stuck within a command/process/program, try ctrl-c to terminate it
Unix commands:
Help/manuals: man
Moving around:
cd, cd ., cd ..
pwd
ls, ls -ltr
more, less, cat
Manipulating files and directories:
mv, rm, cp, mkdir, rmdir
Running scripts:
scriptname (-parameters), with or without full path
Stopping running scripts: control-c
Common commands: ls
gogo$ ls
Pax6 all-hr.aln RepeatMasker
ensembl2008.ppt ian_repeat WT1
interactions.txt
gogo$ ls -l
total 89520
drwxr-xr-x 4 gogo staff 136 1 Dec 19:07 Pax6
drwxr-xr-x@ 43 gogo staff 1462 3 Oct 19:18 RepeatMasker
drwxr-xr-x 2 gogo staff 68 1 Dec 19:06 WT1
-rwx------ 1 gogo staff 82593 4 Nov 15:35 all-hr.aln
-rw-r--r-- 1 gogo staff 6077952 21 Nov 14:07 ensembl2008.ppt
drwxr-xr-x 31 gogo staff 1054 31 Oct 15:11 ian_repeat
-rw-r--r--@ 1 gogo staff 4624 21 Oct 14:47 interactions.txt
gogo$ ls -ltr
total 89520
drwxr-xr-x 5 gogo staff 170 3 Oct 18:21 testdir
drwxr-xr-x@ 43 gogo staff 1462 3 Oct 19:18 RepeatMasker
-rw-r--r--@ 1 gogo staff 4624 21 Oct 14:47 interactions.txt
drwxr-xr-x 31 gogo staff 1054 31 Oct 15:11 ian_repeat
-rwx------ 1 gogo staff 82593 4 Nov 15:35 all-hr.aln
-rw-r--r-- 1 gogo staff 6077952 21 Nov 14:07 ensembl2008.ppt
drwxr-xr-x 2 gogo staff 68 1 Dec 19:06 WT1
drwxr-xr-x 4 gogo staff 136 1 Dec 19:07 Pax6
Common commands: pwd/cd
gogo$ pwd
/Users/gogo
gogo$ cd Bioinfo/
gogo$ pwd
/Users/gogo/Bioinfo
gogo$ ls
Pax6 all-hr.aln snp_summary.pages
RepeatMasker ensembl2008.ppt testdir
RepeatMasker-open-3-2-6.tar ian_repeat
WT1 interactions.txt
gogo$ cd Pax6/
gogo$ pwd
/Users/gogo/Bioinfo/Pax6
gogo$ cd
gogo$ pwd
/Users/gogo
gogo$ cd Bioinfo/Pax6 or cd /Users/gogo/Bioinfo/Pax6
gogo$ pwd
/Users/gogo/Bioinfo/Pax6
gogo$ cd ../
gogo$ pwd
/Users/gogo/Bioinfo
Common commands: more, less, cat
glengoyne% ls
file2 file3 temp
glengoyne% more file2
file2 content
2222222222222222222
glengoyne% more file3
file3 content
3333333333333333
glengoyne% cat file2 file3
file2 content
2222222222222222222
file3 content
3333333333333333
glengoyne% cat file2 file3>file4
glengoyne% more file4
file2 content
2222222222222222222
file3 content
3333333333333333
glengoyne%
Common commands: mv, cp, mkdir
glengoyne% ls
ex3list.txt file1 file2 file3 file4
glengoyne% cp file1 file5
glengoyne% ls
ex3list.txt file1 file2 file3 file4
file5
glengoyne% mv file1 file6
glengoyne% ls
ex3list.txt file2 file3 file4 file5
file6
glengoyne% mkdir ../testdir3
glengoyne% ls ../testdir3
glengoyne% mv file3 ../testdir3
glengoyne% cd ../testdir3
glengoyne% ls
file3
Editing / creating files
Unix text editors: nano, emacs, vi
Microsoft Word: save as .txt !!
grep
Batch actions
Use of wildcards
foreach loop
Pipes: |
Redirection: > (write to a file), >> (append to a file)
Other
&, top, lpr, acroread
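The pipe and grep items in this list can be sketched with a few made-up file names. This is only an illustration run in a scratch directory; the file names and the variable n_ex3 are assumptions, not part of the course material.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch A12_ex3.ab1 A12_ex3.seq B12_ex5.ab1 B12_ex5.seq C12_ex3.ab1

# Pipe: the output of ls becomes the input of grep,
# which keeps only the lines containing "ex3".
ls | grep ex3

# Pipes can be chained: wc -l counts the lines that grep let through.
n_ex3=$(ls | grep ex3 | wc -l | tr -d ' ')
echo "files mentioning ex3: $n_ex3"

# Leave and remove the scratch directory.
cd "$OLDPWD" && rm -r "$workdir"
```

The same pattern works with any command that writes text, for example a long ls -l listing piped into more.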
Example: use of wildcard
Want to copy all ab1 files for exon 3.
$ ls
A12_R14B12_SOX2_ex3_R_002.ab1  C12_R14B12_SOX2_ex5_R_004.ab1  E12_R14B12_SOX2_ex3_R_002.ab1
A12_R14B12_SOX2_ex3_R_002.seq  C12_R14B12_SOX2_ex5_R_004.seq  E12_R14B12_SOX2_ex3_R_002.seq
B12_R14B12_SOX2_ex5_R_004.ab1  D12_R14B12_SOX2_ex3_R_002.ab1  F12_R14B12_SOX2_ex3_R_002.ab1
B12_R14B12_SOX2_ex5_R_004.seq  D12_R14B12_SOX2_ex3_R_002.seq  F12_R14B12_SOX2_ex3_R_002.seq
C12_R14B12_SOX2_ex3_R_002.ab1  D12_R14B12_SOX2_ex5_R_006.ab1
C12_R14B12_SOX2_ex3_R_002.seq  D12_R14B12_SOX2_ex5_R_006.seq
$ ls *ex3*ab1
A12_R14B12_SOX2_ex3_R_002.ab1  D12_R14B12_SOX2_ex3_R_002.ab1  F12_R14B12_SOX2_ex3_R_002.ab1
C12_R14B12_SOX2_ex3_R_002.ab1  E12_R14B12_SOX2_ex3_R_002.ab1
Be very careful when using wildcards! E.g.: rm * will erase all the files in the current directory!
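The copy itself is not shown on the slide. A minimal sketch, assuming shortened made-up file names and a hypothetical target directory called exon3_traces: preview the pattern with ls first, then reuse exactly the same pattern with cp.

```shell
# Scratch directory with a few empty stand-in files (names are made up).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch A12_SOX2_ex3_R_002.ab1 A12_SOX2_ex3_R_002.seq \
      B12_SOX2_ex5_R_004.ab1 D12_SOX2_ex3_R_002.ab1

# 1. Preview what the pattern matches before acting on it.
ls *ex3*ab1

# 2. Copy the matching files into a new directory (hypothetical name).
mkdir exon3_traces
cp *ex3*ab1 exon3_traces/

# 3. Check the result.
copied=$(ls exon3_traces | wc -l | tr -d ' ')
echo "copied $copied files"

# When deleting with a wildcard, rm -i asks for confirmation per file.
cd "$OLDPWD" && rm -r "$workdir"
```

Previewing with ls costs nothing and shows exactly which files the shell will hand to the next command.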
other example: use of redirect
Create a text file containing a list of file names
gogo$ ls
A12_R14B12_SOX2_ex3_R_002.ab1  C12_R14B12_SOX2_ex5_R_004.ab1  E12_R14B12_SOX2_ex3_R_002.ab1
A12_R14B12_SOX2_ex3_R_002.seq  C12_R14B12_SOX2_ex5_R_004.seq  E12_R14B12_SOX2_ex3_R_002.seq
B12_R14B12_SOX2_ex5_R_004.ab1  D12_R14B12_SOX2_ex3_R_002.ab1  F12_R14B12_SOX2_ex3_R_002.ab1
B12_R14B12_SOX2_ex5_R_004.seq  D12_R14B12_SOX2_ex3_R_002.seq  F12_R14B12_SOX2_ex3_R_002.seq
C12_R14B12_SOX2_ex3_R_002.ab1  D12_R14B12_SOX2_ex5_R_006.ab1
C12_R14B12_SOX2_ex3_R_002.seq  D12_R14B12_SOX2_ex5_R_006.seq
gogo$ ls *ex3*ab1
A12_R14B12_SOX2_ex3_R_002.ab1  D12_R14B12_SOX2_ex3_R_002.ab1  F12_R14B12_SOX2_ex3_R_002.ab1
C12_R14B12_SOX2_ex3_R_002.ab1  E12_R14B12_SOX2_ex3_R_002.ab1
gogo$ ls *ex5*seq
B12_R14B12_SOX2_ex5_R_004.seq  C12_R14B12_SOX2_ex5_R_004.seq  D12_R14B12_SOX2_ex5_R_006.seq
gogo$ ls *ex5*seq > ex5_sequences_list
gogo$ more ex5_sequences_list
B12_R14B12_SOX2_ex5_R_004.seq
C12_R14B12_SOX2_ex5_R_004.seq
D12_R14B12_SOX2_ex5_R_006.seq
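The difference between > and >> is easy to miss, so here is a small sketch with made-up file names (sequences_list, n_lines and n_after are illustrative names only): > creates or overwrites the file, >> adds to the end of it.

```shell
# Scratch directory with empty stand-in files (names are made up).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch B12_ex5.seq C12_ex5.seq D12_ex5.seq A12_ex3.seq

ls *ex5*seq > sequences_list     # > creates the file, or overwrites it
ls *ex3*seq >> sequences_list    # >> appends to the end of the file
n_lines=$(wc -l < sequences_list | tr -d ' ')
cat sequences_list

ls *ex3*seq > sequences_list     # > again: the earlier content is lost
n_after=$(wc -l < sequences_list | tr -d ' ')
echo "lines before: $n_lines, after overwriting: $n_after"

cd "$OLDPWD" && rm -r "$workdir"
```

Because > silently replaces an existing file, it is worth checking with ls that the output name is not already in use.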
another example: concatenate files
Concatenate files into a single one (e.g., put sequences in a single file for use with an alignment program like Clustal)
gogo$ ls *pax6.pep
all_pax6.pep                lineus_pax6.pep       saccoglossum_pax6.pep
branchiostoma_bel_pax6.pep  ljap_pax6.pep         strongylocentrotus_pax6.pep
branchiostoma_flo_pax6.pep  loligo_pax6.pep
euprymna_pax6.pep           platynereis_pax6.pep
gogo$ cat *pax6.pep>all_pax6.pep
gogo$ more all_pax6.pep
>gi|117650666|gb|ABK54278.1| Pax6 [Branchiostoma belcheri]
MGRGHSGVNQLGGVFVNGRPLPDSTRQKIVELAHQGARPCDISRLLQVSNGCVSKILGRYYETGSIRPRA
IGGSKPRVATPEVVAKIAQFKRECPSIFAWEIRDRLLSEGICTNENIPSVSSINRVLRNLASGEKNTLQS
LQSADPQMLEKLRLLNGNAWPHPGPWPYPPATAGAPPPQTNGNVTTKKEGDGKLASQILTLHGYQDQGDGSNDDSDEAQARLRLKRKLQRNRTSFTQEQIEALEKEFERTHYPDVFARERLAAKIDLPEARIQVWFSNRR
AKWRREEKLRNQRRSQDSDSSSPSRIPISSSFSTATMYQPIAPPSAPVMSRSSHAGLTDSYSSLPPVPSF
SVPGNMAPMPSMQQSREQTSYSCMIPHSTAMTPRGYDSLALGSYNPTHAGHHVTTTHPSHMQAPSMTGHS
HMTHANGGSAGLISPGVSVPVQVPGAVTEEMTSQPYWPRIQ
>gi|3204110|emb|CAA11364.1| Pax6 [Branchiostoma floridae]
MPHKAWTLQRPADEHAQYSPVQADPGHSGVNQLGGVFVGGRPLPDSTRRKIVELAHQGARPCDISRLLQV
SNGCVSKILGRYYETGSIRPRAIGGSKPRVATPEVVAKIAQFKRECPSIFAWEIRDRLLSEGICTNENIP
SVSSINRVLRNLASGEKNTLQSLQSADPQMLEKLRLLNGNAWPHPGPWPYPPSTAGAPPPQTNGNVTTKK
EGDGKLASQILTLHGYQDQGDGSNDDSDEAQARLRLKRKLQRNRTSFTQEQIEALEKEFERTHYPDVFARERLAAKIDLPEARIQVWFSNRRAKWRREEKLRNQRRSQDSDSSSPSRIPISSSFSTATMYQPIAPPSAPV
MSRSSHAGLTDSYSSLPPVPSFSVPGNMAPMPSMQQSRDQTSYSCMIPHSTAMTPRGYDSLALGSYNPTH
AGHHVTTTHPSHMQAPSMPGHSHMSHANGGSAGLISPGVSVPVQVPGAVTEEMTSQPYWPRIQ
>gi|21667881|gb|AAM74161.1|AF513712_1 Pax-6 protein [Euprymna scolopes]
MKNTTENHQHSVSHDTNSTSLNSSGASPNEQSPTAWKWSTNPVITEESPRDKPTGHSGVNQLGGVFVNGR
PLPDSTRQRIVELAHSGARPCDISRILQVSNGCVSKILGRYYETGSIRPRAIGGSKPRVATPEVVQKIAQ
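One thing to watch in this example: the output file all_pax6.pep itself matches the pattern *pax6.pep. If it already exists when the command is rerun, cat is asked to read the file it is writing, which can fail or give unexpected results depending on the cat implementation. A minimal sketch of a safer variant, assuming made-up input files and a hypothetical output name pax6_all.fasta that the pattern cannot match:

```shell
# Scratch directory with two tiny placeholder protein files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>seqA\nMGRGHSGVNQ\n' > speciesA_pax6.pep
printf '>seqB\nMPHKAWTLQR\n' > speciesB_pax6.pep

# The output name does not end in _pax6.pep, so it is never its own input.
cat *_pax6.pep > pax6_all.fasta

# Each FASTA record starts with ">", so counting those lines counts sequences.
n_seq=$(grep -c '^>' pax6_all.fasta)
echo "sequences in pax6_all.fasta: $n_seq"

cd "$OLDPWD" && rm -r "$workdir"
```

Counting the ">" lines is a quick check that every input file ended up in the combined file.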
Example of foreach/for loop
C shells
glengoyne% echo $SHELL
/bin/tcsh
glengoyne% ls
blue1 blue2 blue3 blue4 blue5 blue6 green1 green2 green3 green4 green5 green6
glengoyne% foreach colour (blue*)
foreach? cp $colour $colour.copy
foreach? end
glengoyne% ls
blue1       blue2.copy  blue4       blue5.copy  green1  green4
blue1.copy  blue3       blue4.copy  blue6       green2  green5
blue2       blue3.copy  blue5       blue6.copy  green3  green6
bash
gogo$ echo $SHELL
/bin/bash
gogo$ ls
blue1  blue3  blue5  green1  green3  green5  testdir3
blue2  blue4  blue6  green2  green4  green6
gogo$ for colour in blue*; do
> cp $colour $colour.copy
> done
gogo$ ls
blue1       blue2.copy  blue4       blue5.copy  green1  green4  testdir3
blue1.copy  blue3       blue4.copy  blue6       green2  green5
blue2       blue3.copy  blue5       blue6.copy  green3  green6
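The bash loop can also be saved in a script. The sketch below repeats the copy loop with the variable in double quotes (which keeps file names containing spaces intact) and adds a second loop, not in the original slides, that renames the copies; the .bak suffix and the variable names are illustrative assumptions.

```shell
# Scratch directory with empty stand-in files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch blue1 blue2 blue3 green1

# Same loop as on the slide, with the variable quoted.
for colour in blue*; do
    cp "$colour" "$colour.copy"
done

# Rename every copy: ${f%.copy} is the file name with the .copy suffix removed.
for f in *.copy; do
    mv "$f" "${f%.copy}.bak"
done

n_bak=$(ls *.bak | wc -l | tr -d ' ')
n_copy=$(ls | grep -c '\.copy$')
echo "$n_bak .bak files, $n_copy .copy files left"

cd "$OLDPWD" && rm -r "$workdir"
```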
Next step: scripting
using the shell or a programming language like perl
example: read and make a fasta file from an excel/tab-delimited text probe list
Simple perl script
#!/usr/bin/perl -w
open (LIST, "illumina_probes_table.txt") or die "Cannot open input file: $!";
open (SEQ, ">illumina_probes_seq.fasta") or die "Cannot create output file: $!";
while ($line = <LIST>) {
    chomp $line;
    @fields = split(/\t/,$line);
    $id=$fields[1];
    $locus{$id}=$fields[0];
    $seq{$id}=$fields[9];
    print SEQ ">$id $locus{$id}\n$seq{$id}\n\n";
}
close (LIST);
close (SEQ);

Notes:
#!/usr/bin/perl -w : use perl
open ... : open the input file, create an output file and associate them with filehandles
while ... : for each line in the input file, do the following
split ... : split the line on a separator (here a tab, \t) and assign the values to an array called fields. $fields[0] = column 1, $fields[9] = column 10, etc.
print ... : print the probe id, locus name and sequence to the output file
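For a one-off conversion the same idea can be sketched directly in the shell with awk, which splits each line into numbered fields. This is an alternative illustration, not the course's script: the two-row table, its column contents and the file names probes_table.txt and probes_seq.fasta are made up, and only the column positions (1 = locus, 2 = probe id, 10 = sequence) follow the perl example.

```shell
# Scratch directory with a made-up, tab-separated, 10-column table.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'LOCUS_A\tid001\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tACGTACGT\n'  > probes_table.txt
printf 'LOCUS_B\tid002\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tTTGGCCAA\n' >> probes_table.txt

# -F'\t' sets the field separator to a tab.
# $2 = probe id, $1 = locus, $10 = sequence (awk counts fields from 1).
awk -F'\t' '{ printf ">%s %s\n%s\n\n", $2, $1, $10 }' probes_table.txt > probes_seq.fasta

cat probes_seq.fasta
first_header=$(head -n 1 probes_seq.fasta)
n_records=$(grep -c '^>' probes_seq.fasta)

cd "$OLDPWD" && rm -r "$workdir"
```

Note the off-by-one difference between the two tools: perl arrays start at index 0, awk fields at 1.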
Perl script output:
>106940692 0610005C13RIK
TTTGCAGTTCCACCCCTTACCTAGGGTGTGCGGAAGCTGGGGCGCCCCGT

>580022 0610005I04
GCTTCTGGATAGGAGAGGGGATATTGTATTGATTCACCCCATCTTGTGTC

>2940601 0610006I08RIK
GACCTCCCTGGCTGTTGCAGCCTTGTCCAGACCTCTGAGCCGAGTACCTG

>103440070 0610006L08RIK
GCCCCGCTTCTTCAGCATAACACACGAGCTCTCAGATCTTCCAATGGAGT

>102260551 0610007C21RIK
CACCACCTCGGGGGTCTTGTGGACACTTGGTTCAGGAGTGGACTCGTACT

>102370333 0610007C21RIK
GCGCACCCTCATCTATCAAGTCCCGGCCATGTCGCAGCCAAGAAGATTTA
Common bioinformatics programs
EMBOSS suite
Clustalw
Blast
RepeatMasker
Genewise
etc.
Sequence formats
- different formats, but the most universal one is FASTA
>seq1 other comments if necessary
cagtcatgctagctagctagctagctagctagctagtgtggtgtgggggtagctgatcgat
Or
>seq1
CGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTGACGTACGTAGCTAGCTGAGCTAGCATGC
>SEQ22
ATCGATCGTAGCTACGACGGTGGGTGATGTCGTAGTTATATAAAATGCTGATGCTAGCTAGCTAGTCGA
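Because every FASTA record starts with a ">" line, standard Unix tools are enough for quick checks. A minimal sketch, assuming a small made-up file example.fasta whose sequences are shortened placeholders: count the records with grep, then print each sequence name with its length using awk.

```shell
# Scratch directory with a tiny two-record FASTA file (placeholder sequences).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > example.fasta <<'EOF'
>seq1 other comments if necessary
cagtcatgctagctagc
tagctagc
>SEQ22
ATCGATCGTAGC
EOF

# Number of sequences = number of header lines.
n_seq=$(grep -c '^>' example.fasta)

# Name and length of each sequence; wrapped sequence lines are summed.
lengths=$(awk '/^>/ { if (name != "") print name, len
                      name = substr($1, 2); len = 0; next }
               { len += length($0) }
               END  { if (name != "") print name, len }' example.fasta)
echo "$n_seq sequences"
echo "$lengths"

cd "$OLDPWD" && rm -r "$workdir"
```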
Emboss suite
Help:
online: http://emboss.sourceforge.net/
In Unix:
- wossname + keyword
gogo$ wossname align
Finds programs by keywords in their short description
SEARCH FOR 'ALIGN'
aligncopy      Reads and writes alignments
aligncopypair  Reads and writes pairs from alignments
cons           Create a consensus sequence from a multiple alignment
consambig      Create an ambiguous consensus sequence from a multiple alignment
diffseq        Compare and report features of two similar sequences
distmat        Create a distance matrix from a multiple sequence alignment
dotmatcher     Draw a threshold dotplot of two sequences
dotpath        Draw a non-overlapping wordmatch dotplot of two sequences
dottup         Displays a wordmatch dotplot of two sequences
edialign       Local multiple alignment of sequences
emma           Multiple sequence alignment (ClustalW wrapper)
est2genome     Align EST sequences to genomic DNA sequence
extractalign   Extract regions from a sequence alignment
Help on a specific program:
program_name -help -verbose
philippe2-lt:ian_repeat gogo$ seqret -help -verbose
Standard (Mandatory) qualifiers:
  [-sequence]     seqall     (Gapped) sequence(s) filename and optional format, or reference (input USA)
  [-outseq]       seqoutall  [<sequence>.<format>] Sequence set(s) filename and optional format (output USA)
Additional (Optional) qualifiers: (none)
Advanced (Unprompted) qualifiers:
  -feature        boolean    Use feature information
  -firstonly      boolean    [N] Read one sequence and stop
Associated qualifiers:
"-sequence" associated qualifiers
  -sbegin1        integer    Start of each sequence to be used
  -send1          integer    End of each sequence to be used
  -sreverse1      boolean    Reverse (if DNA)
  -sask1          boolean    Ask for begin/end/reverse
  -snucleotide1   boolean    Sequence is nucleotide
  -sprotein1      boolean    Sequence is protein
  -slower1        boolean    Make lower case
  -supper1        boolean    Make upper case
  -sformat1       string     Input sequence format
  -sdbname1       string     Database name
  -sid1           string     Entryname
  -ufo1           string     UFO features
  -fformat1       string     Features format
  -fopenfile1     string     Features file name
"-outseq" associated qualifiers
  -osformat2      string     Output seq format
  -osextension2   string     File name extension
  -osname2        string     Base file name
  -osdirectory2   string     Output directory
Emboss Examples:
seqret
remap
transeq
revseq
EMBOSS examples
seqret
glengoyne% more testseq.fasta
>testseq
NNNNNNNNNNNNNNNGNNNNCTCCNTTTNNTCGTTTTTCTTTGAAATTTCTCCCCCCTCCAGTTCGCTGTCCGGCCCTCACAT
TGTGAGAGGGGCAGTGTGCCGTTAATGGCCGTGCCGGGCACCGGGCCGCTCTGGTAGTGCTGGGACATGTGAAGTCTGCTGGG
GCGGCGGGTTCCGGCACCTCGGCGCCGGGGAGATACATGCTGATCATGTCCCGGAGGTCCCCGGCCTGGCAGGGCGCCCTGGATGGGAGGAAGAGGTAACCACAGGGGGGCTGGAGCTGGCCTCGGACTTGACCACCGAACCCATGGAGCCAAGAGCCATGCCAGG
GTGCCCTGCTGCGAGTAGGACATGCTGTAGGTGGGCGAGCCGTTCATGTAGGTCTGCGAGCTGGTCATGGAGTTGTACTGCAG
GCGCTCACGTCGTANCGGTGCATGGGCTGCATCTGCGCTGCGCCGTGCGCATTGAGGCCCGGGTGCTGCGGGTAGCCCAGCTG
TCCTGCATCATGCTGTAGCTGCCGTTGCTCCAGCCGTTCATGTGCGCGTAACTGTCCATGCNNNN
glengoyne% seqret -sbegin 50 -send 100 testseq.fasta testseq_50_100.fasta
Reads and writes (returns) sequences
glengoyne% more testseq_50_100.fasta
>testseq
CTCCCCCCTCCAGTTCGCTGTCCGGCCCTCACATGTGTGAGAGGGGCAGTG
EMBOSS examples
remap
glengoyne% remap -sbegin 50 -send 1000 testseq.fasta
Display a sequence with restriction cut sites, translation etc..
Comma separated enzyme list [all]:
Minimum recognition site length [4]: 8
Output file [testseq.remap]:
glengoyne% more testseq.remap
testseq
OliI
MslI
\
CTCCCCCCTCCAGTTCGCTGTCCGGCCCTCACATGTGTGAGAGGGGCAGTGTGCCGTTAA
50 60 70 80 90 100
|----:----|----:----|----:----|----:----|----:----|----:----
GAGGGGGGAGGTCAAGCGACAGGCCGGGAGTGTACACACTCTCCCCGTCACACGGCAATT
/
MslI
OliI
P P S S S L S G P H M C E R G S V P L M
L P P P V R C P A L T C V R G A V C R *
S P L Q F A V R P S H V * E G Q C A V N
|----:----|----:----|----:----|----:----|----:----|----:----
E G G E L E S D P G * M H S L P L T G N
E G R W N A T R G E C T H S P C H A T L
R G G G T R Q G A R V H T L P A T H R *
EMBOSS examples
transeq
glengoyne% transeq -sbegin 50 -send 100 -frame 3 testseq.fasta
Translate nucleic acid sequences
Output sequence [testseq.pep]:
glengoyne% more testseq.pep
>testseq_3
PPSSSLSGPHMCERGS
revseq
glengoyne% revseq -sbegin 50 -send 100 testseq.fasta testseq.rev
Reverse and complement a sequence
clustalw
Input format: multiple fasta
clustalw seqfilename.fasta
Creates a .aln file
muscle
Input format: multiple fasta
muscle -in seqfilename.fasta -out outputname -clw
Creates a clustal-like output file
Other: T-coffee, dialign, etc.
clustalw
muscle
t_coffee
Blast - local
How to create your own blast database?
- formatdb
- e.g: formatdb -i allseq.fasta -p F
glengoyne% formatdb -h
formatdb 2.2.11 arguments:
  -t  Title for database file [String]  Optional
  -i  Input file(s) for formatting [File In]  Optional
  -l  Logfile name: [File Out]  Optional
      default = formatdb.log
  -p  Type of file
        T - protein
        F - nucleotide [T/F]  Optional
      default = T
  -o  Parse options
        T - True: Parse SeqId and create indexes.
        F - False: Do not parse SeqId. Do not create indexes.
      [T/F]  Optional
      default = F
Blast - local
blastall
compulsory parameters:
-p blastn/blastp/tblastn/blastx/tblastx
-d database
-i input file
-o output file
optional parameters:
-e Expectation value (E) [Real] default = 10.0
-m alignment view options:
0 = pairwise,
8 = tabular,
9 = tabular with comment lines
e.g: blastall -p blastn -d allseq.fasta -i mycandidate.fasta -o blastoutput.txt
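The tabular view (-m 8) is convenient because each hit is one tab-separated line that ordinary Unix tools can filter. The sketch below does not run blastall; it assumes the command above was rerun with -m 8 and uses three made-up hits in the 12-column tabular layout. The thresholds (e-value at most 1e-5, identity at least 90%) are arbitrary illustrations.

```shell
# Scratch directory with made-up hits in the 12-column tabular layout:
# query, subject, % identity, alignment length, mismatches, gap openings,
# q.start, q.end, s.start, s.end, e-value, bit score
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'cand1\tseqA\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-100\t370\n'  > blastoutput.txt
printf 'cand1\tseqB\t85.00\t60\t9\t0\t10\t69\t5\t64\t0.002\t44.1\n'    >> blastoutput.txt
printf 'cand1\tseqC\t100.00\t18\t0\t0\t30\t47\t2\t19\t3.5\t36.2\n'     >> blastoutput.txt

# Keep hits with e-value (column 11) <= 1e-5 and identity (column 3) >= 90.
# Adding 0 forces awk to compare the fields as numbers.
awk -F'\t' '$11 + 0 <= 1e-5 && $3 + 0 >= 90' blastoutput.txt > good_hits.txt

n_good=$(wc -l < good_hits.txt | tr -d ' ')
best_subject=$(cut -f 2 good_hits.txt | head -n 1)
echo "$n_good hit(s) kept, first subject: $best_subject"

cd "$OLDPWD" && rm -r "$workdir"
```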
RepeatMasker
RepeatMasker seq.fasta
RepeatMasker -species mouse seq.fasta
Output:
seq.fasta.masked (N where repeat)
seq.fasta.out:
   SW   perc  perc  perc  query       position in query          matching  repeat          position in repeat
score   div.  del.  ins.  sequence    begin    end   (left)      repeat    class/family    begin   end  (left)   ID
  226   15.8   7.3  11.   Chromosome     38    119  (389882)  C  MLT1L     LTR/MaLR          (4)   611    533     1
  376    6.7  21.7   0.0  Chromosome    200    259  (389742)  C  MADE1     DNA/Mariner       (7)    73      1     2
  416   23.4   8.4   8.4  Chromosome    310    536  (389465)  C  MLT1L     LTR/MaLR        (348)   267     41     1
  268   17.7   0.0   3.4  Chromosome    545    797  (389204)  +  AluJo     SINE/Alu            6   302    (0)     4
 2118   12.8   1.6   0.3  Chromosome    882   1187  (388814)  +  AluSg     SINE/Alu            1   310    (0)     6
  309   28.3   5.9   7.7  Chromosome   1202   1487  (388514)  +  L1ME4a    LINE/L1          5841  6121    (0)     7
 2047   22.2  11.0   4.2  Chromosome   1659   1837  (388164)  +  MLT1G     LTR/MaLR           10   175  (415)     8
 2002   13.1   0.3   0.3  Chromosome   1838   2127  (387874)  C  AluSx     SINE/Alu         (22)   290      1     9
 2047   22.2  11.0   4.2  Chromosome   2128   2491  (387510)  +  MLT1G     LTR/MaLR          175   590    (0)     8
  208   21.4   0.0   0.0  Chromosome   2853   2894  (387107)  +  MIR       SINE/MIR           70   111  (151)    10
  218    6.9   0.0   6.9  Chromosome   3083   3140  (386861)  +  (TA)n     Simple_repeat       1    54    (0)    11
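The .out file is whitespace-separated text, so it can be summarised with awk. A minimal sketch, assuming the column order shown in the listing (begin = column 6, end = column 7, class/family = column 11): it adds up the masked bases per repeat class. The three data rows are taken from the listing above and the header lines are simplified.

```shell
# Scratch directory with a cut-down seq.fasta.out (three rows from the listing).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > seq.fasta.out <<'EOF'
   SW  perc perc perc  query       position in query      matching  repeat        position in repeat
score  div. del. ins.  sequence    begin  end   (left)    repeat    class/family  begin  end (left)  ID

  376   6.7 21.7  0.0  Chromosome   200   259 (389742) C  MADE1     DNA/Mariner     (7)   73     1    2
 2118  12.8  1.6  0.3  Chromosome   882  1187 (388814) +  AluSg     SINE/Alu          1  310    (0)   6
 2002  13.1  0.3  0.3  Chromosome  1838  2127 (387874) C  AluSx     SINE/Alu       (22)  290     1    9
EOF

# Data rows are the ones whose first field is a number (the score).
# For each row add (end - begin + 1) to the total of its class/family.
summary=$(awk '$1 ~ /^[0-9]+$/ { bases[$11] += $7 - $6 + 1 }
               END { for (c in bases) print c, bases[c] }' seq.fasta.out | sort)
echo "$summary"

cd "$OLDPWD" && rm -r "$workdir"
```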
R package
open source scripting platform for statistical computing
versions for Windows, OS X, Linux.
Graphical interface, terminal and programmatic interfaces
wealth of documentation online as well as local support
Bioconductor
collection of packages providing tools for the analysis and comprehension of genomic data
great many algorithms/techniques, peer-reviewed (5,670 Google Scholar hits)
developed in response to the large data volumes and non-trivial analysis involved with microarrays
http://www.bioconductor.org
Figure: number of packages within the Bioconductor project by year
Selection of packages:
package     description
biomaRt     interface to BioMart databases (incl. Ensembl)
affy        methods for analysing Affymetrix oligonucleotide microarrays
ShortRead   methods for analysing high-throughput short-read sequencing data (incl. Solexa)
ppiStats    protein-protein interaction statistical methods
beadarray   quality assessment and low-level analysis of Illumina BeadArrays
fbat        family-based association tests for genetic data
limma       linear models for detecting differential expression in microarray data
See the introduction course in January; contact Vicky Sabine for more information