Supplementary Materials of Article
A long-read transcriptome assembly of cotton (Gossypium hirsutum) and intraspecific SNP discovery
Hamid Ashrafi1*, Amanda M. Hulse-Kemp2, Fei Wang2, Joshua Udall3, Don Jones4, Marta Matvienko5, Keithanne Mockaitis6, David M. Stelly2, Allen Van Deynze1
1- University of California-Davis, Department of Plant Sciences and Seed Biotechnology Center, One Shields Ave, Davis CA 95616
2- Texas A&M University, Department of Soil and Crop Sciences, College Station, TX 77843
3- Brigham Young University, Plant and Wildlife Science Department, Provo, UT 84062
4- Cotton Incorporated, Cary, NC 27513
5- University of California-Davis, Genome Center, One Shields Ave, Davis CA 95616. Current address: CLC bio, Cambridge, Massachusetts, United States of America
6- Department of Biology, Indiana University, 915 E. Third St., Bloomington IN 47405
Fig. S1. Number of sequences with length X. X-axis: Length of Sequences (0 to 3900). Y-axis: Number of Sequences (0 to 600). N50 = 1100; Max = 13,697; Min = 101.
Fig. S2. Two panels.
Data Distribution. Categories: No BLAST, No BLAST hit, No Mapping, No Annotation, Annotated, Total. Y-axis: Number of Sequences (0 to 80000).
Distribution of GO terms derived from each database. X-axis: Database (UniProtKB, TAIR, GR_protein, FB, MGI, SGN, ZFIN, RGD, WB). Y-axis: Number of GO Terms (0 to 2500000).
Fig. S3. Distribution of Number of Sequences with GO Terms. X-axis: Number of GOs (0 to >25). Y-axis: Number of Sequences (0 to 12000).
Fig. S4. Percent of Annotated Sequences Relative to Sequence Length. X-axis: Sequence Length (0 to 6210, in steps of 115). Y-axis: Percent Annotated (0 to 100).
Fig. S5. X-axis: Sequences (0 to 300,000). Y-axis: Evidence Codes (IEA, RCA, IDA, ISS, ND, IMP, ISM, IEP, TAS, IPI, IGI, IBA, NAS, IC, ISO, ISA).
Fig. S6. Number of BLAST Hits to Available GenBank Species Sequences. X-axis: Number of hits (0 to 160000). Y-axis: Species (Vitis vinifera, Glycine max, Populus trichocarpa, Arabidopsis thaliana, Cucumis sativus, Prunus persica, Ricinus communis, Solanum lycopersicum, Fragaria vesca, Medicago truncatula, Oryza sativa, Arabidopsis lyrata, Zea mays, Sorghum bicolor, Capsella rubella, Brachypodium distachyon, Lotus japonicus, Hordeum vulgare, Gossypium hirsutum, Aegilops tauschii, Picea sitchensis, Triticum urartu, Nicotiana tabacum, Solanum tuberosum, Malus x, Selaginella moellendorffii, Physcomitrella patens, Thellungiella halophila, unknown, others).
Fig. S7. Number of BLAST Top-Hits to Sequences Available in GenBank. X-axis: Number of Hits (0 to 14000). Y-axis: Species (Vitis vinifera, Populus trichocarpa, Glycine max, Fragaria vesca, Medicago truncatula, Arabidopsis thaliana, Oryza sativa, Lotus japonicus, Sorghum bicolor, Jatropha curcas, Hevea brasiliensis, Nicotiana tabacum, Gossypium raimondii, Citrus sinensis, Brachypodium distachyon).
Fig. S8
Fig. S9
Fig. S10
Fig. S11
Chr 1
Fig. S12-a. Alignment of TM-1 454 and EST sequences to the genome of G. raimondii. a-d) Alignment of G. raimondii genes, CDS, gene-based expression, and translated-region expression, respectively, to its own genome sequence. e, f) Left- and right-hand sides of breakpoints when TM-1 reads were mapped to the G. raimondii genome. g) Large InDels. h) Structural variants.
Chr 2
Fig. S12-b
Chr 3
Fig. S12-c
Chr 4
Fig. S12-d
Chr 5
Fig. S12-e
Chr 6
Fig. S12-f
Chr 7
Fig. S12-g
Chr 8
Fig. S12-h
Chr 9
Fig. S12-i
Chr 10
Fig. S12-j
Chr 11
Fig. S12-k
Chr 12
Fig. S12-l
Chr 13
Fig. S12-m