Regular Meeting
February 26, 2009
Mark Borodovsky, Ivan Antonov
GATech 2
Topics
1. What has been done
2. Results for adjacent genes using bigger gap length
3. Results for adjacent genes using RBS site threshold
4. Future work
What has been done
1. A small bug in the calculation of gene statistics was found
2. A larger threshold on gap length is now used for adjacent genes
3. An RBS site score threshold was implemented
Bug-free statistics
Typical genes distribution (old)
400 typical genes with FS
  End/start missing: 1
  End missing: 27
  Start missing: 15
  End/start present: 357
    Adjacent genes: 167
      Gap len <60: 114
      Gap len >60: 53
    Gene overlap: 190
Green squares mark where the “All others” principle was used.
Typical genes distribution (new)
400 typical genes with FS
  End/start missing: 1
  End missing: 34
  Start missing: 26
  End/start present: 339
    Adjacent genes: 149
      Gap len <60: 114
      Gap len >60: 35
    Gene overlap: 190
Green squares mark where the “All others” principle was used.
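The counts in the old and new breakdowns can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the dictionary keys and the function name are made up for illustration, and the numbers are copied from the two slides above.

```python
# Counts copied from the "old" and "new" typical-gene breakdowns.
# Key names are illustrative, not taken from any FSMark output format.
old = {
    "total": 400, "end_start_missing": 1, "end_missing": 27,
    "start_missing": 15, "end_start_present": 357,
    "adjacent": 167, "overlap": 190, "gap_lt_60": 114, "gap_gt_60": 53,
}
new = {
    "total": 400, "end_start_missing": 1, "end_missing": 34,
    "start_missing": 26, "end_start_present": 339,
    "adjacent": 149, "overlap": 190, "gap_lt_60": 114, "gap_gt_60": 35,
}


def is_consistent(c):
    """Return True if every parent count equals the sum of its children."""
    top_level = (c["end_start_missing"] + c["end_missing"]
                 + c["start_missing"] + c["end_start_present"])
    return (
        c["total"] == top_level
        and c["end_start_present"] == c["adjacent"] + c["overlap"]
        and c["adjacent"] == c["gap_lt_60"] + c["gap_gt_60"]
    )


for label, counts in (("old", old), ("new", new)):
    print(label, "consistent:", is_consistent(counts))

# Categories whose counts changed after the bug fix.
changed = {k: new[k] - old[k] for k in old if new[k] != old[k]}
print(changed)
```

Both breakdowns add up, and the difference between them is confined to the end-missing, start-missing, adjacent and gap >60 categories.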
Reducing the number of False Negatives among adjacent genes
by increasing the upper-bound threshold on gap length
Choosing upper bound threshold
Figure: histogram of gap lengths in all 149 FS adjacent genes. X axis: gap length in bins of 10 (0-10, 10-20, ..., 290-300, >300); Y axis: number of adjacent genes (0 to 35). The old threshold (60) and the new threshold (160) are marked; the new threshold covers 29 more FS adjacent genes.
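How an upper bound on gap length splits adjacent gene pairs can be sketched as below. This is a minimal illustration, not FSMark code: the function names, the 1-based inclusive coordinate convention and the toy coordinates are assumptions, and only the thresholds 60 and 160 come from the slide. The slides label the two groups "<" and ">", so treating a gap exactly equal to the threshold as rejected is also an assumption.

```python
def gap_length(upstream_end, downstream_start):
    """Nucleotides strictly between two adjacent genes.

    Assumes 1-based, inclusive coordinates on the same strand.
    """
    return downstream_start - upstream_end - 1


def split_by_gap(pairs, max_gap):
    """Split (upstream_end, downstream_start) pairs by a gap-length bound.

    Pairs with a gap below max_gap are kept as adjacent-gene candidates;
    all others are rejected (boundary handling is an assumption).
    """
    kept = [p for p in pairs if gap_length(*p) < max_gap]
    rejected = [p for p in pairs if gap_length(*p) >= max_gap]
    return kept, rejected


# Toy coordinates, made up for illustration: gaps of 30, 120 and 400 nt.
pairs = [(1000, 1031), (5000, 5121), (9000, 9401)]

kept_old, _ = split_by_gap(pairs, max_gap=60)
kept_new, _ = split_by_gap(pairs, max_gap=160)
print(len(kept_old), len(kept_new))
```

Raising the bound from 60 to 160 keeps the 120 nt pair that the old bound rejected, which is the effect the histogram illustrates for the 29 additional FS adjacent genes.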
FS genes distribution
400 typical genes with FS
  End/start missing: 1
  End missing: 34
  Start missing: 26
  End/start present: 339
    Adjacent genes: 149
      Gap len <160: 143
      Gap len >160: 6
    Gene overlap: 190
Green squares mark where the “All others” principle was used.
FSMark-GM prediction
                  GeneMark output   FSMark applied
Gene overlaps     366 (190)         256 (145)
Adjacent genes    1238 (143)        418 (103)
Numbers of FS genes are in brackets.
Reducing the number of False Positives among adjacent genes
by introducing a threshold on the maximum value of the RBS site score
Downstream gene RBS site score distribution
Figure: frequency (Y axis, 0 to 700) of RBS site scores (X axis, -2 to 4 in steps of 0.2) for two series, TP_sum and FP_sum.
Downstream gene RBS site score distribution
Figure: frequency (Y axis, 0 to 500) of RBS site scores (X axis, -2 to 2 in steps of 0.2) for two series, TP_sum and FP_sum.
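The filtering idea behind these distributions can be sketched as follows, assuming each adjacent-gene candidate carries the RBS site score of its downstream gene and that candidates scoring above a chosen maximum are discarded. The class, the function name, the cutoff of 1.0 and the toy scores are all placeholders; the slides do not state which threshold value was used.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """An adjacent-gene pair considered as a possible frameshift (illustrative)."""
    name: str
    rbs_score: float  # RBS site score of the downstream gene


def filter_by_rbs(candidates, max_rbs_score):
    """Keep candidates whose downstream RBS site score does not exceed the bound.

    Assumption: a high-scoring RBS suggests the downstream gene has its own
    translation start, so such a pair is less likely to be one frameshifted gene.
    """
    return [c for c in candidates if c.rbs_score <= max_rbs_score]


# Toy scores, made up for illustration.
candidates = [
    Candidate("pair_a", -1.2),
    Candidate("pair_b", 0.4),
    Candidate("pair_c", 2.6),
]

kept = filter_by_rbs(candidates, max_rbs_score=1.0)  # 1.0 is a placeholder cutoff
print([c.name for c in kept])
```

In practice the cutoff would be read off the TP_sum and FP_sum curves, at a score above which false positives clearly dominate.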
FS genes distribution
400 typical genes with FS
  End/start missing: 1
  End missing: 34
  Start missing: 26
  End/start present: 339
    Adjacent genes: 149
      Gap len <160: 126
      Gap len >160: 23
    Gene overlap: 190
Green squares mark where the “All others” principle was used.
FSMark-GM prediction
                                   FP     TP
Gene overlaps, GeneMark output     176    190
Gene overlaps, FSMark applied      111    145
Adjacent genes, GeneMark output    501    126
Adjacent genes, FSMark applied     131    92
Today’s FSMark-GM performance
      New approach   Ovlp   Adj   Other   Total   Prev. total
TP    Gap 160 nt     145    103   0       248     225
      RBS score      145    92    0       237
FP    Gap 160 nt     111    315   0       426     394
      RBS score      111    131   0       242
FN    Gap 160 nt     45     40    67      152     175
      RBS score      45     34    84      163
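To compare the approaches on a common scale, the totals in the table can be turned into sensitivity and precision. The sketch below uses the standard definitions, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP); these two measures are not reported on the slide and are added here only as an illustration, with the TP, FP and FN totals copied from the table.

```python
def sensitivity(tp, fn):
    """Fraction of true FS genes that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of predictions that are true FS genes: TP / (TP + FP)."""
    return tp / (tp + fp)


# (TP, FP, FN) totals copied from the performance table.
totals = {
    "Previous": (225, 394, 175),
    "Gap 160 nt": (248, 426, 152),
    "RBS score": (237, 242, 163),
}

for name, (tp, fp, fn) in totals.items():
    print(f"{name:<12} "
          f"sensitivity={sensitivity(tp, fn):.3f} "
          f"precision={precision(tp, fp):.3f}")
```

In every row TP + FN equals 400, the number of typical genes with FS, so the three sensitivities are directly comparable.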
Conclusions
• A larger gap threshold (160 nt instead of 60 nt) slightly increased the number of True Positives in adjacent genes (total TP rose from 225 to 248)
• The RBS site score threshold significantly decreased the number of False Positives in adjacent genes (from 315 to 131)
Future work
• Try to understand why so many genes have a missing end
• Take a closer look at FSMark results on adjacent genes
• What else?