Regular Meeting February 26, 2009 Mark Borodovsky Ivan Antonov


Regular Meeting

February 26, 2009

Mark Borodovsky, Ivan Antonov

Topics

1. What has been done

2. Results for adjacent genes using a larger gap length threshold

3. Results for adjacent genes using an RBS site score threshold

4. Future work

What has been done

1. A small bug in the gene statistics calculation was found and fixed

2. A larger gap length threshold is used for adjacent genes

3. An RBS site score threshold is implemented

Bug-free statistics

Typical genes distribution (old)

400 typical genes with FS

  • End/start missing: 1
  • End missing: 27
  • Start missing: 15
  • End/start present: 357
      • Adjacent genes: 167
          • Gap len <60: 114
          • Gap len >60: 53
      • Gene overlap: 190

Green squares mark where the “All others” principle was used.

Typical genes distribution (new)

400 typical genes with FS

  • End/start missing: 1
  • End missing: 34
  • Start missing: 26
  • End/start present: 339
      • Adjacent genes: 149
          • Gap len <60: 114
          • Gap len >60: 35
      • Gene overlap: 190

Green squares mark where the “All others” principle was used.

Reducing the number of false negatives among adjacent genes by increasing the upper-bound threshold on gap length

Choosing upper bound threshold

[Figure: histogram of gap lengths in all 149 FS adjacent genes. X axis: gap length, in bins of 10 from 0-10 to 290-300, plus >300. Y axis: number of adjacent genes (0 to 35). The old threshold (60) and the new threshold (160) are marked; the new threshold covers 29 more FS adjacent genes.]
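The gap-length rule discussed on this slide can be sketched roughly as follows. This is only an illustration, not the FSMark code: the `Gene` record, the coordinate convention (1-based, inclusive, both genes on the same strand), the function names, and the label strings are all assumptions. The slides give the cutoffs as "<160" and ">160", so treating a gap of exactly 160 as too far is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: 1-based, inclusive coordinates on one strand."""
    start: int
    end: int


def gap_length(upstream: Gene, downstream: Gene) -> int:
    """Number of bases strictly between two genes; negative if they overlap."""
    return downstream.start - upstream.end - 1


def classify_pair(upstream: Gene, downstream: Gene, max_gap: int = 160) -> str:
    """Label a gene pair as overlapping, adjacent, or too far apart.

    max_gap is the upper-bound gap threshold (60 before, 160 now per the slides).
    The handling of gap == max_gap is an assumption.
    """
    gap = gap_length(upstream, downstream)
    if gap < 0:
        return "overlap"
    if gap < max_gap:
        return "adjacent"
    return "too_far"
```

With a gap of 100 between two genes, the pair would be skipped under the old threshold of 60 but treated as an adjacent-gene candidate under the new threshold of 160, which is the kind of case the larger threshold is meant to recover.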

FS genes distribution

400 typical genes with FS

  • End/start missing: 1
  • End missing: 34
  • Start missing: 26
  • End/start present: 339
      • Adjacent genes: 149
          • Gap len <160: 143
          • Gap len >160: 6
      • Gene overlap: 190

Green squares mark where the “All others” principle was used.

FSMark-GM prediction

                   Gene overlaps   Adjacent genes
GeneMark output    366 (190)       1238 (143)
FSMark applied     256 (145)       418 (103)

Numbers of FS genes are in brackets.

Reducing the number of false positives among adjacent genes by introducing a threshold on the maximum value of the RBS site score

Downstream gene RBS site score distribution

[Figure: histogram of downstream gene RBS site scores. X axis: RBS site score, from -2 to 4 in steps of 0.2. Y axis: frequency (0 to 700). Series: TP_sum, FP_sum.]

Downstream gene RBS site score distribution

[Figure: histogram of downstream gene RBS site scores, zoomed in. X axis: RBS site score, from -2 to 2 in steps of 0.2. Y axis: frequency (0 to 500). Series: TP_sum, FP_sum.]
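One way to explore a cutoff on the maximum RBS site score is to count how many true-positive and false-positive candidates survive at each candidate cutoff. The sketch below assumes that a candidate is kept when the RBS site score of its downstream gene does not exceed the cutoff; the function names and the toy scores are made up for illustration, and the slides do not state the cutoff value that was actually used.

```python
def kept_counts(tp_scores, fp_scores, max_rbs_score):
    """Count TP and FP candidates whose RBS site score is <= the cutoff."""
    tp_kept = sum(1 for s in tp_scores if s <= max_rbs_score)
    fp_kept = sum(1 for s in fp_scores if s <= max_rbs_score)
    return tp_kept, fp_kept


def sweep(tp_scores, fp_scores, cutoffs):
    """Return (cutoff, TP kept, FP kept) for every cutoff in the list."""
    return [(c, *kept_counts(tp_scores, fp_scores, c)) for c in cutoffs]


# Toy scores, NOT data from the slides.
toy_tp = [-1.2, -0.4, 0.1, 0.3, 1.5]
toy_fp = [0.2, 0.9, 1.4, 2.1, 3.0]

for cutoff, tp_kept, fp_kept in sweep(toy_tp, toy_fp, [0.0, 1.0, 2.0]):
    print(f"cutoff={cutoff:+.1f}  TP kept={tp_kept}  FP kept={fp_kept}")
```

A useful cutoff is one where most FP candidates are dropped while few TP candidates are lost, which is the trade-off the two score distributions are meant to show.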

FS genes distribution

400 typical genes with FS

  • End/start missing: 1
  • End missing: 34
  • Start missing: 26
  • End/start present: 339
      • Adjacent genes: 149
          • Gap len <160: 126
          • Gap len >160: 23
      • Gene overlap: 190

Green squares mark where the “All others” principle was used.

FSMark-GM prediction

                                   FP    TP
GeneMark output, gene overlaps     176   190
GeneMark output, adjacent genes    501   126
FSMark applied, gene overlaps      111   145
FSMark applied, adjacent genes     131    92

Today’s FSMark-GM performance

      New approach   Ovlp   Adj   Other   Total   Prev. total
TP    Gap 160nt      145    103   0       248     225
      RBS score      145    92    0       237
FP    Gap 160nt      111    315   0       426     394
      RBS score      111    131   0       242
FN    Gap 160nt      45     40    67      152     175
      RBS score      45     34    84      163

Each "Prev. total" value applies to the whole TP, FP, or FN group.
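To compare the three sets of totals on a common scale, one option is to convert them to sensitivity, TP / (TP + FN), and precision, TP / (TP + FP). The slides do not report these metrics, so this is only a suggested reading of the table. The dictionary keys are made-up labels, and "previous" assumes that the "Prev. total" column holds the totals of the earlier approach.

```python
# Totals copied from the performance table; the keys are illustrative labels.
COUNTS = {
    "previous":  {"TP": 225, "FP": 394, "FN": 175},
    "gap_160":   {"TP": 248, "FP": 426, "FN": 152},
    "rbs_score": {"TP": 237, "FP": 242, "FN": 163},
}


def sensitivity(c):
    """Fraction of the real FS genes that were predicted: TP / (TP + FN)."""
    return c["TP"] / (c["TP"] + c["FN"])


def precision(c):
    """Fraction of predictions that are real FS genes: TP / (TP + FP)."""
    return c["TP"] / (c["TP"] + c["FP"])


for name, c in COUNTS.items():
    # TP + FN should equal the 400 genes with FS in every row.
    assert c["TP"] + c["FN"] == 400
    print(f"{name:10s} sensitivity={sensitivity(c):.3f} precision={precision(c):.3f}")
```

Read this way, the larger gap threshold mostly raises sensitivity, while the RBS score threshold trades a little sensitivity for a clear gain in precision.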

Conclusions

• A larger gap threshold slightly increased the number of true positives among adjacent genes

• The RBS site score threshold substantially reduced the number of false positives among adjacent genes (from 315 to 131)

Future work

• Try to understand why so many genes have a missing end

• Take a closer look at FSMark results on adjacent genes

• What else?