
Aliya Sadeque BIOC 599 Supervisory Committee Meeting



Page 1: Aliya Sadeque BIOC 599 Supervisory Committee Meeting
Page 2: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Aliya Sadeque
BIOC 599 Supervisory Committee Meeting
Wednesday, December 19, 2007

Page 3: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Outline

  • About me
  • Thesis project blueprint
  • Course selection

Page 4: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Curriculum Vitae

Queen’s University
Bachelor of Science (Honours) in Biochemistry, Minor in Computing
Graduated May 2007

Page 5: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Previous Coursework: Undergraduate Level

Biochemistry:
  • Proteins and Enzymes
  • Physical Biochemistry
  • Metabolism
  • Molecular Biology
  • Introductory Biochemistry Laboratory
  • Protein Structure and Function
  • Current Topics in Biochemistry
  • Biochemistry of the Cell
  • Advanced Molecular Biology

Page 6: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Previous Coursework: Undergraduate Level

Computing:
  • Database Management Systems
  • Neural and Genetic Computing
  • Introduction to Data Mining
  • System Level Programming
  • Operating Systems

Mathematics:
  • Introduction to Statistics
  • Discrete Math for Computer Scientists
  • Modeling Techniques in Biology

Page 7: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Thesis Project Blueprint

  • Context
      What do we know so far?
      Why is this work important?
  • LCS hits curves
      Where does the number of hits explode?
  • Visualization
      Where are these regions?
      Further investigation of regions of interest
  • Promoter prediction tools
      Existing tools: what’s out there?
      Developing a new tool
  • Visualization
      Visualize all predicted promoters against LCS-identified regions

Page 8: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Context

“Promoter sequences might be identified as conserved islands in a divergent sea”

Page 9: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Longest Common Subsequence
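
For reference, a minimal sketch of the textbook dynamic-programming algorithm for the longest common subsequence of two sequences. This is the standard formulation, not necessarily the implementation used in this project; the function name `lcs` is chosen here for illustration only.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b.

    Textbook dynamic programming; illustrative only, not the
    project's own implementation.
    """
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one solution.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

The table costs O(mn) time and memory for sequences of lengths m and n, so long genomic sequences would probably need a windowed or otherwise reduced variant.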

Page 10: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Longest Common Subsequence

subsequence length    # solutions
51                    0
52                    0
60                    0
50                    1
45                    6
40                    13
36                    25
35                    28
30                    48
25                    101
20                    191
17                    344
15                    1350
14                    5966
13                    23723
12                    63845
10                    118643

subsequence length    # solutions
58                    0
60                    0
57                    2
55                    4
54                    5
53                    6
50                    14
45                    24
40                    46
30                    114
25                    216
20                    667
19                    1004
18                    2105
17                    6554
15                    58492

subsequence length    # solutions
62                    0
63                    0
64                    0
65                    0
61                    2
60                    5
57                    7
59                    7
56                    10
55                    12
54                    16
53                    20
52                    24
51                    27

Page 11: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Longest Common Subsequence Hits Curves

Figure 1. Hits curve (full range view). x-axis: subsequence length (nts), 0 to 70; y-axis: number of matches; series: 1 error, 2 errors, 3 errors.

Page 12: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Longest Common Subsequence Hits Curves

Figure 2. Hits curve (under 50 matches). x-axis: subsequence length (nts), 0 to 70; y-axis: number of matches; series: 1 error, 2 errors, 3 errors.
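
One way a single point on such a hits curve could be computed is sketched below. The slides do not say how hits were counted. This sketch assumes a hit is a pair of equal-length contiguous windows, one from each sequence, that differ at no more than a given number of positions (substitutions only). The function name `count_hits` and its parameters are hypothetical.

```python
def count_hits(seq_a: str, seq_b: str, length: int, max_errors: int) -> int:
    """Count window pairs of the given length with at most max_errors mismatches.

    Assumption (not from the slides): a hit is a pair of contiguous
    windows, one per sequence, compared position by position.
    """
    hits = 0
    for i in range(len(seq_a) - length + 1):
        window_a = seq_a[i:i + length]
        for j in range(len(seq_b) - length + 1):
            window_b = seq_b[j:j + length]
            # Hamming distance between the two windows.
            mismatches = sum(x != y for x, y in zip(window_a, window_b))
            if mismatches <= max_errors:
                hits += 1
    return hits


def hits_curve(seq_a: str, seq_b: str, lengths, max_errors: int) -> dict:
    """Map each window length to its hit count for one error tolerance."""
    return {n: count_hits(seq_a, seq_b, n, max_errors) for n in lengths}
```

Calling `hits_curve` once per error tolerance (1, 2, 3) would give three series like those in the figures. Under this definition, shorter windows and larger tolerances can only increase the count, which is consistent with the steep rise at short lengths.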

Page 13: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Promoter Prediction

Existing tools
  • Interpolated Context Modeling (ICM)
  • Feedforward neural network

New ideas for promoter prediction
  • Neural networks
      Drosophila tool
      GPCR tool
  • Data mining techniques
      WEKA
  • Other forms of computer learning

Page 14: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Course Selection

  • BIOC 570 - completed
  • MICR 502 - Virology
  • Courses to sit in for:
      Biochemistry courses?
      Computing courses?
        Data mining
        Bioinformatics

Page 15: Aliya Sadeque BIOC 599 Supervisory Committee Meeting

Questions for myself

  • Hits curves: what does it mean if they identify the same region?
  • Exact to approximate matches
  • Why is this study important?
  • Which WEKA tools would be good?