8/9/2019 pattern matching 2003lec31
1/20
Pattern Matching
Rhys Price Jones
Anne R. Haake
What is pattern matching?
Pattern matching is the procedure of
scanning a nucleic acid or protein sequence
for matches to short sequence patterns
(Staden 1994).
Why search for patterns?
Usually the sequences of interest (the query
sequences) are known to be indicators of
some important biological function
Search for patterns in nucleotide sequence
- DNA or RNA
Search for patterns in amino acid sequence
Motif
multiple uses of the word
Def: a pattern; typically is used to refer to a
short (up to ten bases or residues) repeated
or conserved pattern in nucleic acids or
proteins
Def: a short conserved sequence in a protein;
usually associated with function
- in a broader sense, motif is used for all localized
regions of homology, regardless of size
Some examples of patterns in DNA
sequence:
Restriction sites: recognition sites for the
restriction endonucleases
Intron splice sites
Codons specifying ORFs
Promoters
DNA binding sites for regulatory proteins
Restriction Sites
Why identify them?
Exact or inexact matches?
Examples:
Restriction sites
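A restriction-site search can be sketched as follows. This is a minimal illustration, not the lecture's own method: the function name `find_sites` is hypothetical, and EcoRI (GAATTC) and HincII (GTYRAC) are chosen here as example enzymes; neither is named on the slide. An exact site is a plain string match, while a degenerate site needs the IUPAC ambiguity codes expanded into a regular expression.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(sequence, site):
    """Return 0-based start positions of every match, overlaps included."""
    pattern = "".join(IUPAC[base] for base in site.upper())
    # A zero-width lookahead lets overlapping occurrences be reported too.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]

print(find_sites("AAGAATTCGG", "GAATTC"))          # exact site (EcoRI)
print(find_sites("CCGTTAACGGGTCGACAA", "GTYRAC"))  # degenerate site (HincII)
```

The second call matches both GTTAAC and GTCGAC, which is why the question "exact or inexact?" matters even for restriction sites.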
Splice Sites
Splice donor and splice acceptor are consensus sequences
- A statistical determination of the
pattern; approximates the pattern
(C or A)AG|GT(A or G)AGT "donor" splice site
(T or C)nN(C or T)AG|G "acceptor" splice site
Splice site example
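Because a consensus only approximates the real sites, a scan usually tolerates a few disagreements. The sketch below scores each 9-base window against the donor consensus (C or A)AG|GT(A or G)AGT and keeps windows with at most `max_mismatches` differences. The names `DONOR`, `mismatches` and `scan_donor`, and the default threshold of 1, are assumptions made for illustration.

```python
# Allowed bases at each position of the donor consensus (C or A)AG|GT(A or G)AGT.
DONOR = ["CA", "A", "G", "G", "T", "AG", "A", "G", "T"]

def mismatches(window, consensus=DONOR):
    """Count positions where the window disagrees with the consensus."""
    return sum(base not in allowed for base, allowed in zip(window, consensus))

def scan_donor(sequence, max_mismatches=1):
    """Return (position, window, mismatch count) for near-consensus windows."""
    sequence = sequence.upper()
    width = len(DONOR)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        n = mismatches(window)
        if n <= max_mismatches:
            hits.append((i, window, n))
    return hits

print(scan_donor("TTCAGGTAAGTCC"))  # perfect consensus match
print(scan_donor("TTCAGGTCAGTCC"))  # one position off the consensus
```

Raising `max_mismatches` finds more candidate sites at the cost of more false positives.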
Splice Sites
Remember that they are consensus sequences
Why are splice sites of interest?
- Gene finding
- Mutations in consensus sequence at the splice junctions
common in many inherited disorders
Ex: thalassemias, muscular dystrophy, Tay-Sachs,
neurofibromatosis, Darier's disease ...
One of the thalassemias: mutation at splice acceptor
YYYNCAG| normal
YYYNCGG| mutant
Codons Specifying ORFs
ORFs (open reading frames)
Start codon ...
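The slide text is cut off here, so the following is only a sketch under stated assumptions: the standard start codon ATG, the standard stop codons TAA, TAG and TGA, and the forward strand only. The function name `find_orfs` and the `min_codons` cutoff are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(sequence, min_codons=2):
    """Return (start, end) pairs, 0-based and end-exclusive, on the forward strand."""
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Length counts both the start and the stop codon.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

seq = "CCATGAAATGATT"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```

A fuller version would also scan the reverse complement, giving six reading frames in total.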
Promoters
Prokaryotic promoters: Consensus sequences
TTGACA ---- 17±1 ---- TATAAT
 -35                   -10
Eukaryotic promoters - TATA box at -2... relative to transcriptional start site; consensus is ...
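The prokaryotic consensus above (a -35 box TTGACA, a spacer of 17±1 bases, then a -10 box TATAAT) maps directly onto a regular expression. This is an illustrative sketch; the name `find_promoters` is hypothetical, and it demands an exact match to both boxes, which real promoters seldom provide.

```python
import re

# -35 box, a spacer of 16 to 18 bases (17 +/- 1), then the -10 box.
# The lookahead with a capture group reports overlapping candidates as well.
PROMOTER = re.compile(r"(?=(TTGACA[ACGT]{16,18}TATAAT))")

def find_promoters(sequence):
    """Return (start, matched text) for each exact consensus candidate."""
    return [(m.start(), m.group(1)) for m in PROMOTER.finditer(sequence.upper())]

example = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
print(find_promoters(example))
```

In practice the two boxes would be scored with a mismatch tolerance or a position weight matrix rather than matched literally.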
Transcription Factor Binding Sites
Regulatory transcription factors are
sequence
Some examples of patterns in protein
sequences (motifs):
Prediction of secondary and tertiary
structure
- e.g. transcription factors
helix
Exact vs Inexact (Approximate) Pattern
Matching
Exact Pattern Matching
- Limited use in bioinformatics
- Well
Other uses of exact pattern matching?
Check PCR primers?
Annotation? (text matching)
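Checking PCR primers is a natural exact-matching task. The sketch below assumes both primers are written 5' to 3', so the reverse primer is located by searching for its reverse complement on the given strand. The names `reverse_complement` and `check_primers` and the toy sequences are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def check_primers(template, forward, reverse):
    """Return the amplicon length if both primers match exactly, else None."""
    template = template.upper()
    fwd_pos = template.find(forward.upper())
    if fwd_pos < 0:
        return None
    # The reverse primer anneals to the opposite strand, so look for its
    # reverse complement downstream of the forward primer.
    rev_site = reverse_complement(reverse)
    rev_pos = template.find(rev_site, fwd_pos + len(forward))
    if rev_pos < 0:
        return None
    return rev_pos + len(rev_site) - fwd_pos

template = "AAGATTACAGGGGGGCCTTAAGTT"
print(check_primers(template, "GATTACA", "CTTAAGG"))
```

`str.find` reports only the first occurrence; a primer that matches at several places in the template would need a check for uniqueness as well.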
Why search for patterns?
Pattern matching in sequences is also the
basis of searching through a sequence
database
- Sequence alignment
Pairwise Sequence Alignment
An alignment between 2 sequences is a
pairwise match between sequences.
Pairwise sequence comparison is the primary
means of linking biological function to the
genome and of propagating known
information from one genome to another (Gibas & Jambeck).
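One common way to compute a pairwise alignment is Needleman-Wunsch global alignment by dynamic programming. The slides do not name an algorithm, so this is a swapped-in illustration, and the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))

total, top, bottom = global_align("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```

When several alignments share the best score, the traceback returns just one of them.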
Why are inexact pattern matches relevant
in sequence alignments?
Sequencing errors
Mutation - 2 primary types
point mutations (affect a single nucleotide)
segmental mutations (affect a few to hundreds of
adjoining nucleotides)
- substitutions (transitions, transversions)
- insertions, deletions
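The two kinds of substitution can be told apart by base class: a transition stays within the purines (A, G) or within the pyrimidines (C, T), while a transversion crosses between them. The sketch below classifies single-base changes and tallies them between two equal-length sequences; it does not handle insertions or deletions, and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref, alt):
    """Label a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no change"
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def count_substitutions(seq_a, seq_b):
    """Tally transitions and transversions between equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (indels not handled)")
    counts = {"transition": 0, "transversion": 0}
    for x, y in zip(seq_a, seq_b):
        kind = classify_substitution(x, y)
        if kind in counts:
            counts[kind] += 1
    return counts

# An A-to-G change like the CAG -> CGG acceptor mutation shown earlier;
# the flanking bases here are made up for illustration.
print(count_substitutions("TTTACAG", "TTTACGG"))
```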
Mutations
Point mutations usually occur from a nucleotide
mismatch that becomes "fixed" during the process of
replication
- Escapes the DNA repair mechanism
Significant when they occur within a coding region and
also cause a change in functionality
- Non
Evolutionary Considerations
Through time mutations tend to be preserved
if they are not deleterious
Functionally important sequences tend to be
conserved
Non
Evolutionary Considerations
The tendency of functionally important
sequences to remain relatively unchanged
over time is the basis for sequence analysis
- Allows us to draw evolutionary connections among
genes that are related in sequence