Positional Association Rules Dr. Bernard Chen Ph.D. University of Central Arkansas


Page 1: Positional Association Rules

Positional Association Rules

Dr. Bernard Chen, Ph.D.
University of Central Arkansas

Page 2: Positional Association Rules

Central Dogma of Molecular Biology

Page 3: Positional Association Rules

Amino Acids, the subunit of proteins

Page 4: Positional Association Rules

Protein Primary, Secondary, and Tertiary Structure

Page 5: Positional Association Rules

Protein 3D Structure

Page 6: Positional Association Rules

Protein Sequence Motif

Although there are 20 amino acids, a protein's primary structure is not built by choosing among them at random.

Sequence motifs: a relatively small number of functionally or structurally conserved sequence patterns that occur repeatedly in a group of related proteins.

Page 7: Positional Association Rules

Protein Sequence Motif

These biologically significant regions or residues are usually:
Enzyme catalytic sites
Prosthetic group attachment sites (heme, pyridoxal-phosphate, biotin…)
Amino acids involved in binding a metal ion
Cysteines involved in disulfide bonds
Regions involved in binding a molecule (ATP/ADP, GDP/GTP, Ca, DNA…)

Page 8: Positional Association Rules

HSSP-BLOSUM62 Measure

Page 9: Positional Association Rules

Part 1: Bioinformatics Knowledge and Dataset Collection
Part 2: Discovering Protein Sequence Motifs
Part 3: Motif Information Extraction
Part 4: Mining the Relations between Motifs and Motifs
Part 5: Protein Local Tertiary Structure Prediction
Future Works

Page 10: Positional Association Rules

Motivation

To obtain DNA or protein sequence motif information, it is usually necessary to fix the length of the sequence segments.

Because of this fixed size, a motif-finding method may deliver a number of similar motifs that are simply shifted by several bases or that differ by a few mismatches.

Page 11: Positional Association Rules

Example

If a biological sequence motif has a length of 12 and we set the window size to 9, it is very likely that we discover two similar sequence motifs: one covering the front part of the biological sequence motif and the other covering the rear part.
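The window-size effect in this example can be made concrete with a small sketch. The 12-letter string and the helper name `fixed_windows` are hypothetical placeholders chosen for illustration; they do not come from the slides.

```python
def fixed_windows(sequence, window_size):
    """Return (start, segment) pairs for every window of the given size."""
    return [
        (start, sequence[start:start + window_size])
        for start in range(len(sequence) - window_size + 1)
    ]

# Hypothetical 12-residue motif; the letters are arbitrary placeholders.
motif = "ACDEFGHIKLMN"
windows = fixed_windows(motif, 9)

front_start, front = windows[0]   # window covering the front part of the motif
rear_start, rear = windows[-1]    # window covering the rear part of the motif

# Number of positions shared by the front and rear windows.
overlap = len(front) - (rear_start - front_start)

print(front, rear, overlap)
```

With a window of 9 over a motif of 12, the front and rear windows share most of their positions, so a fixed-length method could plausibly report them as two separate but similar motifs.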

Page 12: Positional Association Rules

Positional Association Rules

A basic association rule gives information of the form A => B.

However, when the order in which the items appear matters, the basic association rule is not expressive enough.

We introduce another parameter, called “distance assurance”, to help identify frequent itemsets that also occur at a frequent distance.

Page 13: Positional Association Rules

Positional Association Rules

Page 14: Positional Association Rules

Pseudocode of Positional Association Rule with the Apriori concept

Algorithm: Positional Association Rule with the Apriori Concept
Input: Database D (protein sequences as transactions and sequence motifs as items), min_support, min_confidence, and min_distance_assurance
Output: P, positional association rules in D

Method:
(1)  L = find_frequent_itemsets(D, min_support)
(2)  S = find_strong_association_rules(L, min_confidence)
(3)  for (k = 2; S_k ≠ Ø; k++)
(4)    for each strong association rule r ∈ S_k
(5)      antecedent_motif = Apriori_Motif_Construct(r_ant)
(6)      consequent_motif = Apriori_Motif_Construct(r_con)
(7)      if antecedent_motif == NULL or consequent_motif == NULL:
(8)        goto Step (4)
(9)      for each protein sequence ps ∈ D
(10)       for (ant_position = 1; ant_position ≤ |ps|; ant_position++)
(11)         if antecedent_motif starts at ps[ant_position]:
(12)           r_ant_count++
(13)           for (con_position = 1; con_position ≤ |ps|; con_position++)
(14)             if consequent_motif starts at ps[con_position]:
(15)               distance = ant_position – con_position
(16)               r_distance++
(17)     P_k = { r_distance | r_distance > min_distance_assurance * r_ant_count }

Apriori_Motif_Construct(itemset)
  if |itemset| == 1:
    return itemset
  else:
    for each positional association rule in P_|itemset|
      if all items in the itemset appear in the positional association rule:
        return the new motif constructed by the positional association rule
    return NULL
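The inner counting loop of the pseudocode can be sketched in Python for a single rule. This is a minimal illustration, not the author's implementation: it assumes that a motif can be matched as a literal substring (real sequence motifs are usually matched approximately), it uses 0-based positions, and the names `find_starts` and `positional_rules` are hypothetical.

```python
from collections import Counter

def find_starts(sequence, motif):
    """Return every index at which motif starts in sequence (exact match)."""
    return [i for i in range(len(sequence)) if sequence.startswith(motif, i)]

def positional_rules(sequences, antecedent, consequent, min_distance_assurance):
    """Return {distance: distance assurance} for distances above the threshold.

    Follows the pseudocode: every antecedent occurrence is paired with every
    consequent occurrence in the same sequence, and the distance is
    ant_position - con_position.
    """
    ant_count = 0
    distance_counts = Counter()
    for ps in sequences:
        ant_positions = find_starts(ps, antecedent)
        con_positions = find_starts(ps, consequent)
        ant_count += len(ant_positions)
        for a in ant_positions:
            for c in con_positions:
                distance_counts[a - c] += 1
    # Strict ">" as in the pseudocode; the dict is empty when nothing was counted.
    return {
        d: n / ant_count
        for d, n in distance_counts.items()
        if n > min_distance_assurance * ant_count
    }

# Toy sequences (made up): "B" follows "A" by 3 positions in two of three cases.
toy = ["AxxB", "AxxB", "AxB"]
result = positional_rules(toy, "A", "B", 0.6)
print(result)
```

In the toy input the distance -3 is seen for 2 of the 3 antecedent occurrences, which exceeds the 60% threshold, while the distance -2 is seen only once and is dropped.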

Page 15: Positional Association Rules

Positional Association Rules Example

Page 16: Positional Association Rules

Positional Association Rules Example

minimum support = 60%, minimum confidence = 80%, minimum distance assurance = 60%

Page 17: Positional Association Rules

minimum support = 60%, minimum confidence = 80%, minimum distance assurance = 60%

Scan for C1

A: 3/5
B: 5/5
C: 2/5
D: 4/5
E: 1/5

=> A, B, D => AB, AD, BD

Page 18: Positional Association Rules

minimum support = 60%, minimum confidence = 80%, minimum distance assurance = 60%

Scan for C2

AB: 3/5
AD: 3/5
BD: 4/5

=> AB, AD, BD => ABD

Page 19: Positional Association Rules

minimum support = 60%, minimum confidence = 80%, minimum distance assurance = 60%

Scan for C3

ABD: 3/5 => ABD => no C4

Page 20: Positional Association Rules

minimum support = 60%, minimum confidence = 80%, minimum distance assurance = 60%

Therefore, the itemsets that pass the minimum support are: {AB, AD, BD, ABD}
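The level-by-level scan can be reproduced with a short sketch. The five transactions below are hypothetical: the original database appears only on a slide image, so this set was chosen solely because it yields the same support counts as the scans above. The function names are also assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical transactions, chosen to reproduce the counts on the slides.
transactions = [
    {"A", "B", "D"},
    {"A", "B", "C", "D"},
    {"A", "B", "D"},
    {"B", "C", "D"},
    {"B", "E"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_support):
    """Apriori-style search; returns {itemset: support count}."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose support reaches the threshold.
        level = {}
        for candidate in candidates:
            count = support_count(candidate, transactions)
            if Fraction(count, n) >= min_support:
                level[candidate] = count
        frequent.update(level)
        # Build (k+1)-candidates whose k-subsets are all frequent.
        k += 1
        survivors = sorted(set().union(*level))
        candidates = [
            frozenset(combo)
            for combo in combinations(survivors, k)
            if all(frozenset(sub) in level for sub in combinations(combo, k - 1))
        ]
    return frequent

frequent = frequent_itemsets(transactions, Fraction(60, 100))
for itemset, count in frequent.items():
    print("".join(sorted(itemset)), f"{count}/{len(transactions)}")
```

Exact fractions are used instead of floats so that an itemset with support exactly equal to 60% is not lost to rounding.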

Next, we need to compute their confidence

Page 21: Positional Association Rules

minimum support = 60%, minimum confidence = 80%, minimum distance assurance = 60%

First, we work on the 2-itemsets: {AB, AD, BD}

A=>B: 3/3    B=>A: 3/5
A=>D: 3/3    D=>A: 3/4
B=>D: 4/5    D=>B: 4/4

Page 22: Positional Association Rules

minimum support = 60%, minimum confidence = 80%, minimum distance assurance = 60%

Then, we work on the 3-itemset: {ABD}

A=>BD: 3/3
B=>AD: 3/5
D=>AB: 3/4
AB=>D: 3/3
AD=>B: 3/3
BD=>A: 3/4

Page 23: Positional Association Rules

minimum support = 60%, minimum confidence = 80%, minimum distance assurance = 60%

Thus, the strong association rules we have are:

2-itemset    3-itemset
A=>B         A=>BD
A=>D         AB=>D
B=>D         AD=>B
D=>B
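The confidence step can be checked with a small sketch that splits each frequent itemset into every antecedent and consequent pair. The transactions are hypothetical (the original database is not recoverable from the slide text) and were chosen only because they give the same support counts as the worked example; `strong_rules` is an assumed name.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical transactions, chosen to reproduce the counts on the slides.
transactions = [
    {"A", "B", "D"},
    {"A", "B", "C", "D"},
    {"A", "B", "D"},
    {"B", "C", "D"},
    {"B", "E"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def strong_rules(itemsets, transactions, min_confidence):
    """Return {(antecedent, consequent): confidence} for the strong rules."""
    rules = {}
    for itemset in itemsets:
        items = frozenset(itemset)
        for size in range(1, len(items)):
            for combo in combinations(sorted(items), size):
                antecedent = frozenset(combo)
                consequent = items - antecedent
                # confidence = support(itemset) / support(antecedent)
                confidence = Fraction(
                    support_count(items, transactions),
                    support_count(antecedent, transactions),
                )
                if confidence >= min_confidence:
                    key = ("".join(sorted(antecedent)), "".join(sorted(consequent)))
                    rules[key] = confidence
    return rules

rules = strong_rules(["AB", "AD", "BD", "ABD"], transactions, Fraction(80, 100))
for (antecedent, consequent), confidence in rules.items():
    print(f"{antecedent}=>{consequent}: {confidence}")
```

Under these assumptions B=>D, with a confidence of exactly 4/5, is kept because the comparison is "greater than or equal to" the 80% threshold.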

Next, we work on Positional Association rules…

Page 24: Positional Association Rules

Positional Association Rules
D=>B, minimum distance assurance = 60%

1. distance 3: 3/4
2. distance 19: 1/4
3. distance 20: 1/4
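The distance assurance check for D=>B is simple arithmetic and can be sketched as follows. The counts are those read off this slide (distances 3, 19 and 20 observed 3, 1 and 1 times for 4 occurrences of D); the strict ">" comparison follows the pseudocode, and the function name `passing_distances` is hypothetical.

```python
from fractions import Fraction

def passing_distances(distance_counts, antecedent_count, min_distance_assurance):
    """Keep the distances whose count exceeds the distance assurance threshold."""
    return {
        distance: Fraction(count, antecedent_count)
        for distance, count in distance_counts.items()
        if count > min_distance_assurance * antecedent_count
    }

# D=>B: distance -> number of times that distance was observed.
d_to_b = passing_distances({3: 3, 19: 1, 20: 1}, 4, Fraction(60, 100))
print(d_to_b)
```

Only distance 3 survives, since 3 exceeds 0.6 × 4 = 2.4 while the other two distances were each seen once.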

Page 25: Positional Association Rules

Positional Association Rules
B=>D, minimum distance assurance = 60%

1. distance 3: 3/6
2. distance 17: 1/6
3. distance 19: 1/6

Page 26: Positional Association Rules

Positional Association Rules
A=>B, minimum distance assurance = 60%

1. distance 2: 2/4
2. distance 22: 1/4
3. distance 25: 1/4
4. distance 24: 1/4

Page 27: Positional Association Rules

Positional Association Rules
A=>D, minimum distance assurance = 60%

1. distance 5: 3/4
2. distance 28: 1/4

Page 28: Positional Association Rules

Positional Association Rules
AD=>B, minimum distance assurance = 60%

1. AD (distance 5), B at distance 2: 2/3
2. AD (distance 5), B at distance 24: 1/3

Page 29: Positional Association Rules

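Positional Association Rules
AB=>D, minimum distance assurance = 60%

No positional association rule exists on AB: no A=>B distance exceeds the 60% distance assurance (the best is 2/4), so the motif AB cannot be constructed and AB=>D is skipped.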
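Why AB=>D is skipped can be illustrated with a sketch of Apriori_Motif_Construct. The data layout is an assumption made for illustration: positional rules are stored per itemset size as (items, motif) pairs, and a motif is written as a hypothetical (antecedent, consequent, distance) tuple. The two entries are the 2-item positional rules that, as read from the preceding slides, exceed the 60% threshold.

```python
def apriori_motif_construct(itemset, positional_rules):
    """Return a motif covering itemset, or None if no positional rule exists."""
    items = frozenset(itemset)
    if len(items) == 1:
        # A single item is its own motif.
        return next(iter(items))
    # Look only at positional rules built from itemsets of the same size.
    for rule_items, motif in positional_rules.get(len(items), []):
        if items <= rule_items:
            return motif
    return None

# Positional rules of size 2 that pass the threshold in the worked example:
# D=>B at distance 3 (3/4) and A=>D at distance 5 (3/4). Nothing passes on AB.
positional_rules = {
    2: [
        (frozenset("BD"), ("D", "B", 3)),
        (frozenset("AD"), ("A", "D", 5)),
    ],
}

print(apriori_motif_construct("AB", positional_rules))  # no rule covers A and B
print(apriori_motif_construct("AD", positional_rules))
```

Because the lookup for AB returns None, the main loop would move on to the next strong rule without counting any distances for AB=>D.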

Page 30: Positional Association Rules

Positional Association Rules
A=>BD, minimum distance assurance = 60%

1. BD (distance 3) at distance 2: 2/4
2. BD (distance 3) at distance 25: 1/4