Graduate Theses, Dissertations, and Problem Reports
2011
Suffix Structures and Circular Pattern Problems
Jie Lin West Virginia University
Follow this and additional works at: https://researchrepository.wvu.edu/etd
Recommended Citation
Lin, Jie, "Suffix Structures and Circular Pattern Problems" (2011). Graduate Theses, Dissertations, and Problem Reports. 3402. https://researchrepository.wvu.edu/etd/3402
This Dissertation is protected by copyright and/or related rights. It has been brought to you by The Research Repository @ WVU with permission from the rights-holder(s). You are free to use this Dissertation in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you must obtain permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/or on the work itself. This Dissertation has been accepted for inclusion in WVU Graduate Theses, Dissertations, and Problem Reports collection by an authorized administrator of The Research Repository @ WVU. For more information, please contact [email protected].
Suffix Structures and Circular Pattern Problems
Jie Lin
Dissertation submitted to the College of Engineering and Mineral Resources
at West Virginia University in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy in
Computer and Information Sciences
Dr. Donald Adjeroh, Ph.D., Chair
Dr. Elaine M. Eschen, Ph.D.
Dr. Arun Ross, Ph.D.
Dr. James Harner, Ph.D.
Dr. Cun-Quan Zhang, Ph.D.
Lane Department of Computer Science and Electrical Engineering
Morgantown, West Virginia, 2011
Keywords: Suffix Array, Suffix Tree, Pattern Matching, Text Mining, Probabilistic Suffix Trees, Probabilistic Suffix Arrays, Markov Models, Space Efficiency, Circular Patterns, Multidomain Proteins, Circular Pattern Discovery
Copyright © 2011 Jie Lin
ABSTRACT
The suffix tree is a data structure used to represent all the suffixes in a string. However, a major problem with the suffix tree is its practical space requirement. In this dissertation, we propose an efficient data structure – the virtual suffix tree (VST) – which requires less space than other recently proposed data structures for suffix trees and suffix arrays. On average, the space requirement (including that for suffix arrays and suffix links) is 13.8n bytes for the regular VST, and 12.05n bytes in its compact form, where n is the length of the sequence.
Markov models are very popular for modeling complex sequences. In this dissertation, we present the probabilistic suffix array (PSA), a space-efficient alternative to the probabilistic suffix tree (PST) used to represent Markov models. The PSA provides all the capabilities of the PST, such as learning and prediction, and maintains the same linear time construction (linearity with respect to sequence length). The PSA, however, has a significantly smaller memory requirement than the PST, both during construction and at the time of usage.
Using the proposed suffix data structures, we study the circular pattern matching (CPM) problem.
We provide a linear time, linear space algorithm to solve the exact circular pattern matching problem. We
then present four algorithms to address the approximate circular pattern matching (ACPM) problem. Our
bidirectional ACPM algorithm provides the best time complexity when compared with other algorithms
proposed in the literature. Further, we define the circular pattern discovery (CPD) problem and present
algorithms to solve this problem. Using the proposed circular pattern matching algorithms, we perform
experiments on computational analysis and function prediction for multidomain proteins.
Acknowledgement
I would like to thank my advisor, Dr. Don Adjeroh, for his guidance, advice, and continued encouragement. It has been a pleasure to work under his supervision. Without him, this dissertation could not have come about.
I would also like to thank my other committee members: Dr. Elaine Eschen, Dr. Arun Ross, Dr. James Harner, and Dr. Cun-Quan Zhang for their help during my studies.
And finally, I thank my family members for their constant support, encouragement, and help.
The work reported in this thesis was partly supported by a DOE CAREER award (No: DE-FG02-02ER25541), an NSF ITR award (No: 0312484), and a WV-EPSCoR RCG grant.
Contents
1 Introduction
1.1 Overview
1.2 The Problem
1.2.1 Suffix Tree
1.2.2 Markov Models and Probabilistic Suffix Tree
1.2.3 Circular Pattern Matching and Circular Pattern Discovery
1.3 Contribution
1.4 Organization
2 Related Work
2.1 Suffix Tree and Suffix Array
2.1.1 Basic Notations and Definitions
2.1.2 Suffix Tree
2.1.3 Suffix Links
2.1.4 Implementation and Problems with the Suffix Tree
2.1.5 Suffix Array
2.2 Space-Efficient Suffix Trees
2.2.1 ESA
2.2.2 LST
2.3 Markov Models and Probabilistic Suffix Tree
2.3.1 Variable Length Markov Models
2.3.2 Probabilistic Suffix Tree
2.3.3 Computing TF and DF via Suffix Arrays
2.4 Circular Pattern Matching
2.4.1 String Pattern Matching
2.4.2 Circular Pattern Matching Problems
2.4.3 Exact Circular Pattern Matching (ECPM)
2.4.4 Approximate Circular Pattern Matching (ACPM)
2.4.5 ACPM Problem in Protein Sequences
2.5 Pattern Discovery Problem
3 The Virtual Suffix Tree
3.1 Introduction
3.2 Basic Data Structure
3.2.1 Example VST
3.2.2 Properties of the Virtual Suffix Tree
3.2.3 Pattern Matching on VST
3.3 Improved Virtual Suffix Tree
3.3.1 Adjusting Edge Lengths
3.3.2 Construction Algorithm
3.3.3 Further Space Reduction
3.3.4 Complexity Analysis
3.4 Computing Suffix Links
3.5 From SA to VST
3.6 Summary
4 The Probabilistic Suffix Array
4.1 Introduction
4.2 Probabilistic Suffix Tree
4.3 Proposed Data Structure
4.3.1 Internal Node Attributes
4.3.2 Measurement Attributes
4.3.3 Example PSA
4.3.4 Interval Array and Document Frequency in Linear Time
4.4 Constructing the PSA
4.4.1 Building the Interval Tree
4.4.2 Building the Suffix Link
4.4.3 Sorting the PSA Structure
4.4.4 Computing Conditional Probabilities Using the PSA
4.4.5 Prediction with VLMM via the PSA
4.5 Space Analysis
4.5.1 Storage Space
4.5.2 Construction Space
4.6 Experiments
4.6.1 Predicting Protein Families
4.6.2 Space Consideration
4.6.3 Computational Time Requirement
4.6.4 PSA in Phylogenetic Tree Construction
4.7 Summary
5 Circular Pattern Matching
5.1 Introduction
5.2 Exact Circular Pattern Matching Problem
5.2.1 Linear Time ECPM Algorithm
5.2.2 Comparison of ECPM Algorithms
5.3 Approximate Circular Pattern Matching Problem
5.3.1 Greedy ACPM Algorithm
5.3.2 ACPM with LIS
5.3.3 ACPM with q-grams and Suffix Array
5.3.4 Improved Algorithm: ACPM with Bidirectional Edit Distance
5.3.5 Comparison with Other ACPM Algorithms
5.4 Experiments
5.4.1 Data Set
5.4.2 CPM Experimental Design
5.4.3 Multidomain Protein Networks using Circular Patterns
5.5 Summary
6 Circular Pattern Discovery
6.1 Introduction
6.2 The Circular Pattern Discovery Problem
6.3 The ECPD Algorithm
6.4 The ACPD Algorithm
6.4.1 ACPD using Maes' Algorithm
6.4.2 Proposed ACPD Algorithm
6.4.3 Comparison
6.5 Experiments
6.6 Summary
7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
7.2.1 Circular Pattern Discovery
7.2.2 Network Analysis for Circular Multidomain Proteins
7.2.3 From PSA to PFA
7.2.4 Approximate Pattern Matching Using PSA
7.2.5 Prediction with PSA using Inexact Matching
7.3 Publications from the Dissertation
List of Tables
2.1 Interval Array
3.1 VST node attributes for the example sequence T = missississippi$ used in Figure 3.1.
3.2 Node attributes in the improved VST for the example sequence, T = missississippi$.
3.3 Branching factor and maximum space requirement for various sample files.
3.4 Storage requirement for the VST, including suffix links
3.5 Detailed attributes for nodes in the TA data structure using the sample sequence, T = missississippi$.
3.6 Node mapping table from TA to VST
4.1 Attributes of PSA
4.2 Example PSA internal nodes, using the PSA of the sequence T = accactact$
4.3 Example PSA leaf nodes, using the PSA of the sequence T = accactact$
4.4 Performance of the PSA in modeling and prediction of protein families. Families correspond to the first 51 protein families with 12 or more members in the Pfam database, ordered alphabetically based on their abbreviated names in Pfam. For comparison, we have included the results obtained using the PST [17] on the same data set. (TP stands for true positive, while MD stands for missed detection). ∗∗The family apple was not in the dataset used in [17].
4.5 Summary performance in protein family classification using the PSA and PST
4.6 Summary data on the first 51 families in Pfam, as described in Table 4.4.
4.7 Construction memory needed for the PSA and PST. Results are based on the first 51 families in Pfam, as described in Table 4.4.
4.8 Construction time comparison for PSA and PST. Results are based on the first 51 families in Pfam, as described in Table 4.4. Recorded time is time needed per family (in seconds). Speedup is computed as the ratio with respect to PSA time.
4.9 Prediction time comparison between PSA and PST. Results are based on the first 51 families in Pfam, as described in Table 4.4. Recorded time (in seconds) is prediction time per family – i.e. total time needed to predict all members in the family against all the other families.
5.1 Comparison of ECPM algorithms
5.2 Comparison with other proposed ACPM Algorithms
5.3 Top 15 highest degree proteins with GO function
5.4 The predicted protein functions using union for In-edge and Out-edge
5.5 The predicted protein functions using intersection for In-edge and Out-edge
5.6 Performance in Protein Function Prediction using the Top-500 Proteins
5.7 Network statistics for multidomain protein networks
5.8 Top 25 proteins with the highest node degree differences between protein networks using the circular and non-circular patterns.
5.9 The longest path in the Protein network
5.10 The longest path in the Family network
6.1 The number of distinct patterns with pattern length
6.2 Sample discovered circular patterns with length five.
6.3 Sample discovered circular patterns with length thirteen.
List of Figures
1.1 Summary of work in this dissertation.
2.1 Algorithm for computing document frequency [131]
2.2 Edit Graph of T and PP [87].
2.3 Maes' Algorithm
3.1 Suffix tree and virtual suffix tree for the string T = missississippi$.
3.2 Example VST (solid nodes) showing left SA index (lSA) and right SA index (rSA) for sample nodes.
3.3 Edge-length adjustment procedure.
3.4 Improved VST for the string T = missississippi$
3.5 Suffix links on the VST for the sample string T = missississippi$.
3.6 Constructing VST from the suffix array.
4.1 State diagram and transition matrix for a first order Markov model for an example sequence T = accactact$.
4.2 Example suffix tree and probabilistic suffix tree for the string T = accactact$.
4.3 Top-k classification rate for sample protein families using the PSA.
4.4 Memory consumption factor (MC Factor) needed to construct the PSA and PST data structures for the first 51 protein families in Pfam.
4.5 Phylogenetic tree for 20 species constructed using the predicted probabilities obtained using the PSA.
5.1 Suffix tree for the string T = missississippi$ with some suffix links.
5.2 The number of hypotheses with q-gram
5.3 Dynamic Programming in q-gram matching
5.4 Three cases in computing the circular edit distance in the ACPM algorithm using the bidirectional edit distance. The numbered double-header show the symbol positions involved in each case.
5.5 The time cost of the CPM algorithms
5.6 Degree distributions in the network of multidomain proteins constructed based on the circular patterns they contain.
5.7 Number of directly connected pairs in Top-K highest degree proteins
5.8 The Protein network (using both CPs and non-CPs)
5.9 The Protein network using only non-circular patterns
5.10 The Protein network using only circular patterns
5.11 The Family network (using both CPs and non-CPs)
5.12 The Family network using only non-circular patterns
5.13 The Family network using only circular patterns
6.1 Variation of the number of distinct patterns (including non-circular and circular patterns) with pattern length.
6.2 Variation of number of circular patterns with pattern length.
6.3 Variation of maximum number of occurrences with pattern length.
Chapter 1
Introduction
1.1 Overview
The suffix tree is an important data structure used to represent sequences (for example, text, DNA sequences, and video). However, its space requirement is too large for most practical applications. Most work on the suffix tree (ST) and the suffix array (SA) has focused on theoretical time and space complexity. Markov models are popular for modeling complex sequences whose sources are unknown, or whose underlying statistical characteristics are not well understood. A major problem, however, is their space complexity, which grows exponentially with the order of the Markov model. The probabilistic suffix tree (PST) was proposed to address the space problem of the Markov model. This reduced the space complexity theoretically. However, in practice, the space requirement for the PST is still relatively large, and often impractical for most real-life problems. Circular permutations and circular pattern matching are interesting problems in computer science and biology. Several algorithms and methods exist to solve these problems, but in practice they have high time and space costs.
In the first part of this dissertation, we develop efficient suffix data structures for analysis of huge sequences. In the second part, we use these data structures to study the problem of circular permutations and their applications in computational biology. Based on our circular permutation work, we define and study the circular pattern discovery problem.
First, we propose a space-efficient data structure called the virtual suffix tree (VST). The VST supports the same functions as the suffix tree, but with a much smaller practical space requirement. Its average space requirement is significantly smaller than that of other data structures for suffix trees and suffix arrays. Second, we propose an efficient data structure, the probabilistic suffix array (PSA), to represent the Markov model. The PSA takes much less space than the probabilistic suffix tree, which is implemented on a regular suffix tree. We present experiments on biological applications. Lastly, using suffix trees and suffix arrays, we propose algorithms to solve the circular pattern matching problem and use these to study circular permutations and function prediction for multidomain proteins. We also introduce the circular pattern discovery problem and present algorithms to solve it.
1.2 The Problem
1.2.1 Suffix Tree
The suffix tree is an important data structure used to represent the set of all suffixes of a string. The suffix tree is efficient in both time and space, and has been used in a variety of applications, such as pattern matching, sequence alignment, identification of repetitions in genome-scale biological sequences, and data compression. Various algorithms have been developed for efficient construction of suffix trees [38, 89, 122, 128]. However, one major problem with the suffix tree is its practical space requirement. The suffix array is a related data structure, which was originally introduced in [84] as a space-efficient alternative to the suffix tree. The suffix array simply provides a listing of all the suffixes of a given string in lexicographic order. The suffix array can be used in most (though not all) situations where a suffix tree is used.
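As a minimal illustration of the suffix array described above, the following Python sketch sorts the suffix start positions of a string by comparing the suffixes directly. The function name `suffix_array` is an assumption made for this example, and the quadratic-space sorting used here is only for clarity; it is not one of the linear-time construction algorithms cited in this chapter.

```python
def suffix_array(text):
    """Return the start positions of all suffixes of `text`,
    listed in lexicographic order of the suffixes.

    Naive construction for illustration only: each comparison may
    inspect up to len(text) characters.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    T = "missississippi$"  # example string used elsewhere in the dissertation
    for rank, start in enumerate(suffix_array(T)):
        print(rank, start, T[start:])
```

With the usual ASCII ordering, the terminator `$` sorts before every letter, so the suffix consisting only of `$` appears first in the listing.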
Although the theoretical space complexity is linear for both data structures, typically, for a given string $T$ of length $n$, the suffix array requires about three to five times less space than the suffix tree. The construction time for both data structures is also $O(n)$ on average. For suffix arrays, construction algorithms that run in $O(n \log n)$ worst case are relatively easy to develop, but $O(n)$ worst case algorithms are much harder to come by. Recent suffix sorting algorithms with worst-case linear time complexity have been reported in [57, 63, 65]. Gusfield [46] provides a comprehensive treatment of suffix trees and their applications. Puglisi et al. [104] provide a recent survey on suffix arrays. An extensive discussion on the connection between the Burrows-Wheeler Transform [23] and suffix trees and suffix arrays is provided in [3].
For small alphabet sizes, the suffix tree and the suffix array have about the same complexity in pattern matching. The suffix array requires $O(m \log n)$ time to locate one occurrence of a pattern $P$ of length $m$ in $T$. However, with additional data structures, such as the lcp array, this time can be reduced to $O(m + \log n)$. With the suffix tree, the same search can be performed in $O(m)$ time. The problem, however, arises for sequences with large alphabets, where $|\Sigma| \to n$. Here the alphabet size $|\Sigma|$ is no longer negligible. Using the array representation of nodes in the suffix tree will require $O(n|\Sigma|)$ space for the suffix tree, and $O(m)$ time for pattern matching. For linear space, the linked list or binary search tree can be used, but the search time becomes $O(m|\Sigma|)$ or $O(m \log |\Sigma|)$ respectively.
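The O(m log n) search on a suffix array mentioned above can be sketched with two binary searches over the sorted suffixes, each step comparing at most m characters. This is a hedged illustration under simple assumptions: the helper names `suffix_array` and `find_occurrences` are hypothetical, the suffix array is built naively, and the lcp-based refinement to O(m + log n) is not shown.

```python
def suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of `pattern` in `text`, in increasing order.

    Two binary searches locate the block of suffixes whose first
    len(pattern) characters equal the pattern. Each comparison looks at
    no more than m = len(pattern) characters, giving O(m log n) time
    plus the size of the output.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[left:lo])


if __name__ == "__main__":
    T = "missississippi$"
    print(find_occurrences(T, suffix_array(T), "issi"))
```

The search relies on the fact that truncating lexicographically sorted suffixes to their first m characters keeps them in non-decreasing order, so all matches occupy one contiguous block of the suffix array.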
The Challenge. A key problem then is to develop space-efficient data structures that can support pattern matching with the same time complexity as suffix trees, but at a practical space requirement that approaches that of the suffix array. Such a data structure should also support the complete functionality of the suffix tree, such as support for suffix links, as may be required in certain applications. Two recent data structures that have attempted to address this problem are the enhanced suffix array (ESA) [2] and the linearized suffix tree (LST) [61, 62]. Both methods are based on the notion of lcp-intervals [60], constructed using the suffix array and the lcp array.
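Since both structures above are built from the suffix array together with the lcp array, a small sketch of how an lcp array can be computed may help. The code below uses the well-known linear-time scan commonly attributed to Kasai et al.; it is an illustrative choice for this example, not necessarily the procedure used in the cited works, and the function names are assumptions.

```python
def suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Return lcp, where lcp[r] is the length of the longest common prefix
    of the suffixes at sa[r - 1] and sa[r], and lcp[0] = 0.

    Suffixes are processed in text order; the matched length h drops by at
    most one between consecutive text positions, so the scan is linear.
    """
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r

    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0  # the smallest suffix has no predecessor
            continue
        j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


if __name__ == "__main__":
    T = "banana$"
    sa = suffix_array(T)
    print(sa, lcp_array(T, sa))
```

Runs of adjacent suffix array entries whose lcp values stay at or above some length correspond to the lcp-intervals mentioned in the text.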
1.2.2 Markov Models and Probabilistic Suffix Tree
Markov models are very popular for modeling complex sequences whose sources are unknown, or whose underlying statistical characteristics are not well understood. This is especially the case when the sequences exhibit some memory. For a short-term memory of length $L$, this means that the conditional distribution of the next symbol given the last $L$ symbols does not change significantly if we condition on $L$ or more previous symbols. Thus, such sequences are often modeled using Markov models of order $L$, or using Hidden Markov Models (HMM) [37]. The models provide efficient mechanisms to compute the required conditional probabilities, and also for generating sequences from the models. The problem is that the size of Markov models increases exponentially with increasing memory length $L$. Thus, they are practical only for low order models with short memory lengths. This leads to the second challenge: such low-order Markov models often provide a poor approximation of the true sequence being modeled. It is known that learning with Hidden Markov Models is computationally very challenging. Hardness results on the learnability of HMM are discussed in [1], while similar results on inferencing using HMM are reported in [42].
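To make the order-L conditional probabilities and the exponential growth concrete, here is a small sketch that estimates a fixed-order Markov model from symbol counts. It assumes simple maximum-likelihood estimates with no smoothing; the function names are hypothetical and this is not the PST or PSA construction developed in the dissertation.

```python
from collections import Counter, defaultdict


def fit_markov(seq, order):
    """Count, for every context of length `order` in `seq`,
    how often each symbol follows that context."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return counts


def cond_prob(counts, context, symbol):
    """Maximum-likelihood estimate of P(symbol | context).
    Returns 0.0 for a context that never occurred."""
    nexts = counts.get(context)
    if not nexts:
        return 0.0
    return nexts[symbol] / sum(nexts.values())


def max_contexts(alphabet_size, order):
    """Number of distinct contexts a full order-`order` model may need."""
    return alphabet_size ** order


if __name__ == "__main__":
    seq = "accactact"
    model = fit_markov(seq, 1)
    print(cond_prob(model, "c", "t"))
    # Potential table size for a 4-letter alphabet at increasing orders.
    print([max_contexts(4, L) for L in (1, 2, 4, 8)])
```

A full order-L table may need one row per possible context, that is, |Σ|^L rows, which is the exponential growth the paragraph above refers to.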
Probabilistic suffix models such as probabilistic suffix trees (PST) have been proposed by Ron et al. [108] to address some of the key problems with Markov models. They showed the equivalence between PSTs and a subclass of probabilistic finite automata (PFAs) called probabilistic suffix automata (PFA/PSA): for a given PST, there is an algorithm to construct a PFA/PSA whose size is the same as that of the PST within a constant factor. Further, the distribution generated by a PST is guaranteed to be within a small distance from that generated using the PFA/PSA, as measured by the Kullback-Leibler divergence.
Probabilistic suffix trees and probabilistic suffix automata are related to context-based models, which are extensively used in sequence prediction and data compression. Typical examples of such context models include context tree weighting (CTW) [129], prediction by partial matching (PPM) [28], and the Lempel-Ziv decomposition [133, 134]. See also [106, 107]. The use of these context models as a surrogate for variable length Markov models, with applications in sequence prediction, is reviewed in [16]. Other applications have been found in modeling DNA sequences [109], protein sequence classification [17, 71], and in modeling API call sequences for malicious codes in Windows XP [88].
The Challenge. The probabilistic suffix models, however, require $O(Ln^2)$ time and space to construct. The algorithm to construct the PFA/PSA from the PST also runs in $O(Ln^2)$ time. Later, Apostolico and Bejerano [9] showed how the PST can be constructed in $O(n)$ time, independent of the order $L$, using traditional suffix links used in constructing suffix trees, and the notion of reverse suffix links. They did not consider the problem of constructing the PFA/PSA from the PST. Their use of the suffix tree also implies that the space requirements for constructing the PST will be very high, although it is still linear in terms of the sequence length.
1.2.3 Circular Pattern Matching and Circular Pattern Discovery
Computing similarity (or dissimilarity) between two strings is an important problem in general sequence analysis [44, 46, 51, 81, 115], pattern recognition [113, 120], and biology [40, 47, 55]. The circular edit distance is an extension of the traditional edit distance that measures the similarity between strings under circular shifts. A circular shift is a mapping $f : \Sigma^* \to \Sigma^*$ with $f^t(c_1 \ldots c_r) = c_{t+1} \ldots c_r c_1 \ldots c_{t-1} c_t$, where $0 \le t \le r-1$ and $r$ is the length of the string $c_1 c_2 \ldots c_r$. The circular edit distance between two strings $s_1$ and $s_2$ is defined as $ED_c(s_1, s_2) = \min\{ED[f^i(s_1), f^j(s_2)] \mid 0 \le i \le |s_1|-1,\ 0 \le j \le |s_2|-1\}$, where $|s_1|$ and $|s_2|$ are the lengths of $s_1$ and $s_2$, and $ED[A, B]$ is the standard edit distance between $A$ and $B$. Thus, the dissimilarity between two strings in a circular shift is a function of the circular distance between them. Computational methods have also been proposed to study circular patterns in biological problems [123, 124, 126, 127].
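The definition of the circular edit distance can be evaluated directly by trying every pair of circular shifts. The sketch below does exactly that, assuming unit-cost insertions, deletions, and substitutions for the standard edit distance. It is a brute-force illustration of the definition only, with hypothetical function names, and is far slower than the algorithms discussed in this dissertation.

```python
def rotate(s, t):
    """Circular shift f^t: move the first t symbols of s to the end."""
    return s[t:] + s[:t]


def edit_distance(a, b):
    """Standard unit-cost edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def circular_edit_distance(s1, s2):
    """Minimum edit distance over all circular shifts of both strings."""
    shifts1 = range(max(1, len(s1)))  # still try shift 0 for an empty string
    shifts2 = range(max(1, len(s2)))
    return min(edit_distance(rotate(s1, i), rotate(s2, j))
               for i in shifts1 for j in shifts2)


if __name__ == "__main__":
    print(circular_edit_distance("abcd", "cdab"))
    print(circular_edit_distance("abcd", "dabx"))
```

Because every shift pair triggers a full dynamic-programming computation, the cost grows roughly with the square of the product of the two string lengths, which is why more careful algorithms are needed in practice.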
In biology, circular proteins and circular permutations in proteins are of increasing interest, especially given their role in the structure, function, folding, and stability of proteins [40, 47, 55]. In a circular (or cyclic) protein, the traditional N- and C-termini are joined, resulting in a protein sequence with no termini [123]. The cyclotides are a typical example of a naturally occurring family of cyclic proteins in the plant kingdom. Cyclotides are known to play a major role in plant defense against insects and other pathogens [35]. Their cyclic structure is known to be an important factor in their unusual stability [35]. Other common examples of cyclic proteins are the bacteriocins, small antimicrobial peptides with 30-70 residues produced by bacteria [33], cyclosporins found in fungi [66], and the primate rhesus θ-defensin-1 [119], which has antibacterial properties for the immune system of macaque monkeys.
The Challenge. There are several algorithms for calculating the circular distance between a text $T$ and a circular pattern $P$, but they do not consider the problem of finding the circular pattern $P$ and its circular shifts inside substrings of the text $T$. In our work, we define variants of the circular pattern matching (CPM) problem and propose algorithms to solve them.
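One exact variant of the problem stated above, reporting every position in T where some circular shift of P occurs, can be illustrated with a naive scan. The sketch relies on the standard observation that a string of length |P| is a circular shift of P exactly when it is a substring of PP. The function name and the output format are assumptions for this example; this is not the linear-time ECPM algorithm proposed in the dissertation.

```python
def circular_matches(text, pattern):
    """Return (position, shift) pairs such that text[position:position + m]
    equals the circular shift of `pattern` by `shift` symbols.

    Naive scan: every window of length m = len(pattern) is looked up in
    pattern + pattern. The smallest matching shift is reported.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    doubled = pattern + pattern
    hits = []
    for i in range(len(text) - m + 1):
        shift = doubled.find(text[i:i + m])
        if shift != -1:
            hits.append((i, shift))
    return hits


if __name__ == "__main__":
    print(circular_matches("xxbcaxxabc", "abc"))
```

Each window lookup is itself a substring search, so this scan is much slower than linear time; it is meant only to pin down what a correct answer looks like.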
Pattern discovery is a fundamental analysis method used to identify possibly hidden relations within or between sequences. Pattern discovery problems are well studied in many applications. In biology, motif discovery is often performed as a kind of pattern discovery application. To our knowledge, there is no existing work that explicitly studies the problem of pattern discovery involving circular patterns. In this work, we introduce the Circular Pattern Discovery problem and propose algorithms for its solution.
1.3 Contribution
We introduce the VST (virtual suffix tree), an efficient data structure for suffix trees and suffix arrays. Starting from the suffix array, we construct the suffix tree, from which we derive the virtual suffix tree. Later, we remove the intermediate step of suffix tree construction, and build the VST directly from the suffix array. The VST provides the same functionality as the suffix tree, including suffix links, but at a much smaller space requirement. It has the same linear time construction even for large alphabets $\Sigma$, requires $O(n)$ space to store ($n$ is the string length), and allows searching for a pattern of length $m$ to be performed in $O(m \log |\Sigma|)$ time, the same time needed for a suffix tree. Given the VST, we show an algorithm that computes all the suffix links in linear time, independent of $\Sigma$. The VST requires less space than other recently proposed data structures for suffix trees and suffix arrays, such as the enhanced suffix array [2] and the linearized suffix tree [62]. On average, the space requirement (including that for suffix arrays and suffix links) is $13.8n$ bytes for the regular VST, and $12.05n$ bytes in its compact form.
We present the probabilistic suffix array (PSA), a data structure for representing information in variable length Markov chains. The PSA essentially encodes information in a Markov model by providing a space-efficient representation of the probabilistic suffix tree (PST). Our PSA provides the same functionality as the PST, but at a significantly reduced space requirement. Given a sequence of length n, construction and learning in the PSA is done in O(n) time and space, independent of the Markov order. Prediction using the PSA is performed in O(m log(n/|Σ|)) time, where m is the pattern length, and Σ is the symbol alphabet. The specific memory requirement is 33n bytes in the worst case, and 26n bytes on average, including space for the suffix array and the input sequence. This can be compared with the 41n bytes needed using the PST.
We propose an exact circular pattern matching (ECPM) algorithm that runs in linear time
and linear space. We also propose algorithms to solve the approximate circular pattern matching (ACPM) problem. In our work, we solve a harder version of the ACPM problem than that considered in previous work on ACPM [44, 81, 123, 124, 126, 127]. We present an experiment on finding circular relations in multi-domain proteins. Our experiments show that the methods based on circular permutations can produce very good results in predicting protein functions.
We propose two algorithms for the Circular Pattern Discovery (CPD) problem. The first algorithm uses suffix trees and suffix links to solve the exact circular pattern discovery problem in O(m_2^2 N) time. The second algorithm uses suffix arrays to solve the more challenging approximate circular pattern discovery (ACPD) problem in O(k m_2^2 N^2) time in the worst case, and O(k m_2^2 N) on average. By exploiting the nature of the ACPD problem, the complexity can be reduced to O(m_2^2 N^2) in the worst case, and O(m_2^2 N) on average.
Aspects of the work from this dissertation are reported in the following papers: [74–79].
1.4 Organization
The dissertation is organized as follows. In Chapter 2, we introduce related work, including basic notations and definitions. Chapter 3 presents the Virtual Suffix Tree (VST), including its construction algorithm, searching algorithm and support for suffix links. We also analyze its time and space complexity and compare it with related algorithms. Chapter 4 introduces the Probabilistic Suffix Array (PSA), including a detailed explanation and implementation. The practical space requirement of the data structure is also examined. In this chapter, we present experiments with the PSA using protein sequences and miDNA sequences. Chapter 5 discusses the circular pattern matching (CPM) problem. Algorithms are then proposed for the exact and inexact variants of the problem using suffix data structures. We implement the algorithms and show examples of using these algorithms to search for circular patterns in multidomain proteins. In Chapter 6, we define the circular pattern discovery (CPD) problem and present algorithms to solve the CPD problem. Chapter 7 draws some conclusions and also describes possible directions for future work. Figure 1.1 provides a summary of the work reported in this dissertation.
[Figure 1.1: tree diagram of the dissertation work with three branches. PSA: data structure; algorithm; experiments (protein family classification, phylogenetic tree construction). VST: data structure; algorithm. CPM & CPD: CPM with algorithms (ECPM, ACPM) and experiments (protein function prediction, circular pattern network); CPD with algorithms (ECPD, ACPD) and an experiment (pattern discovery in a multi-domain protein database).]
Figure 1.1. Summary of work in this dissertation.
Chapter 2
Related Work
In this chapter, we first introduce the suffix data structures, namely suffix trees and suffix arrays, which form the basis for the majority of the work proposed in this dissertation. The suffix data structures are important in computer science and bioinformatics. In Section 2, we introduce the enhanced suffix array (ESA), the linearized suffix tree (LST), and other space-efficient suffix trees. The ESA and the LST use the suffix array and LCP array to simulate the suffix tree. They are related to our new structure, the virtual suffix tree (VST).
In Section 3, we describe variable length Markov models (VLMM) and the probabilistic suffix tree (PST), which is used to implement VLMM. They are related to a new data structure, the probabilistic suffix array (PSA), proposed in this work. We also present previous work on calculating the term frequency (TF) and document frequency (DF), which are used in our proposed PSA representation of VLMM.
In Section 4, we discuss the circular pattern matching (CPM) problem. We first review the general string pattern matching problem. We define two CPM problems and introduce previous work. We will provide our solutions to the different CPM problems and experimental results in Chapter 5. In Section 5, we discuss the related work on the pattern discovery problem.
2.1 Suffix Tree and Suffix Array
2.1.1 Basic Notations and Definitions
Let T = T[1..n] be the input string of length n, over an alphabet Σ. Let T = αβγ, for some strings α, β, and γ (α and γ could be empty). The string β is called a substring of T, α is called a prefix of T, while γ is called a suffix of T. The prefix α is called a proper prefix of T if α ≠ T. Similarly, the suffix γ is called a proper suffix of T if γ ≠ T. We will also use t_i = T[i] to denote the i-th symbol in T; both notations are used interchangeably. We use T_i = T[i..n] = t_i t_{i+1} ... t_n to denote the i-th suffix of T. For simplicity in constructing suffix trees, we ensure that no suffix of the string is a proper prefix of another suffix by appending a special symbol, $, to T, such that $ ∉ Σ, and $ < σ, ∀σ ∈ Σ. We let P = P[1..m] be the pattern string that needs to be found in T.
In our work, the size of the alphabet |Σ| is not fixed. It may be small, for example |Σ| = 4 for DNA sequences, or it may be large, for example |Σ| ≈ 10^6 for multidomain protein sequences.
2.1.2 Suffix Tree
Given a string T, its suffix tree (ST) is a rooted tree with n leaves, where the i-th leaf node corresponds to the i-th suffix T_i of T. Except for the root node and the leaf nodes, every node must have at least two descendant child nodes. Each edge in the suffix tree represents a substring of T, and no two edges out of a node start with the same character. For a given edge, the edge label is simply the substring in T corresponding to the edge. We use l_i to denote the i-th leaf node. Then, l_i corresponds to T_i, the i-th suffix of T. When the edges from each node are sorted alphabetically, then l_i will correspond to T_{SA[i]}, the i-th suffix of T in lexicographic order, where SA denotes the suffix array.
For edge (u,v) between nodes u and v in ST, the edge label (denoted label(u,v)) is a non-empty substring of T. The edge length is simply the length of the edge label. The edge label is usually represented compactly using two pointers to the beginning and end of its corresponding substring in T. For a given node u in the suffix tree, its path label, L(u), is defined as the label of
the path from the root node to u. Since each edge represents a substring in T, L(u) is essentially the string formed by the concatenation of the labels of the edges traversed in going from the root node to the given node, u. The string depth of node u (also called its string length or path length) is simply |L(u)|, the number of characters in L(u). The node depth (also called node level) of node u is the number of nodes encountered in following the path from the root to u. The root is assumed to be a node at depth 0.
Given the string T = T[1..n], of length n, but with the end-of-string symbol appended to give a sequence T$ with length n + 1, the suffix tree of the resulting string T$ will have the following properties:
1. Exactly n+1 leaf nodes.
2. At most n internal (or branching) nodes (the root node is considered an internal node).
3. Every distinct substring of T is encoded exactly once in the suffix tree. Each distinct substring is spelled out exactly once by traveling from the root node to some node u, such that L(u) is the required substring. Note that the node u may be an implicit node, i.e. ending at a position between two (explicit) nodes.
4. No two edges out of a given node in the suffix tree start with the same symbol.
5. Every internal node has at least two outgoing edges.
Properties (1), (2), (4), and (5) imply that a suffix tree will have at most 2n+1 total nodes, and at most 2n edges.
2.1.3 Suffix Links
Some suffix tree construction algorithms make use of suffix links. The notion of suffix links is based on a well-known fact about suffix trees [89, 128], namely, if there is an internal node u in ST such that its path label L(u) = aα for some single character a ∈ Σ, and a (possibly empty) string α ∈ Σ*, then there is a node v in ST such that L(v) = α. A pointer from node u to node v is called a suffix link. If α is an empty string, then the pointer goes from u to the root node. Suffix links are important in certain applications, such as in computing matching statistics needed in approximate pattern matching, regular expression matching, or in certain types of traversal of the suffix tree.
2.1.4 Implementation and Problems with the Suffix Tree
A predominant factor in the space cost for suffix trees is the number of interior nodes in the tree, which depends on the tree topology. Thus, a major consideration is how the outgoing edges from a node in the suffix tree are represented. The three major representations used for outgoing edges are arrays, linked lists, and binary search trees. While the array is simple to implement, it could require a large amount of memory for large alphabets. However, independent of the specific method adopted, a simple implementation of the suffix tree, for example, using Ukkonen's algorithm [122], can require as much as 33n bytes of storage with suffix links, or 25n bytes without suffix links [3].
2.1.5 Suffix Array
The suffix array (SA) is another data structure, closely related to the suffix tree. The suffix array simply provides a lexicographically ordered list of all the suffixes of a string. If SA[i] = j, it means that the i-th smallest suffix of T is T_j, the suffix starting at position j in T. A related structure, the LCP array, contains the lengths of the longest common prefixes between adjacent positions in the suffix array. Combining the suffix array with the LCP information provides a powerful data structure for pattern matching. With this combination, decisions on the occurrence (or otherwise) of a pattern P of length m in the string T of length n can be made in O(m + log n) time. Given the new worst-case linear-time direct SA construction algorithms, and the small memory footprint of suffix arrays, it is becoming more attractive to construct the suffix tree from the suffix array. A linear-time algorithm for constructing ST from SA is presented in [3].
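As an illustration of the two arrays just described, the following is a minimal Python sketch, not one of the linear-time constructions cited in this section. The function names suffix_array and lcp_array are hypothetical; the suffix array is built by naive sorting, and the LCP array uses the well-known rank-based scan usually attributed to Kasai et al. Indices here are 0-based, unlike the 1-based convention used in the text, and the sketch relies on '$' sorting before lowercase letters in Python's default character order.

```python
def suffix_array(text):
    """Return the 0-based start positions of the suffixes of text in sorted order.

    Naive construction by sorting the suffixes directly; suitable only for
    short illustrative inputs, not the linear-time algorithms cited above.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Return lcp, where lcp[i] is the longest common prefix length of the
    suffixes at sa[i-1] and sa[i], and lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for i, start in enumerate(sa):
        rank[start] = i
    lcp = [0] * n
    h = 0  # number of characters already known to match
    for start in range(n):
        if rank[start] == 0:
            h = 0
            continue
        prev = sa[rank[start] - 1]
        while start + h < n and prev + h < n and text[start + h] == text[prev + h]:
            h += 1
        lcp[rank[start]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 characters
    return lcp


if __name__ == "__main__":
    t = "accactact$"
    sa = suffix_array(t)
    print(sa)
    print(lcp_array(t, sa))
```

For the sample sequence accactact$ used later in this chapter, the sorted order begins with the suffix $ and then the three suffixes starting with a.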
2.2 Space-Efficient Suffix Trees
The problem of the practical space needed in using suffix trees has been recognized, and methods have been proposed to provide space-efficient data structures [8, 45, 82, 83, 94, 110, 111]. Andersson et al. [8] presented a level-compressed suffix tree in O(n) bytes. Munro et al. [94] proposed some space-efficient suffix structures in O(n log n) bits of space and O(m|Σ|) searching
time. The structures include a suffix array and other auxiliary structures to represent the suffix tree. But these structures did not include the suffix link, which is an important part of the suffix tree. Grossi et al. [45] proposed compressed suffix structures, an indexing structure in O(n) bits with O(m|Σ|) searching time. The Compact Suffix Array [82] uses at most 9n bytes of space (or (13 1/8)n bytes when including the LCP array), but the time of construction is O(n log n) and the time of searching is O(m log log n + (log n)^2 log log n + n_occ log n log log n), where n_occ is the number of occurrences.
While the suffix array reduces the problem of space, the suffix tree still provides a simpler approach to certain analysis problems, such as computing matching statistics [27]. Thus, methods have been proposed to improve the suffix array with extra information to provide the full functionality of the suffix tree. The enhanced suffix array [2] and the linearized suffix tree [61, 62] are two example data structures recently proposed for full-text indexing with the functionalities of both the suffix tree and suffix array.
2.2.1 ESA
The enhanced suffix array (ESA) [2] is composed of the suffix array and extra data structures, namely, the lcp array and a child table that contains branching information between parent and child nodes in the suffix tree. The key idea used in the ESA is the concept of lcp-intervals (originally used in [60]). Given the suffix array SA, an interval [i..j] in SA, 1 ≤ i < j ≤ n+1, is called an lcp-interval with lcp-value l if the following conditions hold:
1. lcp[i] < l;
2. lcp[k] ≥ l ∀ k s.t. i+1 ≤ k ≤ j;
3. lcp[k] = l, for some k, s.t. i+1 ≤ k ≤ j;
4. lcp[j+1] < l.
Thus, rather than the traditional suffix tree, the ESA constructs an lcp-interval tree. Nodesin a suffix tree are now replaced with lcp-intervals, such that the parent-child relationships
in a traditional suffix tree are now captured by equivalent parent-child relationships between lcp-intervals. The root node corresponds to the interval [1..n] in the suffix array, essentially the entire suffix array. In the ESA, the basic structure used to represent the child nodes from a given parent node is a linked list. Essentially, the child table is composed of three arrays, namely up, down, and nextIndex. The up and down arrays store information about the edges in the tree, while array nextIndex records information about the linked list used to represent the sibling relationship between nodes with the same parent. These three arrays would ordinarily require 3n elements to store. Interestingly, only n of these elements are required, and hence the child table requires only n integers to store. The ESA assumes that |Σ| is small relative to n. Thus, for large alphabets, pattern matching could take longer on the ESA. For instance, with the binary tree representation of the nodes in the suffix tree, pattern matching will take O(m log |Σ|) time for a pattern of length m; doing the same on the enhanced suffix array (which uses a linked list representation of the nodes) will require O(m|Σ|) time, a significant difference for large alphabets.
2.2.2 LST
The linearized suffix tree (LST) [61, 62] is an improvement on the ESA. It uses the same up and down arrays as the ESA, but replaces the nextIndex array with two other arrays: lchild and rchild. Thus, the two arrays store information about siblings at a given node in the interval tree, such that the intervals can be represented by a complete binary tree. Specifically, let the interval [i..j] denote the lcp-interval for a given node in the lcp-interval tree (equivalently a node in the traditional suffix tree). Then, the two new arrays are defined as follows [61]: lchild[i] records the first index of the left child node of the longest interval starting at i in the complete binary tree; rchild[i] records the corresponding value for the right child. Like the ESA, the nature of the four arrays in the new child table used by the LST makes it possible to represent the relevant information in the table using only n integers rather than the 4n integers that ordinarily would be required. However, unlike the ESA, the LST uses a complete binary tree as the basic structure to represent information about sibling nodes at a given node. This important difference makes it possible for the LST to support pattern matches in O(m log |Σ|) time, the same time bound as for suffix trees.
2.3 Markov Models and Probabilistic Suffix Tree
2.3.1 Variable Length Markov Models
A Markov model is a sequence of stochastic events {X_n, n = 0, 1, 2, 3, ...} with state space S that satisfies the Markov property:
P(X_{n+1} = j | X_n = i, X_{n−1} = i_{n−1}, ..., X_0 = i_0) = P(X_{n+1} = j | X_n = i)
A Markov model (or Markov chain) of order L, where L is finite, is a sequence of events satisfying:
P(X_{n+1} = j | X_n = i, X_{n−1} = i_{n−1}, ..., X_0 = i_0) = P(X_{n+1} = j | X_n = i, X_{n−1} = i_{n−1}, ..., X_{n−L+1} = i_{n−L+1})
Thus, in an order-L Markov chain, the current state is dependent on the past L states. Fixed length Markov models (FLMM) represent a probabilistic finite state machine which can be used to model arbitrarily complex sequential data. Such models aim at learning P(σ|C), the conditional probability distribution of a symbol σ given its context C, where σ ∈ Σ and C ∈ Σ^L; L is the order or memory length of the model, and is fixed. The FLMM of order L is represented as a |Σ|^L × |Σ| matrix. The space requirement is thus in O(|Σ|^{L+1}).
Variable length Markov models (VLMM) differ from FLMM in an important way. Variable length Markov models attempt to learn the conditional distribution of a symbol whereby the context length or model order could be varying, depending on the data being modeled. Essentially, for VLMM, C ∈ ⋃_{i=1}^{L} Σ^i. This property of varying memory length implies that with the VLMM, Markov dependencies of varying order in the training data, both large and small, could be captured with ease. This flexibility of the VLMM however comes at a huge cost in terms of space requirement. To represent a VLMM of order L, we have to store L matrices, one for each order, from 1 to L. The total space requirement will be in O(|Σ|^2 + |Σ|^3 + ... + |Σ|^{L+1}), or O(|Σ|^{L+2}). The space is huge, even when L is small. Thus, the space requirement for Markov models is exponential in L, whether we consider fixed length or variable length models.
An important observation that could point to a potential reduction in the space requirement for Markov models is that for a given sequence of length n, there are n(n+1)/2 possible sub-
strings in the sequence. Thus, there are at most n(n+1)/2 states that can be represented in the Markov model for the sequence, for any given order. For the length-n sequence, the maximum order of a Markov model will be n−1. Thus, with knowledge of the sequence, we can have a limit on the possible number of states in its Markov model.
2.3.2 Probabilistic Suffix Tree
The probabilistic suffix tree (PST) is a probabilistic suffix model which is based on the traditional suffix tree. Like the suffix tree, the PST represents all the n(n+1)/2 substrings from the root to the leaf nodes. The PST models variable length Markov models, which means that the string depth is not fixed for every node. For an FLMM, its corresponding PST can be obtained from the PST of the VLMM by constraining each leaf node to be of the same string depth. The transition probability of a symbol on a given path is computed as the relative frequency of the symbol in the observed data, given the preceding substring on the path. The length of the substring used to determine such conditional probabilities is simply given by the memory length or order of the model.
Example PST and ST for a sample sequence T = accactact$ are given in Chapter 4 (see Figure 4.2).
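To make the relative-frequency definition of the transition probabilities concrete, the following is a minimal Python sketch that counts, for every context of length 1 up to a chosen maximum order, how often each symbol follows that context. It is a brute-force illustration only, not the PST construction of [109] or the PSA proposed in this work; the function name empirical_probabilities and the parameter max_order are assumptions introduced for the example.

```python
from collections import Counter, defaultdict


def empirical_probabilities(sequence, max_order):
    """Estimate P(symbol | context) by relative frequency.

    Contexts are all substrings of length 1..max_order that are followed
    by at least one symbol in the sequence.
    """
    counts = defaultdict(Counter)
    for length in range(1, max_order + 1):
        for i in range(len(sequence) - length):
            context = sequence[i:i + length]
            next_symbol = sequence[i + length]
            counts[context][next_symbol] += 1
    return {
        context: {
            symbol: count / sum(followers.values())
            for symbol, count in followers.items()
        }
        for context, followers in counts.items()
    }


if __name__ == "__main__":
    probs = empirical_probabilities("accactact", max_order=2)
    for context in sorted(probs):
        print(context, probs[context])
```

In the sample sequence accactact, the context ac occurs three times and is followed once by c and twice by t, so the estimate of P(t | ac) is 2/3.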
The original algorithm [109] used O(Ln^2) time to construct and prune the PST from a suffix tree. The improved algorithm [88] used balanced red-black trees [31] to construct the PST in O(Ln log n) time. Apostolico and Bejerano [9] presented an algorithm with O(n) time complexity using suffix links and reverse suffix links, independent of L.
2.3.3 Computing TF and DF via Suffix Arrays
In [9, 109], the conditional probabilities required for the PST were computed as relative counts, using the notion of empirical probabilities, based on symbol frequencies from the observed data. To compute the empirical probabilities, we use the notions of term frequency and document frequency as used in information retrieval. The term frequency, TF, is simply the
number of times a given term (or substring in our case) occurred in a given text. Here, the text could contain many documents. Therefore, the document frequency for a given term is the number of times the term occurred in the document. Thus, the term frequency is easily obtained from the document frequency as a simple sum. When the text contains only one document, the TF and DF will be the same.
There are n(n+1)/2 substrings in a sequence of size n. Using a naive algorithm, we will need to compute TFs for all the n(n+1)/2 terms. However, with the suffix tree for this sequence, we have n leaf nodes and at most n internal nodes. These 2n nodes represent the n(n+1)/2 substrings in the sequence. Therefore, multiple substrings at different positions in the sequence will be represented by the same node. These substrings represented by the same node must have the same frequency count. Since the node labels are unique in the suffix tree, this means that the multiple substrings in the same node are essentially the same substring, repeated multiple times in the sequence. Thus, while there are O(n^2) possible substrings in T, there are at most O(n) unique substrings. Hence, we need to compute the TF for only the 2n unique substrings.
Yamamoto and Church [131] presented a data structure to represent nodes in the suffix tree using a suffix array. They called this structure the interval array. Using the interval array, their algorithm calculated all the required TFs in O(n) time, but needed O(n log n) time to compute the DFs. Our data structure is based on the interval array, and we will show how we can improve the algorithm for computing DF to O(n) time.
Interval Array Structure
From the suffix tree, we know we can cluster the potential n(n+1)/2 substrings into at most 2n "groups". In [131], the interval array was proposed as a data structure to represent these groups. Table 2.1 shows the interval array for the sample sequence T = accactact$. From the table, we can describe some important properties of the interval array as follows.
For all substrings, we have the following:
1. There are at most 2n groups in the document of size n. Substrings in the same group have
the same statistics (for example: TF and DF) and the same derivative measurements from these statistics.
2. An lcp-delimited interval <i, j> is constructed using the LCP array, where <i, j> is an interval on the suffix array. An lcp-delimited interval <i, j> must meet the condition:
max(LCP[i], LCP[j+1]) < Length_group(i, j) ≤ min(LCP[i+1], LCP[i+2], ..., LCP[j]),
where Length_group(i, j) is the LCP of the suffixes that belong to the same group as the <i, j> interval.
The lcp-delimited intervals have the following properties:
1. Each lcp-delimited interval represents one unique group of substrings.
2. The maximum length of an lcp-delimited interval <i, j> is given by:
min(LCP[i+1], LCP[i+2], ..., LCP[j]).
3. A non-trivial lcp-delimited interval is one with a start position that is less than the end position. That is, the lcp-delimited interval <i, j> is non-trivial if i < j. Otherwise, if i ≥ j, the interval is said to be trivial. There are n trivial lcp-delimited intervals with TF = 1. There are at most n−1 non-trivial lcp-delimited intervals with TF > 1.
4. The lcp-delimited intervals for a document can form a nested structure of intervals, but no two lcp-delimited intervals can overlap. Thus, the lcp-delimited intervals can be represented in a tree-like structure.
5. Let α and β be two substrings in the same lcp-delimited interval <i, j>. Then, the following two conditions hold:
• TF(α) = TF(β) = j − i + 1
• DF(α) = DF(β)

Table 2.1. Interval Array
Interval   LCP   Term Frequency
<2,3>      3     2
<1,3>      2     3
<6,7>      2     2
<4,7>      1     4
<8,9>      1     2
Below we consider algorithms for computing TF and DF.
Determining the lcp-delimited intervals
Computing the term frequency (TF) is almost analogous to determining the lcp-delimited intervals. Given the lcp-delimited interval <i, j>, the TF algorithm in [131] uses the relation TF(<i, j>) = j − i + 1 to determine the term frequency for the interval. The time complexity of the algorithm is O(n), using O(n) space.
We observe that there are at most n neighboring LCP pairs (i.e. LCP[i] and LCP[i+1]) in a suffix array of size n. Thus, there are at most n increasing orders in such neighboring pairs. Similarly, there are at most n decreasing orders between the neighboring pairs. A decreasing order between neighboring pairs LCP[i] and LCP[i+1] implies that there exists at least one lcp-delimited interval <k, i>, where k ≤ i. Using the foregoing and the fact that we have at most 2n lcp-delimited intervals in a document of size n, Yamamoto and Church [131] computed the term frequency. Given that the lcp-delimited intervals could be nested, a stack can be used to hold the information on the intervals.
Assume we have m decreasing neighboring pairs in an LCP array of size n. Let the intervals between successive pairs be n_1, n_2, ..., n_m. Hence, there are (n_1 + 1) + (n_2 + 1) + ... + (n_m + 1) = m + Σ_{i=1}^{m} n_i ≤ m + n ≤ 2n lcp-delimited intervals. When we find the i-th decreasing neighboring pair, we will output the n_i intervals. Thus, determining the lcp-delimited intervals can be done in O(n) time. This linear time complexity can be compared with the work reported in [2], where a similar structure (the LCP-interval tree) was constructed in O(n log |Σ|) time.
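The stack-based idea described above can be sketched as follows. This is a minimal Python illustration of a standard left-to-right scan over the LCP array, written under our own assumptions rather than reproduced from [131]; the function name lcp_intervals and the tuple layout are hypothetical. The input is an LCP array in which entry k holds the longest common prefix of the (k−1)-th and k-th sorted suffixes, with entry 0 set to 0. Only the non-trivial intervals are reported, each with TF = j − i + 1.

```python
def lcp_intervals(lcp):
    """Enumerate non-trivial lcp-delimited intervals in one pass.

    Returns a list of tuples (i, j, lcp_value, tf), where i and j are
    positions in the sorted suffix order and tf = j - i + 1.
    The root interval (lcp value 0) is not reported.
    """
    n = len(lcp)
    intervals = []
    stack = [(0, 0)]  # pairs of (lcp value, left boundary)
    for k in range(1, n + 1):
        current = lcp[k] if k < n else 0  # a final 0 closes all open intervals
        left = k - 1
        # A decrease in the LCP value closes every interval that is deeper.
        while stack[-1][0] > current:
            value, start = stack.pop()
            intervals.append((start, k - 1, value, k - start))
            left = start
        # An increase opens a new, more deeply nested interval.
        if stack[-1][0] < current:
            stack.append((current, left))
    return intervals


if __name__ == "__main__":
    # LCP array of the sorted suffixes of accactact$ (the suffix "$" is at position 0).
    lcp = [0, 0, 2, 3, 0, 1, 1, 2, 0, 1]
    for i, j, value, tf in lcp_intervals(lcp):
        print(f"<{i},{j}>  LCP={value}  TF={tf}")
```

Because the suffix $ occupies position 0 of the sorted order in this sketch, the positions printed for accactact$ coincide with the 1-based intervals listed in Table 2.1. Each position is pushed and popped at most once, which is where the linear bound comes from.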
Computing the Document Frequency (DF)
Determining the document frequency (DF) is more difficult than computing the term frequency (TF). In [131], an algorithm was given to calculate DF in O(n log n) time and O(n) space. The
algorithm (reproduced below in Figure 2.1) uses the procedure described above to get the lcp-delimited intervals. It uses a new array to map the first symbol of the substring to a document id based on the order of the suffix array index. Using a simple algorithm, we could search in each interval to determine the document frequency. This will however be too expensive, leading to O(n^2) time. Given that lcp-delimited intervals could be nested, we only need to check the extra range beyond the range already calculated. So the core of the algorithm is to check whether the document id of the current position has been previously computed. In Chapter 4, we modify this algorithm for an improved time complexity.
Figure 2.1. Algorithm for computing document frequency [131]
2.4 Circular Pattern Matching
2.4.1 String Pattern Matching
The pattern matching problem is to find the occurrences of a given pattern in a given text string. This is an old problem, which has been approached from different fronts, motivated by both its practical significance and its algorithmic importance. Matches between strings are determined based on the string edit distance. Given two strings T : t_1...t_n and P : p_1...p_m, over an alphabet Σ, the edit distance indicates the minimum number of edit operations which transform one string into the other. There are three basic edit operations: insertion of a symbol, deletion of a symbol, and substitution of a symbol with another symbol. We assume that the cost of each edit operation is unity. If two characters are identical, the cost of the match operation is zero. The substitution operation can also be considered as a mismatch operation. The exact string matching problem is to look for all occurrences of a pattern matching a substring of the text with zero edit operations. Various algorithms have been proposed for both exact and approximate pattern matching [3, 11, 20, 34, 46, 48, 64, 117].
Grossi and Vitter [45] pointed out that a full-text indexing system is expected to be able to support three basic types of queries: the existential query, which returns a binary value (true or false) indicating whether a pattern P occurs in the text T; the counting query, which returns n_occ (0 ≤ n_occ ≤ n), the number of occurrences of P in T; and the enumerative query, which returns n_occ numbers, each indicating the starting position in T of an occurrence of P [3].
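The unit-cost edit distance defined above can be computed with the textbook dynamic programming recurrence. The following Python sketch is a generic illustration kept to two rows of the table; the function name edit_distance is our own choice and the code is not taken from any of the cited works.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between strings a and b.

    Insertion, deletion and substitution each cost 1; a match costs 0.
    Only the previous row of the dynamic programming table is kept.
    """
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i symbols of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The sketch runs in O(mn) time for strings of lengths m and n, and stores only two rows of the table at a time.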
Exact String Matching
Three well-known efficient string matching algorithms with linear time complexity are the Knuth-Morris-Pratt (KMP) algorithm [64], the Karp-Rabin algorithm [59] and the Boyer-Moore (BM) algorithm [20]. Like KMP, the BM algorithm matches the pattern and the text by skipping characters that are not likely to result in an exact match with the pattern. Unlike the other methods, it compares the strings from right to left of the pattern. These algorithms need O(m) preprocessing for the pattern and search in O(n) time, or sometimes even sublinear time in practice. The total time will be O(n+m).
A different approach to pattern matching based on bitwise operations was introduced by R. Baeza-Yates and G. Gonnet [13]. Here, the pattern is represented by a binary mask. Bit-wise SHIFT and AND operations that are considered constant time are used to find the patterns. Under this framework, SHIFT and AND correspond to the pattern movement and matching respectively. The algorithm is effective for small patterns, when the pattern length is less than a computer word (say, 64 characters), which is usual for the text searching problem.
When multiple patterns need to be searched, alternative algorithms are used, such as the Aho-Corasick algorithm [5], which uses a keyword tree for the set of patterns. In addition to multiple pattern matching, the suffix tree algorithm [46] is efficient when the pattern will be searched multiple times. There are linear time suffix tree construction algorithms available [89, 122, 128]. The search time for each pattern will be O(m). The total time for searching s patterns will thus be O(n + sm). This can be compared with O(m + sn) that would be needed by algorithms such as KMP and BM.
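The SHIFT and AND framework can be illustrated with a short Python sketch of the commonly taught Shift-And formulation. It is an illustration under our own assumptions, not a transcription of [13]; the function name shift_and is hypothetical, positions are 0-based, and Python's unbounded integers stand in for the machine word.

```python
def shift_and(pattern, text):
    """Return the 0-based start positions of all exact occurrences of pattern in text.

    Bit i of the state is set when pattern[0..i] matches the text ending at
    the current position.
    """
    m = len(pattern)
    if m == 0:
        return []
    # One bit mask per symbol: bit i is set when pattern[i] equals the symbol.
    masks = {}
    for i, symbol in enumerate(pattern):
        masks[symbol] = masks.get(symbol, 0) | (1 << i)
    accept = 1 << (m - 1)
    state = 0
    matches = []
    for pos, symbol in enumerate(text):
        # SHIFT extends every partial match by one symbol; AND keeps only
        # those whose next pattern symbol equals the current text symbol.
        state = ((state << 1) | 1) & masks.get(symbol, 0)
        if state & accept:
            matches.append(pos - m + 1)
    return matches


if __name__ == "__main__":
    print(shift_and("act", "accactact"))
```

Each text symbol costs one shift, one OR and one AND, which is constant time as long as the pattern fits in a machine word.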
Approximate String Matching
Algorithms for the approximate pattern matching problem can be grouped into three major categories: methods based on dynamic programming [114]; methods based on bit-wise operations [130]; and methods based on the longest common subsequence (LCS) [68]. The edit distance can be computed using dynamic programming. The path that leads to the minimal distance can easily be identified by adding trace-back pointers. The time complexity for computing the edit distance is generally in O(mn). When the number of allowed errors k is known, the edit distance can be computed in O(kn), for example using Ukkonen's deterministic finite state automaton (DFA) approach [121]. Typical approximate matching algorithms based on bit-wise operations are AGREP [130] and NRGREP [95]. These are obtained by extending the bit-wise exact matching algorithms. Another variation of pattern matching with errors is the k-mismatch problem, where only substitution operations are considered during the edit distance computation. In [68], the k-mismatch problem is solved using the suffix tree so that the LCS query can be answered in constant time. To find all the k-mismatch patterns of P in text T, we perform k-mismatch checks for every alignment of P, each time starting at a different character position in T.
Dynamic programming is the most popular technique in approximate string matching (ASM), with uses ranging from the classical search for the longest common subsequence (LCS) [31] to multiple string alignment [25, 125]. It breaks a complex problem into subproblems, solving the subproblems first and then integrating the solutions to get the optimal answer. The search time is O(mn) and its space requirement is O(m). This algorithm can be improved to achieve O(kn/√|Σ|) [26] on average. The Smith-Waterman algorithm, one of the most famous algorithms in this category, was first proposed by Temple F. Smith and Michael S. Waterman in 1981 [116] for local alignment. The Needleman-Wunsch algorithm [96] is a general global alignment technique used in bioinformatics. Both of these algorithms are based on dynamic programming. The Smith-Waterman algorithm [116] guarantees finding the optimal solution for local alignment, and the Needleman-Wunsch algorithm [96] finds the optimal solution for global alignment.
2.4.2 Circular Pattern Matching Problems
A circular shift is a mapping f : Σ* → Σ*, f^t(c_1...c_r) = c_{t+1}...c_r c_1...c_t, where 0 ≤ t ≤ r−1 and r is the length of the string c_1c_2...c_r. Thus, f^0(c_1...c_r) corresponds to the original string. Let [s] be the set of circular shifts of string s, that is, [s] = { f^i(s) | 0 ≤ i ≤ |s|−1 }. Given two circular strings s1 and s2, the circular edit distance between s1 and s2, EDc(s1,s2), is the minimum number of edit operations needed to transform one member of [s1] to one member of [s2]. This is defined as EDc(s1,s2) = min{ ED[f^i(s1), f^j(s2)] | 0 ≤ i ≤ |s1|−1 and 0 ≤ j ≤ |s2|−1 }, where |s1| and |s2| are the lengths of s1 and s2, and ED[A,B] is the standard edit distance. We consider two major problems related to circular pattern matching (CPM), defined as follows.
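The definitions of the circular shift and of the set [s] translate directly into code. The sketch below is only meant to make the notation concrete; the function names are assumptions.

```python
def circular_shift(s: str, t: int) -> str:
    """f^t(s): move the first t symbols of s to the end (0 <= t <= len(s) - 1)."""
    return s[t:] + s[:t]


def rotations(s: str) -> set[str]:
    """[s]: the set of all circular shifts of s."""
    return {circular_shift(s, t) for t in range(len(s))}
```

Note that [s] may contain fewer than |s| distinct strings when s is periodic; for instance `abab` has only two distinct circular shifts.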
Problem 1: Exact circular pattern matching (ECPM). Given a circular pattern P = P[1...m] and a text T = T[1...n], return all occurrences of the circular string [P], that is, of P and its circular shifts, inside text T without any error. [P] matches text T at position j ∈ [1...n−m+1] ⇔ f^t(P) = T[j...j+m−1] for some t, 0 ≤ t ≤ m−1.
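A direct reading of Problem 1 gives the following brute-force sketch, which collects all circular shifts of P and tests every length-m window of T against them. It is an unindexed baseline for small inputs, not one of the algorithms surveyed later; the function name is an assumption, and positions are reported 1-based to match the problem statement.

```python
def exact_circular_matches(text: str, pattern: str) -> list[int]:
    """Return 1-based positions j where T[j..j+m-1] equals some circular shift of P."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    doubled = pattern + pattern                      # f^t(P) is doubled[t:t+m]
    shifts = {doubled[t:t + m] for t in range(m)}    # the set [P]
    return [j + 1
            for j in range(len(text) - m + 1)
            if text[j:j + m] in shifts]
```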
Problem 2: Approximate circular pattern matching (ACPM1). Given text T = T[1...n], circular pattern P = P[1...m] and maximum error k, return “Matching” when the edit distance between text T and circular pattern P is less than or equal to k. Thus, the result will be the matching pair g = {(T,P) | EDc(P,T) ≤ k, s.t. −k ≤ n−m ≤ k}.
This problem uses an existential query to look for circular pattern matches between text T and circular pattern P, where −k ≤ n−m ≤ k. It compares two whole sequences and asks whether their circular edit distance is at most k.
We also consider a harder variation of the ACPM problem in which the length constraint is relaxed to −k ≤ n−m. This variation finds circular permutations of P inside T: it compares the circular pattern P with substrings of T and reports those with circular edit distance at most k. We define this variation more formally as Problem 3 below:
Problem 3: Approximate circular pattern matching problem 2 (ACPM2)
Given text T = T[1...n], circular pattern P = P[1...m] and maximum error k, return all positions where the circular string [P] matches text T with at most k errors. [P] is said to be a k-approximate match with text T at position j ∈ [1...n−m−k+1] if EDc(P, T[j...j+m]) ≤ k, where the circular shifts f^t(P) range over 0 ≤ t ≤ m−1 and −k ≤ n−m.
Compared with the ECPM problem, the ACPM2 problem looks for all approximate matches within the given maximum error k. The ACPM1 problem looks for an approximate match between two whole strings, the text T and the circular pattern P, whereas the ACPM2 problem looks for all approximate matches between every substring of text T and the circular pattern P and its circular shifts. The ACPM2 problem is the hardest of these three problems.
An extension of the CPM problem is the All-Against-All variant.
Problem 4: All-Against-All CPM problem
Given SeqDB, a database of sequences, the All-Against-All CPM problem is to compare each sequence in the database with every other sequence in the database for possible circular matches.
As with the standard CPM problem, we can also consider ECPM and ACPM versions of the All-Against-All CPM problem.
2.4.3 Exact Circular Pattern Matching (ECPM)
Given a pattern P and a text T, the exact circular pattern matching problem is to find the positions in T that match a circular permutation of P. The exact circular pattern matching problem was first studied by Booth [19] in 1980, who proposed an algorithm to detect the lexicographically smallest conjugate of a word. Improved methods were proposed in [10, 36]. However, the focus of these algorithms was on the canonical rotation(s) of a word, and ECPM was a particular case of that problem. The swap pattern matching problem [7] is related to the ECPM problem. It looks for an exact match of a swapped pattern P in text T. We can see that the ECPM problem is a particular case of the swap pattern matching problem. Gusfield [46] discussed the ECPM problem as an end-of-chapter exercise but did not provide an explicit solution. Shiloach [115] provided an algorithm for the ECPM problem. However, both address only the online version of the ECPM problem, which does not use indexing to preprocess the text T.
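To make the notion of a canonical rotation concrete, the sketch below picks the lexicographically smallest circular shift by comparing all of them. This is a quadratic-time illustration only; it is not Booth's linear-time algorithm [19], and the function names are assumptions.

```python
def canonical_rotation(word: str) -> str:
    """Lexicographically smallest circular shift of word (naive, O(m^2))."""
    m = len(word)
    if m == 0:
        return word
    doubled = word + word                     # every rotation is doubled[t:t+m]
    return min(doubled[t:t + m] for t in range(m))


def circularly_equal(a: str, b: str) -> bool:
    """True when a and b are circular shifts of each other."""
    return len(a) == len(b) and canonical_rotation(a) == canonical_rotation(b)
```

Two words are conjugates exactly when their canonical rotations coincide, which is why a canonical form is useful for comparing circular strings.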
Our work is perhaps more closely related to that of Iliopoulos et al. [51], who proposed two algorithms that solve the ECPM problem using indexing. Both algorithms are based on suffix structures, with time complexities of O(m log log n + nocc) and O(m log n + nocc) respectively, where nocc is the number of occurrences.
The first algorithm builds a new data structure, CPI-I, to index circular patterns. Two steps are used to find the circular pattern. First, the algorithm indexes the text T in two suffix trees, ST_T and ST_T̄, where T̄ is the reverse of T. It also maintains two linked lists LL(R), one for each tree, which list all the leaf nodes from left to right in ST_T and ST_T̄ respectively. Second, the algorithm searches for the two parts Q1, Q2 of each permutation of pattern P in ST_T and ST_T̄ respectively: ST_T returns the occurrences of Q1 and ST_T̄ returns the occurrences of Q2. The algorithm finds the intersection of these two sets by using the two linked lists. The construction time and space complexity for this algorithm is O(n log^{1+ε} n), where 0 < ε < 1 and n is the length of text T. The query time complexity is O(m log log n + nocc).
The second algorithm uses another new structure, CPI-II, to address the ECPM problem. CPI-II is built from the suffix array SA, the inverse suffix array SA^{-1}, and two arrays Pre and Suf, where Pre is an array for the prefixes of pattern P and Suf is an array for the suffixes of pattern P. There are three steps involved. First, compute the interval of each prefix of pattern P into array Pre and the interval of each suffix of pattern P into array Suf. There are m prefixes and m suffixes in P. Each prefix or suffix interval is calculated from the previous one, so each takes O(log n) time using the suffix array SA and inverse suffix array SA^{-1}. Second, for each circular permutation of the pattern, which can be written as P[m−i]P[i] where 1 ≤ i ≤ m, the algorithm looks up the intervals of P[m−i] and P[i] in Suf[m−i] and Pre[i] respectively. Third, it outputs the intersection of the intervals of P[m−i] and P[i].
The space complexity of this algorithm is O(n) bytes when implemented using the suffix array, and the time complexity for answering a query is O(m log n + nocc). When implemented using the compressed suffix array [3, 45], its space complexity is O(n log n) bits, but the time complexity for queries increases to O(m log^2 n + nocc).
2.4.4 Approximate Circular Pattern Matching (ACPM)
The ACPM problem is to find k-approximate matches between circular pattern [P] and text T. The naive method for the ACPM problem is to calculate the edit distance between T and each circular shift f^t(P), where m is the length of the pattern and 0 ≤ t ≤ m−1. Thus the dynamic programming procedure is run m times, and the time complexity of the naive algorithm to compute ED([P],T) is O(m^2 n).
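The naive method just described can be sketched as follows: run the standard edit distance dynamic program once for each of the m circular shifts of P and keep the minimum. The helper `edit_distance` is a generic unit-cost implementation included so the block runs on its own; both function names are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard unit-cost edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def circular_edit_distance_naive(pattern: str, text: str) -> int:
    """ED([P], T): minimum edit distance over all circular shifts of P.
    Runs the O(mn) dynamic program m times, i.e. O(m^2 n) overall."""
    m = len(pattern)
    if m == 0:
        return len(text)
    return min(edit_distance(pattern[t:] + pattern[:t], text)
               for t in range(m))
```

An ACPM1 query with error bound k then reduces to checking whether the returned value is at most k.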
Maes [81] published a “divide and conquer” algorithm to compute ED([P],T) in O(mn log m) time. Up to now, this is the best theoretical result for computing the edit distance between a circular pattern and a text. Given the significance of Maes' algorithm, we present the details below.
In theory, Maes' algorithm [81] is the best algorithm. It uses “divide and conquer” to calculate the edit distance with a dynamic programming table. The algorithm constructs an edit graph between text T and string PP (Figure 2.2 [87]), where PP is the concatenation of pattern P with itself. In this edit graph, let path_{(x1,y1)−(x2,y2)} be a path from vertex (x1,y1) to vertex (x2,y2). For each vertex (x,y) on this path, we have x1 ≤ x ≤ x2 and y1 ≤ y ≤ y2. The edit distance between T and f^i(P) can be computed in the subgraph by following a path from vertex (0, i) to vertex (n, i+m−1), where 0 ≤ i ≤ m−1. Let Path_i be an optimal edit path between f^i(P) and T in the edit subgraph, that is, a path of minimum cost from vertex (0, i) to vertex (n, i+m−1). Maes' algorithm is based on an important observation on the edit graph: if Path_i and Path_j are each an optimal edit path, where 0 ≤ i < j ≤ m−1, then Path_i and Path_j cannot cross each other. When Path_i and Path_j have a crossing point, say at (x,y), the edit distance of path_{(0,i)−(x,y)} is less than the edit distance of path_{(0,j)−(x,y)} whenever i < j, so Path_j is no longer an optimal path. The set of paths {Path_t | i < t ≤ j} then contains no optimal path, because each Path_t has to cross Path_i at the point (x,y).
Figure 2.2. Edit Graph of T and PP [87].
Based on the above observations, the algorithm calculates Path_i and Path_j first, where i < j. If the two paths do not cross each other, there is an optimal edit path between Path_i and Path_j. Next, the algorithm calculates the path Path_t in between Path_i and Path_j, where i < t < j. In this case, the time for calculating Path_t is O((j−i) × n). When Path_i and Path_j cross each other, we do not need to calculate the path set {Path_t | i < t ≤ j} anymore, because there is no optimal path between them.
Following this idea, the algorithm calculates all optimal paths starting from Path_0 and Path_m, where Path_m is the optimal path from (0, m) to (n, 2m−1), and Path_0 and Path_m are parallel paths. Figure 2.3 illustrates this algorithm. Step (1) of Figure 2.3 shows this, with a time cost of O(mn). In the second step (step (2) of Figure 2.3), the algorithm computes the optimal path Path_{m/2} between Path_0 and Path_m with time complexity O(mn). In the third step (step (3) of Figure 2.3), two optimal paths Path_{m/4} and Path_{3m/4} are computed. The time cost for computing each path is O(mn/2), hence the time complexity of this step is O(mn) too. The fourth step (step (4) of Figure 2.3) calculates four paths Path_{m/8}, Path_{3m/8}, Path_{5m/8} and Path_{7m/8}. The time cost of computing each path is O(mn/4), so the time complexity of this step is O(mn) too. Continuing in this way, there are O(log m) steps and the time cost of each step is O(mn). Even when there are some crossing points, the time for calculating each step is still O(mn). Thus the total time complexity is O(mn log m). After getting all optimal paths, the minimum edit distance between text T and circular pattern P can be computed.
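The step-by-step accounting above can be condensed into one line. The indexing below is an assumption made for this summary: step 1 covers Path_0 and Path_m, and step j ≥ 2 computes 2^{j−2} new paths, each confined to a strip of the edit graph whose width has been halved j−2 times.

```latex
\[
\underbrace{O(mn)}_{\text{step } 1}
\;+\;
\sum_{j=2}^{O(\log m)}
\underbrace{2^{\,j-2}}_{\text{paths in step } j}
\cdot
\underbrace{O\!\left(\frac{mn}{2^{\,j-2}}\right)}_{\text{cost per path}}
\;=\;
O(mn) + O(\log m)\cdot O(mn)
\;=\;
O(mn \log m).
\]
```

Each step costs O(mn) regardless of how many paths it computes, and the halving stops after O(log m) steps, which gives the stated bound.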
Figure 2.3. Maes’ Algorithm
Gregor et al. [44] gave an O(m^2 n) algorithm; however, it is data-dependent, and in practice it may reach O(mn) time on average. Oncina [98] presented an algorithm with the same time complexity as that of Gregor et al. [44]. Marzal et al. [87] provided a branch and bound algorithm based on Maes' algorithm [81]. Its worst case time complexity is the same as that of Maes' algorithm, but it is more efficient on average.
The above methods all produce complete results when calculating the circular edit distance. Some studies [22, 90–92] present suboptimal algorithms with a reduced time complexity of O(mn), but with the possibility of missing some results. Bunke and Buhler [22] presented a suboptimal algorithm whose time complexity is O(mn). Mollineda [90–92] published two algorithms based on the Bunke algorithm [22] and showed experimentally that the suboptimal solutions are almost as good as their optimal counterparts.
We note that all the above methods for the ACPM problem have only considered the ACPM1 variant. To our knowledge, there has been no published work addressing the more challenging ACPM2 problem.
2.4.5 ACPM Problem in Protein Sequences
A number of studies have been reported on algorithms for detecting circular permutations in protein sequences [40, 47, 55]. The first method [54] used the dot matrix and human visualization to identify circular relationships between protein sequence pairs. The work in [6] used a dictionary method to find short fragments common to the protein sequence pairs and used human visualization to report the best local matches.
Needleman et al. [96] proposed a method for global alignment between two protein sequences. The global alignment algorithm measures the number of edit operations (insertion, deletion, and substitution) needed to transform one sequence into another. Uliel et al. [123, 124] introduced a method to detect circular permutations in protein sequences using global alignment [96]. They gave an O(m^3) time algorithm to find the complete set of matching circular permutations. They also proposed a greedy O(m^2) time algorithm, which could, however, miss some valid circular permutations in the text T. Weiner et al. [126, 127] proposed another greedy method that runs in O(m^2) time. They focused on circular multidomain proteins, where the alphabet now consists of protein domain blocks rather than traditional protein symbols. Thus, |Σ| could be quite large, of the order of 20^q, where q is the length of the domain blocks. This is the first application of the CPM problem to the study of multidomain proteins. However, they did not consider the problems posed by the expanded alphabet.
The algorithm of Uliel et al. [123, 124] used a simple method that calculated the edit distance [72] between the text and one of the circular permutations using the Needleman and Wunsch algorithm [96]. This was repeated m times, once for each circular permutation, giving O(m^3) time complexity and O(m^2) space complexity for each sequence used as a pattern P against the other protein sequences. The greedy algorithm of Uliel et al. [123, 124] modified the local alignment algorithm to find the best local alignment in a 2m × n matrix. This algorithm is similar to the Smith-Waterman local alignment algorithm [116]. It is not guaranteed to find all circular matches, and thus may miss some valid matches. The algorithm of Weiner et al. [126, 127] is also a greedy algorithm and thus could miss some valid matches. They concatenated the text T with itself as TT and the pattern P with itself as PP, and constructed a 2n × 2m matrix using the Needleman and Wunsch algorithm [96]. At the verification phase, a circular match must satisfy certain conditions defined on the 2n × 2m matrix [126, 127].
More fundamentally, both groups [123, 124, 126, 127] that have studied CPM in protein sequences have focused on comparing one whole sequence with another whole sequence. In their experiments, they had to group the protein sequences based on their specified lengths, and used the dissimilarity in lengths for initial pruning. These methods ignored the fact that a shorter circular protein sequence could be part of the functional region of a much larger multidomain protein. This, however, could be a key consideration in function prediction for multidomain proteins. Further, as with the more theoretical algorithms for the ACPM problem, the methods for protein sequences [123, 124, 126, 127] also considered only the ACPM1 problem.
2.5 Pattern Discovery Problem
Pattern discovery is a well studied problem in computational biology and data mining, and various methods have been proposed. The basic method is to identify short sequences that tend to be over-represented within a given set of sequences. Mining sequential patterns was studied in [30, 132]. Motif discovery methods in bioinformatics are surveyed in [112]. Algorithms for the discovery of proximity patterns were proposed in [12]. Proximity pattern discovery is closely related to the more recent notion of a “complex motif”, which is defined as a composite motif whose individual components are constrained to be within a specified separation distance. Perhaps a more closely related work is the method of pattern discovery using mutable permutation patterns [49]. However, although permu-patterns offer a lot of flexibility in the match, ignoring the order of the patterns still does not handle the problem of possible cyclic relations between patterns. Most efforts in pattern discovery have been invested in studying the statistical significance of the patterns (see, e.g., [85, 86]) and, in the case of biological applications, the biological relevance of the discovered patterns [112]. There has not been much attention paid to the pattern matching problem involved, which forms the basis of pattern discovery.
Chapter 3
The Virtual Suffix Tree
3.1 Introduction
Our proposed data structure is most closely related to the ESA and LST. The virtual suffix tree can be constructed in the same time and space bounds as the suffix tree. It also supports basic search operations in the same time and space bounds as the suffix tree. However, the VST requires much less space in practice than the suffix tree. Its space requirement (12.05n bytes using the compact form) is generally smaller than that of the ESA and LST, the other closely related data structures (each requires 20n bytes). Other related data structures that have been proposed include the suffix cactus [56], suffix vectors [93, 103], compact suffix trees [82], lazy suffix trees [41], level-compressed suffix trees [8], compressed suffix trees [94], and compressed suffix arrays [45]. See also [3].
Main results.¹ We introduce a new data structure, the virtual suffix tree (VST), an efficient data structure for suffix trees and suffix arrays. The VST neither stores the lcp array nor the lcp-intervals, but rather exploits the inherent nature of the suffix tree topology. We state our main results in the form of two theorems about the VST.
¹ Part of the work reported in this chapter has been published in the following papers: [78, 79].
Theorem 3.1: Given a string T = T [1..n], with symbols from an alphabet Σ, and the virtualsuffix tree for T , we can count the number of occurrences of a pattern P = P[1..m] in T inO(m log |Σ|) time, and locate all the ηocc occurrences of P in T in O(m log |Σ|+ηocc) time.
Theorem 3.2: Given a string T = T[1..n], with symbols from an alphabet Σ, the virtual suffix tree, including the suffix links, can be constructed in O(n) time and O(n) space, independent of Σ.
Essentially, the VST provides the same functionality as the suffix tree, but at a much smaller space requirement. It has the same linear time construction for large |Σ|, requires O(n) space to store, and allows searching for a pattern of length m to be performed in O(m log |Σ|) time, the same time needed for a suffix tree. To provide the complete functionality of the suffix tree, we describe a simple linear time algorithm that computes the suffix links based on the VST. We present two algorithms for VST construction. The first algorithm builds the VST from the suffix tree, which in turn is generated from the suffix array. The second algorithm eliminates the need for the suffix tree construction step, and thus builds the VST directly from the suffix array. Although the space needed for the VST is linear (as in suffix tree implementations using linked lists or binary trees), the practical space requirement is much smaller than that of a suffix tree. The VST requires less space than other recently proposed data structures for suffix trees and suffix arrays, such as the ESA [2] and the LST [62]. On average, the space requirement (including that for suffix arrays and suffix links) is 13.8n bytes for the regular VST, and 12.05n bytes in its compact form. This can be compared with the 20n bytes needed by the LST or the ESA.
Organization. In Section 2, we introduce the basic data structure and discuss the properties of the VST. Section 3 presents an improved data structure, along with algorithms for its construction. A complexity analysis of the construction and use of the VST is also presented in that section. Section 4 shows how the suffix links can be constructed on the VST. In Section 5, we eliminate the need to construct the suffix tree, and show how the VST can be constructed directly from the suffix array. Section 6 summarizes the chapter.
3.2 Basic Data Structure
Starting from the suffix array, we construct an efficient data structure to simulate the suffix tree (ST). We call this structure a virtual suffix tree (VST). The VST stores information about the basic topology of the suffix tree, the suffix array, and the suffix links. Thus, the VST is represented as a set of arrays that maintain information on the internal nodes of the suffix tree. The leaf nodes are not stored directly. However, whenever needed, information about any leaf node can be obtained via the suffix array. Unlike the ESA and LST, the VST neither uses the lcp-interval tree nor stores the lcp array. We call the data structure a virtual suffix tree in the sense that it provides all the functionalities of the suffix tree using the same space and time complexity as a suffix tree, but without storing the actual suffix tree. Later, we show that the VST leads to a more compact representation of suffix trees and suffix arrays. (We mention that [60] also used the term “virtual suffix tree”, but for a limited form of the enhanced suffix array.)
Below, we present the basic VST. This structure requires 14 bytes for each node in the VST and supports pattern matching in O(m log |Σ|) time for an m-length pattern. In the next section, we present an improved data structure that reduces the space cost by eliminating the need to store edge lengths, while still maintaining O(m log |Σ|) time for pattern matching. We also describe a more compact structure for the VST that uses only 10 bytes for each internal node of the VST, and 5 bytes for each leaf node. Pattern matching on this compact representation will, however, take O(m|Σ|) time.
Each node in the VST corresponds to a distinct internal node in the suffix tree. In its basic form, each node in the VST is characterized by five attributes. For a given node in the VST (say node u), with a corresponding internal node in the ST (say node u_ST), the five attributes are defined as follows.
• sa_index: index in the suffix array (SA index) of the leftmost leaf node under the internal node u_ST of the suffix tree.
• fchild: the node ID of the first child node of u_ST that is also an internal node. (Scanning is done left to right; edges at a node are also sorted left to right in ascending lexicographic order.) If node u is a leaf node in the VST, the value will be negative, and its absolute value will point to the first child node of the next internal node in the VST.
• elength: the edge length of the edge (v,u) in the VST, or equivalently (v_ST,u_ST) in the suffix tree, where v is the parent node of u and v_ST is the parent node of u_ST.
• nfleaf: the number of child leaf nodes before the first child of u_ST that is also an internal node.
• nnleaf: the number of sibling leaf nodes after u_ST, the current internal node of the suffix tree, but before the next sibling internal node.
In terms of storage, sa_index, fchild and elength each require one integer (4 bytes), while nfleaf and nnleaf each require one byte of storage (assuming |Σ| ≤ 256).
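The storage budget above (three 4-byte integers plus two single bytes, 14 bytes per node) can be checked with a small record type. This is only an illustrative layout under stated assumptions: the class name, the little-endian packing format and the use of a signed integer for fchild (so that the negative values used for VST leaf nodes fit) are choices made here, not details taken from the text.

```python
import struct
from dataclasses import dataclass

# Three signed 32-bit integers followed by two unsigned bytes, no padding.
NODE_FORMAT = "<iiiBB"


@dataclass
class VSTNode:
    sa_index: int   # SA index of the leftmost leaf under the node
    fchild: int     # first internal child; negative for a VST leaf node
    elength: int    # length of the edge from the parent
    nfleaf: int     # leaf children before the first internal child
    nnleaf: int     # sibling leaves before the next internal sibling

    def pack(self) -> bytes:
        """Serialize the node into its fixed-size binary form."""
        return struct.pack(NODE_FORMAT, self.sa_index, self.fchild,
                           self.elength, self.nfleaf, self.nnleaf)

    @classmethod
    def unpack(cls, raw: bytes) -> "VSTNode":
        """Rebuild a node from its packed bytes."""
        return cls(*struct.unpack(NODE_FORMAT, raw))
```

Storing the nodes as parallel arrays, as the text describes, gives the same per-node total; the record form is used here only because it is easier to read.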
3.2.1 Example VST
We use an example sequence to explain the above definitions. The suffix tree and VST for the string missississippi$ are shown in Figure 3.1. Note that the string missississippi$ is made intentionally different from mississippi$, to capture some of the cases involved in a VST. Only the internal nodes (dark nodes) are explicitly stored in the VST. The leaf nodes (empty circles) are not stored. The order of storage is based on the node-depths, from top to bottom. Table 3.1 shows the corresponding values of the VST node attributes for each VST node in the example.
3.2.2 Properties of the Virtual Suffix Tree
We can trace the properties of the VST based on the standard properties of a suffix tree.
1. The VST only stores the internal nodes of the suffix tree. No leaf nodes in the ST are represented in the VST. Information about the leaf nodes can be obtained from the SA when needed. The space requirement of the VST therefore depends on the topology of the suffix tree, or more specifically, on the number of internal nodes.
2. The number of leaf nodes in a suffix tree is n. The number of internal nodes in the suffix tree (and hence the number of nodes in the VST) is at most n.
3. The VST stores only the SA index of the leftmost leaf nodes and information about the child nodes.
4. For a given node in the VST, the number of child nodes will be no larger than |Σ|. Thus, the time needed to match a symbol is at most O(log |Σ|).
5. The nodes in the VST are ordered based on the internal nodes of the suffix tree using the hierarchy sequential access method (HSAM). The child nodes of any given node will be stored sequentially. The child nodes of two nearby nodes will therefore be stored in nearby locations. This is an important property for addressing problems involving locality of reference.
We introduce further definitions needed in the description below. For a given node u in the VST, we use the term prior node to denote the node that appears before the current node u in the HSAM ordering. Similarly, next node denotes the node that appears after the current node u in this ordering. We use lsa_index (left sa_index) to denote the SA index of the leftmost leaf node that is a descendant of u. Similarly, rsa_index (right sa_index) denotes the SA index of the rightmost leaf node that has u as its ancestor. Figure 3.2 shows an example.
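The storage order and the prior/next node relation can be illustrated with a breadth-first traversal, under the assumption that the HSAM ordering amounts to visiting internal nodes level by level with siblings kept together, which is what the node numbering of the example suggests. The parent-child relations below are read off the fchild values of the example and are themselves an assumption; the function name is hypothetical.

```python
from collections import deque


def level_order(children: dict[str, list[str]], root: str) -> list[str]:
    """Visit internal nodes top to bottom, keeping each node's children
    contiguous, and return them in storage order."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))   # children stay side by side
    return order


# Internal-node topology assumed for missississippi$ (leaves omitted).
EXAMPLE_CHILDREN = {
    "root": ["N1", "N2", "N3"],
    "N1": ["N4"],
    "N3": ["N5", "N6"],
    "N4": ["N7"],
    "N5": ["N8"],
    "N6": ["N9"],
}
```

With this ordering, the prior node and next node of the node at position i are simply the entries at positions i−1 and i+1 of the returned list.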
It is simple to determine the lsa_index and the leftmost child node of any given node. The properties and the organization of the VST lead to the following lemma about the VST:
Table 3.1. VST node attributes for the example sequence T = missississippi$ used in Figure 3.1.

node      root  N1   N2   N3   N4   N5   N6   N7   N8   N9
sa_index  0     1    7    9    3    9    12   4    10   13
fchild    N1    N4   −N5  N5   N7   N8   N9
elength   0     1    1    1    3    1    2    3    3    3
nfleaf    1     2    2    0    1    1    1    2    2    2
nnleaf    0     1    0    0    0    0    0    0    0    0
Lemma 1: For a given node in the VST, its rightmost child node and the right sa_index can each be determined in constant time.
Proof: Let u be the current node in the VST, with parent node v. Let w be the next node in the HSAM ordering. By property 5, if w is an internal node in the VST, the rightmost child (rchild) node of u will be the prior node to the leftmost child node of w. If w is a leaf node in the VST, then w.fchild will point to the next node after u's rightmost child node. The time to determine the rightmost child node is therefore O(1).
For the right sa_index, if u has a next sibling node, say w, the right sa_index of u will be the left sa_index of this sibling node w, minus the nnleaf of u, minus 1. If u does not have a next sibling node, the right sa_index of u will be the right sa_index of node v (u's parent) minus the nnleaf of u. That is,

\[
u.\mathrm{rsa\_index} =
\begin{cases}
w.\mathrm{sa\_index} - u.\mathrm{nnleaf} - 1 & : u \text{ has a next sibling, } w \\
v.\mathrm{rsa\_index} - u.\mathrm{nnleaf} & : \text{otherwise}
\end{cases}
\tag{3.1}
\]

Thus the time required to determine the right sa_index is O(1). □
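Eqn (3.1) is a two-case constant-time computation, sketched below. The function signature is an assumption: in the VST the sibling's sa_index and the parent's right boundary would be read from the node arrays, whereas here they are passed in explicitly to keep the sketch self-contained.

```python
from typing import Optional


def rsa_index(u_nnleaf: int,
              next_sibling_sa_index: Optional[int] = None,
              parent_rsa_index: Optional[int] = None) -> int:
    """Right SA boundary of node u following Eqn (3.1)."""
    if next_sibling_sa_index is not None:
        # Case 1: u has a next sibling w.
        return next_sibling_sa_index - u_nnleaf - 1
    if parent_rsa_index is None:
        raise ValueError("need either a next sibling or the parent's boundary")
    # Case 2: u is the last internal child of its parent v.
    return parent_rsa_index - u_nnleaf
```

With the values of Table 3.1, node N5 (nnleaf 0, next sibling sa_index 12) gets right boundary 11, and node N1 (nnleaf 1, next sibling sa_index 7) gets right boundary 5.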
3.2.3 Pattern Matching on VST
Lemma 1 provides an indication of how pattern matching can be performed on the VST. For pattern matching using the suffix tree, an important issue is how to quickly locate all the child nodes of a given internal node. In the VST, each node points to its leftmost leaf node using the sa_index. During pattern matching, at any given node in the VST, we will need to determine four parameters, namely the leftmost child node (lchild), the rightmost child node (rchild), the left sa_index (lsa_index) and the right sa_index (rsa_index). These parameters define the boundaries of the search at the given node. To search in a leaf node of the VST, we will need only the left sa_index and right sa_index of the node. When we search in an internal node, we will need all four parameters to match a pattern. Lemma 1 shows that for any given node, we can determine each of these parameters in constant time. The following examples illustrate the two cases involved in computing the rsa_index, and how pattern matching can be performed on the VST.
Example: Determining the right boundary from a next sibling node. Consider node N5 in Figure 3.2. The left sa_index of N5 is 9 and the right sa_index is 11, since N5.sa_index = 9 and N6.sa_index = 12 (N6 being the next node, N_{5+1}), and hence the right sa_index of N5 is 12 − 1 = 11. The leftmost child node is the fchild of the current node, thus the leftmost child of N5 is N8. The node following the rightmost child node is N6.fchild = N9. The rightmost child node is then N_{9−1} = N8, since the child nodes of sibling nodes are stored side by side.
Example: Determining the right boundary from the right boundary of the parent node. Consider node N1 in Figure 3.2. The left sa_index of N1 is N1.sa_index = 1. The right sa_index of N1 is N2.sa_index − N1.nnleaf − 1 = 7 − 1 − 1 = 5. The leftmost child node of N1 is N1.fchild = N4. The next node of N1 is N2. Since N2.fchild = −N5 is negative, N2 must be a leaf node in the VST. We therefore know that the node following the rightmost child node of N1 is N5. Finally, the rightmost child node of N1 can be determined as N_{5−N1.nnleaf} = N_{5−1} = N4.
We summarize the foregoing discussion as the first main result of this work:
Theorem 3.1: Given a string T = T[1..n] of length n, with symbols from an alphabet Σ, and the virtual suffix tree for T, we can count the number of occurrences of a pattern P = P[1..m] in T in O(m log |Σ|) time, and locate all the ηocc occurrences of P in T in O(m log |Σ| + ηocc) time.
Proof: The theorem is a consequence of Lemma 1. First consider the cost of one single symbol-by-symbol comparison at a node in the VST. The number of child nodes at any internal node can be no larger than |Σ|, and we can find the boundaries of the search in constant time. Since the edges are ordered lexically at each internal node, and given the HSAM ordering, matching a single symbol can be done in O(log |Σ|) time steps using binary search. To find the first match, we need to consider the m symbols in the pattern. We perform the above symbol-by-symbol comparisons at most m times to decide whether there is a match or not. After a match is found, we can again use binary search (using lsa_index and rsa_index as bounds) to determine all the ηocc occurrences of the pattern. Reporting each occurrence can be done in constant time, or an additional ηocc time for all the occurrences. □
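The role of the lsa_index and rsa_index bounds can be seen on a plain suffix array: all occurrences of a pattern occupy one contiguous SA interval, so counting is a subtraction and reporting is a scan of that interval. The sketch below uses binary search directly on the suffix array, which costs O(m log n) rather than the O(m log |Σ|) of the theorem; it is a stand-in for illustration, not the VST search, and the function names and naive suffix array construction are assumptions.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive suffix array construction by sorting suffixes (illustration only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text: str, sa: list[int], pattern: str) -> tuple[int, int]:
    """Half-open SA interval [left, right) of suffixes that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                        # first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left, hi = lo, len(sa)
    while lo < hi:                        # first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return left, lo


def locate(text: str, sa: list[int], pattern: str) -> list[int]:
    """0-based start positions of all occurrences, in increasing order."""
    left, right = sa_interval(text, sa, pattern)
    return sorted(sa[left:right])
```

The occurrence count is `right - left`; in the VST the corresponding bounds come from lsa_index and rsa_index of the node reached by the pattern.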
3.3 Improved Virtual Suffix Tree
The basic data structure introduced above stores the length of each edge in the VST. We can improve the structure to reduce the space requirement by avoiding the need to store information about the edge lengths directly. The improved data structure has only four attributes rather than five. The attributes sa_index and elength in the basic structure are now combined into one attribute called the adjusted SA index (asa_index). This requires a key modification to the suffix tree, leading to an important distinction between the suffix tree and the virtual suffix tree.
3.3.1 Adjusting Edge Lengths
A well-known property of the suffix tree is that no two edges out of a node in the tree can start with the same symbol. For efficient representation of the VST, this characteristic of the ST is modified such that, for a given node, every edge that leads to an internal node in the VST has an equal length. This modification is done as follows. Start from the root node and progress towards the leaf nodes in the VST. For a given internal node, say u, adjust the edge label from u to each of its children such that all edges that lead to an internal node have the same edge length. The main criterion is that, for two sibling internal nodes, their edge labels differ only in the last symbol. If for some edge, say (u,w), the original edge length (or edge label) is longer than the new length, prepend the extraneous part of the old label(u,w) to each outgoing edge from w. The edge lengths of edges that lead to leaf nodes are left unchanged. Then repeat the adjustment at each child node of u. Figure 3.3 shows an example of this procedure. Observe that this adjustment only affects the edge lengths, and does not change the general topology of the suffix tree.
The above adjustment procedure leads to an important property of the VST:
Property: In the improved VST, all internal sibling nodes occur at the same node-depth, and same string-depth, and the edge labels for the edges from the parent to each sibling differ only in the last symbol. This means that, in the VST, two branches from the same node can start with the same symbol, but their edge labels will differ.
This property provides an important difference between the suffix tree and the VST. The suffix tree mandates that no two edges from the same node have the same starting symbol. Further, the suffix tree only guarantees that the node-depth of two sibling nodes is the same, but not their string-depth. This property of equal-length sibling edge labels is the key to more efficient representation of the VST, without explicit edge labels. Figure 3.4 shows an example of the modified suffix tree with equal-length edges for sibling nodes that are also internal nodes, and the corresponding improved virtual suffix tree. Table 3.2 shows the corresponding values of the attributes for each node in the improved VST. What remains is how we compute asa_index, the adjusted SA index. This is done by combining the original sa_index with elength.
Lemma 2: Given a node in the VST, say u, and its parent node (say v), we can compute the adjusted SA index in constant time. Further, when required, the edge length can be determined in constant time.
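Proof: Computing the adjusted edge length (new_elength) and the adjusted SA index (asa_index) can be done using the following relations:

\[
u.\text{asa\_index} =
\begin{cases}
u.\text{new\_elength} + u.\text{sa\_index} & \text{if } u = v.\text{fchild} \text{ and } u.\text{new\_elength} \neq 1 \\
u.\text{sa\_index} & \text{otherwise}
\end{cases}
\tag{3.2}
\]

\[
u.\text{sa\_index} = u.\text{fchild}.\text{sa\_index} - u.\text{nfleaf}
\tag{3.3}
\]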
At the time of VST construction, we calculate asa_index from bottom to top. For leaf nodes in the VST, we already know the sa_index and new_elength, so we can calculate the asa_index from Eqn (3.2). When the node u is an internal node in the VST, we first obtain u.sa_index from Eqn (3.3), since we know u.nfleaf and u.fchild.sa_index. Then we determine u.asa_index from Eqn (3.2). The new edge length is not stored explicitly in the VST nodes, but can be computed in constant time whenever needed (for instance, during pattern matching) by simply changing the subjects in Eqns (3.2) and (3.3). This is possible since at this time we already know u.asa_index for each node in the VST. □
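As a small worked example of Eqns (3.2) and (3.3), the following Python sketch encodes and decodes the adjusted SA index for node N7 of Table 3.2. The function names are hypothetical, and decode_first_child is only one interpretation of "changing the subjects": it assumes the parent's sa_index is already known during a top-down traversal.

```python
def asa_index(sa_index, new_elength, is_first_child):
    """Eq. (3.2): fold the edge length into the SA index of a first child."""
    if is_first_child and new_elength != 1:
        return new_elength + sa_index
    return sa_index


def decode_first_child(parent_sa_index, parent_nfleaf, child_asa_index):
    """Recover (sa_index, new_elength) of a first child from stored fields.

    Eq. (3.3) rearranged gives the child's sa_index. The difference between
    asa_index and sa_index is then the edge length, or 1 when they coincide.
    """
    sa = parent_sa_index + parent_nfleaf
    diff = child_asa_index - sa
    return sa, (diff if diff > 0 else 1)


# Node N7 in Table 3.2: first child of N4, sa_index 4, new_elength 3.
n7_asa = asa_index(4, 3, True)
# N4 has sa_index 3 and nfleaf 1.
n7_decoded = decode_first_child(3, 1, n7_asa)
print(n7_asa, n7_decoded)
```

Both directions use a constant number of additions and comparisons, which is the point of Lemma 2.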
Thus, while we store only the asa_index, our calculations will still use the original sa_index. However, this can be derived from the asa_index in constant time. In fact, we can observe that in practice, we need to compute the asa_index for only the leftmost child node at each node-level, while keeping the original sa_index for all other nodes. To determine the new_elength for these other nodes, we simply make a constant time access to their leftmost (sibling) node (at the same node-level), and then use this to compute the length. For searching with the VST, we will calculate the length of the common string at each level. If the length is greater than 0, then we know there is a common string in the edge labels for the child nodes and only the last character is different. Thus, we do not need to store the edge lengths explicitly, leading to a reduction of one integer per node over the basic VST.
Table 3.2. Node attributes in the improved VST for the example sequence, T = missississippi$.

NodeName     root  N1  N2   N3  N4  N5  N6  N7     N8  N9
sa_index     0     1   7    9   3   9   12  4      10  13
fchild       N1    N4  -N5  N5  N7  N8  N9
new_elength  0     1   1    1   1   1   1   3      1   2
nfleaf       1     2   2    0   1   1   1   2      2   2
nnleaf       0     1   0    0   0   0   0   0      0   0
asa_index    0     1   7    9   3   9   12  4+3=7  10  13+2=15

We have included new_elength, so one can compare with elength in Table 3.1. However, in practice this will not be stored in the VST.
3.3.2 Construction Algorithm
Construction of the VST makes use of an array Q which records the internal nodes of the suffix tree. This array maps the internal nodes of the suffix tree to nodes in the VST. Thus, elements in the array are in the same ordering as the corresponding nodes in the VST.
Given an input string T, the first step is to construct the suffix array for T. This can be done in worst case linear time and linear space using any of the existing algorithms [57,63,65]. Using the SA, we construct the suffix tree as described in [3]. While the suffix tree can be constructed directly in linear time, working from the SA to the ST will require less space for the construction. The suffix tree is then preprocessed in linear time to adjust the edges from a given parent node that lead to internal child nodes to equal-length edges. Using the adjusted suffix tree, the algorithm will process the internal nodes in the suffix tree in a top-down manner to determine the attributes (fchild, nfleaf and nnleaf) for the corresponding nodes in the VST. Next, we process the VST from the VST leaf nodes to the root, using the Q array to update the asa_index at each node. The adjusted asa_index field includes information on the sa_index and edge length.
The steps for constructing the VST for a given input string are summarized in Algorithm 3.1.
3.3.3 Further Space Reduction
We can further reduce the space needed by the VST, at the cost of an increased time for pattern matching. In the pattern matching phase, if the algorithm is to compare symbols one-by-one, rather than using binary search on the branches from a given node in the VST, we will only need to compute the lsa_index and rsa_index of the node.
Consider an arbitrary node (say node u) in the VST. The number of children from u or the number of u's leaf nodes cannot be larger than |Σ|. Thus, the sa_index of any child node of u will lie between node u's lsa_index and rsa_index. Then comparing one symbol from the pattern against the first symbol on each edge from u to its children will require at most O(|Σ|) time steps. The left child node and the right child node will not need to be used again. Thus, the attributes fchild and nfleaf in the leaf nodes of the VST are no longer required. We make the asa_index negative for the leaf nodes, so that during pattern matching it serves as a flag for the VST leaf nodes. This compact structure will reduce the space requirement at each leaf node of the VST by 5 bytes. Time for pattern matching, however, will increase to O(|Σ|) for each symbol in the pattern P, or O(m|Σ|) overall, where m = |P|.
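The trade-off between the two search strategies can be illustrated with a short Python sketch. It assumes a hypothetical flat list first_symbols holding the distinguishing symbol of each outgoing edge in lexical order. The VST itself reads these symbols from the text through the suffix array instead of storing them, so the sketch shows only the per-symbol cost, not the actual node layout.

```python
from bisect import bisect_left


def find_child_binary(first_symbols, c):
    """O(log k) lookup in a lexically ordered list of edge symbols."""
    i = bisect_left(first_symbols, c)
    return i if i < len(first_symbols) and first_symbols[i] == c else -1


def find_child_linear(first_symbols, c):
    """O(k) scan, the symbol-by-symbol strategy of the compact structure."""
    for i, s in enumerate(first_symbols):
        if s == c:
            return i
    return -1


# Symbols on the edges out of the root for T = missississippi$.
root_edges = ["$", "i", "m", "p", "s"]
print(find_child_binary(root_edges, "p"), find_child_linear(root_edges, "p"))
```

With at most |Σ| edges per node, the first function gives the O(log |Σ|) bound per pattern symbol and the second gives the O(|Σ|) bound.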
3.3.4 Complexity Analysis
Time and space complexity.
The time cost for lines 1-3 in the construction algorithm CONSTRUCT-VST (Algorithm 3.1) is O(n)+O(n)+O(n)=O(n). Lines 5-17 in the algorithm perform a one time traversal of the nodes in the suffix tree. The respective values of pTop and pBottom range from 1 to 2n. Thus the cost for the traversals is O(n). Lines 18-27 in the algorithm run at most pBottom times. The time for lines 18-27 in the algorithm is thus O(n), since each iteration of the loop requires constant time. Therefore, for the regular VST, the overall construction time is O(n). The time for pattern matching is in O(m log |Σ|). For the compact structure, the construction time is the same as the regular structure, but the VST is no longer stored linearly. Here we use an array to store the relation between the Q array and the compact VST. The searching time is now O(m|Σ|).
The space requirement clearly depends on the number of nodes in the VST, which is at most n for a sequence of length n. Each node requires a fixed amount of memory to store, leading to an O(n) space requirement.
Number of nodes and practical space requirement.
The actual space needed for the VST depends on the topology of the suffix tree. This topology can be captured by the number of internal nodes in the suffix tree, or alternatively, by the quantity R_IL, the ratio between the number of internal nodes and the number of leaf nodes. We call R_IL the density or branching factor for the suffix tree. We conducted an experiment to evaluate the effect of this branching factor on the storage requirement of the VST. The suffix tree was constructed and the branching factors computed for a set of files taken from [104]. For each file, we used the first 2^24 symbols as the text, and computed the branching factor. Table 3.3 shows the results. The maximum ratio of 0.76 was observed for the file Jdk13c. On average, however, the maximum ratio was around 0.63. The worst case occurs for a sequence with |Σ| = 1 (that is, T = a^n), leading to a branching factor of 1. The table shows that, for a given sequence, the branching factor depends on a complex relationship between n, |Σ|, and the mean LCP.
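For short strings the branching factor can be checked directly with a naive Python sketch. It is not the experimental code behind Table 3.3: it assumes the text ends with a unique terminator symbol (so every suffix is a leaf), counts a substring as an internal node when it is followed by at least two distinct symbols, and takes quadratic time and space.

```python
def branching_factor(t):
    """Ratio of internal nodes to leaves in the suffix tree of t.

    Assumes t ends with a unique terminator, so every suffix is a leaf. A
    substring labels an internal node exactly when it is followed by at
    least two distinct symbols; the empty string stands for the root.
    """
    followers = {}
    for i in range(len(t)):
        for j in range(i, len(t)):
            # t[i:j] occurs at position i and is followed there by t[j].
            followers.setdefault(t[i:j], set()).add(t[j])
    internal = sum(1 for s in followers.values() if len(s) >= 2)
    return internal / len(t)


print(branching_factor("missississippi$"))   # 10 internal nodes, 15 leaves
print(branching_factor("a" * 20 + "$"))      # 20 internal nodes, 21 leaves
```

The second call illustrates the worst case: for T = a^n followed by a terminator the ratio is n/(n+1), which tends to 1 as n grows.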
The space requirement for the VST, for both the compact and regular structures, depends directly on the branching factor. The last two columns in Table 3.3 show the maximum space requirement for each file.
The foregoing discussion leads to the following lemma on VST construction:
Lemma 3: Given a string T = T[1..n], with symbols from an alphabet Σ, the virtual suffix tree (without suffix links) can be constructed in O(n) time, and O(n) space, independent of Σ.
3.4 Computing Suffix Links
Constructing the suffix tree from the suffix array as described in [3] does not include the suffix link. There are also a number of other suffix tree construction algorithms that build the suffix tree without the suffix link. See, for example, Farach et al. [38], and Cole and Hariharan [29]. The suffix link, however, is a significant component of the suffix tree, and is important in certain applications, such as approximate pattern matching using matching statistics, and other forms of traversal on the suffix tree. Thus, a data structure to support the complete functionality of the suffix tree requires an inclusion of the suffix link. Recent efficient data structures for suffix trees have thus provided mechanisms for constructing the suffix link. The ESA [2] provided suffix links using complicated RMQ preprocessing [18]. The LST [62] also supported suffix links using the lcp-interval tree and intervals defined on the inverse suffix array. A recent work by Maaß [80] focused exclusively on suffix link construction from suffix arrays, or from suffix trees that do not have such links.
The virtual suffix tree provides a natural mechanism for constructing suffix links. The key idea is that suffix links in the VST can be computed bottom-up, from the nodes with the highest node-depth (leaf nodes) in the VST to those with the least (the root). This is based on the following two observations about suffix links.
1. Consider a leaf node u_ST in the suffix tree corresponding to suffix T_i in the original sequence. The suffix link from u_ST will point to the leaf node corresponding to the suffix T_{i+1} (that is, the suffix that starts at the next position in the sequence).
2. The suffix link from a node u in the VST will point to some node w with a smaller string-depth in the VST, such that |L(u)| = |L(w)| + 1 (or equivalently |L(u_ST)| = |L(w_ST)| + 1).
The following lemma establishes how we can build suffix links on the VST.
Lemma 4: Given the VST for a string T = T[1..n] of length n, the suffix links can be constructed in O(n) time using an additional O(n) space.
Proof: Let u and w be two arbitrary nodes in the VST. Let v be the parent node of u. Let u.slink be the node to which the suffix link from node u points. We consider two cases:
Case A: u is a leaf node in the VST. Then, using the above observations, the suffix link from node u will point to node w in the VST (that is, u.slink = w) such that SA[w.sa_index] = SA[u.sa_index] + 1. Clearly, |L(w)| = |L(u)| − 1, where L(x) is the path label of node x. Note that this path label is not explicitly stored in the VST, but for each node, the length can be computed in constant time. This computation can be performed in constant time by maintaining two arrays and observing that n − |L(w)| = n − |L(u)| + 1. One array is the inverse suffix array (ISA) for the given string, defined as follows: ISA[i] = j if SA[j] = i, (i, j = 1, 2, ..., n). The second is an array M that maps the SA values to the corresponding parent nodes in the VST, defined as follows: M[i] = u, if u_ST in ST is the parent node of the leaf node corresponding to the suffix T_SA[i]. Clearly, both arrays can be computed in linear time, and require linear space.
Case B: u is not a leaf node in the VST. This is a simpler case. When u is an internal node in the VST, the suffix link from u will point to some node w, such that w is an ancestor of node u.fchild.slink, such that |label(u, u.fchild)| = |label(w, u.fchild.slink)|. The O(n) time result then follows by using the skip/count trick [46], by observing that a VST has at most n nodes, a node depth of at most n, and that each upward traversal on the suffix link decreases the node depth by at least 1. □
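The ISA lookup used in Case A can be sketched in Python as follows. The sketch uses 0-based indices (the text uses 1-based), builds the suffix array naively for a short example, and uses hypothetical function names. It covers only the step from a suffix to the SA position of the next suffix; the array M, which would then map that position to a VST node, is not built here.

```python
def suffix_array(t):
    """Naive suffix array (0-based), adequate for a short example."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def inverse_suffix_array(sa):
    """ISA[i] = j if and only if SA[j] = i."""
    isa = [0] * len(sa)
    for j, i in enumerate(sa):
        isa[i] = j
    return isa


def next_suffix_rank(sa, isa, r):
    """SA position of the suffix starting one symbol after the suffix SA[r].

    Returns None for the last suffix of the text, which has no successor.
    """
    start = sa[r] + 1
    return isa[start] if start < len(sa) else None


t = "missississippi$"
sa = suffix_array(t)
isa = inverse_suffix_array(sa)
# The suffix at SA position 4 is ississippi$; dropping its first symbol
# gives ssissippi$, which sits at SA position 13.
print(sa[4], next_suffix_rank(sa, isa, 4))
```

Each lookup is a constant number of array accesses, and both arrays are filled in a single pass once the suffix array is available.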
Although the above description is from the viewpoint of a VST already constructed, the suffix links can be constructed as the VST is being built, by some modification of the VST construction algorithm. Algorithm CONSTRUCT-VST-WITH-SUFFIXLINKS (Algorithm 3.2) shows a modification of algorithm CONSTRUCT-VST (Algorithm 3.1) to incorporate sections to compute the suffix links. The suffix link construction algorithm is based on the Q array used during the VST construction. We observe that the additional work required to construct the suffix links on the VST is independent of the alphabet size.
Figure 3.5 shows the result of the suffix link algorithm when applied to the VST of our example string T = missississippi$. Essentially, given the VST, the suffix link is constructed right to left, node-depth by node-depth, starting with the rightmost node at the deepest node-depth, and moving up the VST until we reach the root. Thus, the order of suffix link construction in the example will be SL1, SL2, ..., SL9.
Algorithm 3.2 shows that the additional work required to compute all the suffix links is linear in the length of the string. After construction, the suffix link on the VST will require one additional integer per internal node in the VST. This can be compared with the 2 integers per node required to store the suffix link using the ESA, or LST. In a typical VST, where the maximum branching factor is usually less than 0.7, the suffix link will require a maximum extra space of 0.7n × 4 = 2.8n bytes. Table 3.4 shows the space required for the VST (including the suffix array and suffix links) for both the compact structure and the regular VST, at varying values of the branching factor.
We summarize the above discussion in the following theorem, which captures the second main result of the work:
Theorem 3.2: Given a string T = T[1..n], with symbols from an alphabet Σ, the virtual suffix tree, including the suffix links, can be constructed in O(n) time and O(n) space, independent of Σ.
Proof. The theorem follows directly from Lemma 3 and Lemma 4. □
3.5 From SA to VST
So far, we have constructed the VST by first building the suffix tree from the suffix array, and then converting the suffix tree to a VST. The major problem with this approach is the relatively large memory requirement for suffix tree construction (for instance, compared to its storage). In this section, we eliminate this problem by constructing the VST directly from the suffix array, without a need to first construct the suffix tree.
The VST mainly encodes the structural information in a suffix tree, while avoiding the need to store some information that could be computed from the encoded structure. Thus, the key to going from SA to VST directly is to observe how the SA encodes the structural information in a suffix tree. The observation is that, given a sequence, the branching information in its suffix tree can be determined by making use of the corresponding suffix array and lcp array of the sequence. The edge labels, and hence edge lengths, can be determined by analyzing the differences between adjacent lcp values. In a sense, it was this same observation that was used in constructing the suffix tree from the suffix array in [3], which was exploited in Algorithm 3.1.
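To make the observation concrete, the Python sketch below recovers every internal node of the suffix tree from the lcp array alone, as a triple of string depth and SA interval. It uses the standard stack-based lcp-interval scan rather than Algorithm 3.3, builds the suffix array and lcp array naively, and uses 0-based indices. All function names are hypothetical.

```python
def lcp_array(t, sa):
    """Naive LCP array: lcp[i] is the common prefix length of the suffixes
    at SA positions i - 1 and i, with lcp[0] = 0."""
    def common(a, b):
        k = 0
        while a + k < len(t) and b + k < len(t) and t[a + k] == t[b + k]:
            k += 1
        return k
    return [0] + [common(sa[i - 1], sa[i]) for i in range(1, len(sa))]


def internal_nodes(lcp):
    """List (string_depth, left, right) for every internal suffix-tree node,
    found in one left-to-right scan of the LCP array with a stack."""
    stack = [(0, 0)]                      # (string depth, left SA boundary)
    found = []
    for i in range(1, len(lcp) + 1):
        cur = lcp[i] if i < len(lcp) else -1   # sentinel closes all intervals
        left = i - 1
        while stack and stack[-1][0] > cur:
            depth, left = stack.pop()          # interval ends at position i - 1
            found.append((depth, left, i - 1))
        if not stack or stack[-1][0] < cur:
            stack.append((cur, left))          # a deeper interval opens at left
    return found


t = "missississippi$"
sa = sorted(range(len(t)), key=lambda i: t[i:])
nodes = internal_nodes(lcp_array(t, sa))
print(len(nodes), sorted(left for _, left, _ in nodes))
```

For the running example the scan finds 10 internal nodes, and their left boundaries are the same values that appear in the sa_index row of Table 3.2.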
Like the suffix tree, the VST has two types of nodes, leaf nodes and non-leaf nodes. The VST encodes only the non-leaf nodes in the suffix tree. Each leaf node in the VST corresponds to an internal node in the suffix tree whose child nodes are all leaf nodes in the suffix tree. The suffix tree leaf nodes in turn point to positions in the suffix array. Thus, to determine the VST nodes and their respective attributes from the suffix array, we consider whether the node is a VST leaf node, or a non-leaf node. We call the former Type 0 nodes, and the latter Type 1 nodes. The problem then is to determine how the VST node attributes are derived from the SA and lcp for each type of node. We take a two step approach. First, we scan the SA and lcp from left to right, and use a temporary data structure to record pertinent information about each node in the VST. The temporary data structure (denoted TA) will be an array of structures (similar to a VST node structure), but a TA node will contain more information than a VST node. Each node in the VST has a corresponding entry in TA. In the second stage, we construct a mapping function (denoted MAP) that provides a one-to-one map from the elements in TA to the VST nodes. At this stage, some attributes in TA are renamed, and non-required fields in the TA structure are removed to give the required virtual suffix tree.
The first stage makes use of two structures: the TA structure and a stack data structure. The stack structure has two elements, the stack value (an integer) and the stack type (one bit). The stack type shows the VST node types described above. Type 0 indicates an unmerged leaf node, while Type 1 indicates an unmerged internal node. The TA structure contains the same attributes as a VST node (sa_index, fchild, nfleaf, and nnleaf), in addition to two pointers, namely, next, which points to the next sibling node of the current node, and rsa_index, the rsa_index of the current node.
The algorithm scans the suffix array (SA) and lcp array (LCP) from left to right (assuming the suffixes are sorted left to right in ascending order), and determines whether to create a new node based on the lcp values. The condition for starting the procedure to create a new node is when LCP[Stack.top.index] is larger than the lcp of the current index. We exit from the procedure when LCP[Stack.top.index] is less than the lcp of the current index. When entering the node creation procedure, we use curNode to denote the current index, and curLCP to denote the lcp of the current index. Whenever we exit the procedure, we run a special exiting function for housekeeping, which could also create a new node. We make use of the following definitions:
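\[
\text{Stack.top.index} =
\begin{cases}
\text{Stack.top.value} & \text{if Stack.top.type} = 0 \\
\text{TA[Stack.top.value].rsa\_index} & \text{if Stack.top.type} = 1
\end{cases}
\tag{3.4}
\]

\[
\text{curNode.sa\_index} =
\begin{cases}
\text{curNode.value} & \text{if curNode.type} = 0 \\
\text{TA[curNode.value].sa\_index} & \text{if curNode.type} = 1
\end{cases}
\tag{3.5}
\]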
Essentially, a node is created by merging an existing internal node with another internal node, or with a leaf node, or by merging two leaf nodes to form an internal node. The algorithm makes use of several auxiliary routines, depending on the type of node. There are two cases, corresponding to the two node types:
1. CASE 1: VST LEAF NODES.
Here, Stack.top.type=0. We consider two sub-cases.
Case 1A: LCP[Stack.top.index] > curLCP
• Case 1A1: LCP[Stack.top.index] ≠ LCP[curNode.sa_index].
We merge Stack.top.index and curNode to a new element of TA, say T[k]. Let Stack.top.index be the leftmost leaf node and curNode be the rightmost child. The required update is performed using the merge1A1() routine described as follows: If curNode is an index for SA, then T[k] has two leaf nodes: if (curNode.type = 0), then update T[k] as follows: T[k].sa_index = Stack.top.index, T[k].nfleaf = 2 (since there are 2 leaf nodes), T[k].rsa_index = curNode.value. If curNode is an element of TA, then T[k] has one leaf node and one child node; then, update T[k] as follows: T[k].sa_index = Stack.top.index, T[k].fchild = curNode.value, T[k].nfleaf = 1 (since there is one leaf node), T[k].rsa_index = TA[curNode.value].rsa_index.
• Case 1A2: LCP[Stack.top.index] = LCP[curNode.sa_index].
In this case, curNode must be an element of TA. We update the node as follows: TA[curNode.value].sa_index = Stack.top.index, TA[curNode.value].nfleaf = TA[curNode.value].nfleaf + 1, and pop the stack.
Case 1B: LCP[Stack.top.index] = curLCP.
Again, in this case, curNode must be an element of TA. We simply update the number of leaves, namely numleaf = numleaf + 1, and pop the stack.
2. CASE 2: VST INTERNAL NODES.
Here Stack.top.type=1. We also consider two sub-cases.
Case 2A: LCP[Stack.top.index] > curLCP.
We merge TA[Stack.top.value] and curNode to a new element of TA, say T[k]. The first child node will be TA[Stack.top.value], and the next (sibling) node will be curNode. The required update is performed using the merge2A() routine described as follows: If curNode is an index for SA, then T[k] has one leaf node. Update T[k] as follows: T[k].sa_index = TA[Stack.top.value].sa_index, T[k].fchild = TA[Stack.top.value], T[k].nfleaf = 0, T[k].rsa_index = curNode.value, TA[Stack.top.value].nnleaf = 1 (this leaf is curNode). If curNode is an element of TA, then T[k] has two child nodes. If LCP[Stack.top.index] = LCP[Stack.(top-1).index] then pop the stack. (These two must be elements of TA.) Then, update T[k] as follows: T[k].sa_index = TA[Stack.top.value].sa_index, T[k].fchild = TA[Stack.top.value], T[k].nfleaf = 0, T[k].rsa_index = TA[curNode.value].rsa_index, TA[Stack.top.value].next = curNode.value.
Case 2B: LCP[Stack.top.index] = curLCP.
The update here is performed using the merge2B() routine, described as follows: If curNode is an index for SA, then update as follows: TA[Stack.top.value].nnleaf = TA[Stack.top.value].nnleaf + numleaf + 1. If curNode is an element of TA, then update as follows: TA[Stack.top.value].nnleaf = TA[Stack.top.value].nnleaf + numleaf, TA[Stack.top.value].next = curNode.value.
The special exit housekeeping procedure is performed using exitFunction(). The procedure is described as follows: If numleaf = 0, then push curNode into the stack. Otherwise (so we must have numleaf ≠ 0), let T[k] be the node resulting from merging curNode with the leaf which has the same LCP value as curLCP. Note that curNode is an element of TA. Update T[k] as follows: T[k].sa_index = TA[curNode.value].sa_index − numleaf, T[k].nfleaf = numleaf, T[k].fchild = curNode.value, T[k].rsa_index = TA[curNode.value].rsa_index. Push T[k] into the stack (equivalently, push (k) and set the stack type to 1), and increment k by 1. If curLCP ≤ the lcp of the next index, then push curNode into the stack.
Algorithm 3.3 shows the steps for constructing the TA structure, given the SA and LCP array. Figure 3.6 shows the VST nodes (nodes in TA) created using the algorithm on our running example string T = missississippi$. Table 3.5 shows the attributes of each node in the TA structure. Notice that some nodes, such as TA[2] and TA[4], were updated at later steps of the algorithm, after their initial creation.
Algorithm 3.4 shows how the TA node labels are mapped to VST node labels. The algorithm computes elength for the TA nodes, in order to compute asa_index, the adjusted sa_index which is used in the VST to avoid storing the edge lengths. Table 3.6 shows the result of the mapping for the TA structure shown in Fig. 3.6 and Table 3.5. The algorithm uses a simple auxiliary function computeChildren_elengths() to determine the edge lengths.
Building the suffix links on the VST structure above can be done as was done earlier. The two algorithms still maintain the linear time construction for the VST.
3.6 Summary
In this work, we have presented the virtual suffix tree (VST), an efficient data structure for suffix trees and suffix arrays. The searching performance is the same as the suffix tree, that is, O(m log |Σ|) for a pattern of length m, with symbol alphabet Σ. We also showed how suffix links can be constructed on the VST in linear time, independent of the alphabet size. The VST does not store the edge lengths explicitly. This is achieved by modifying a key property of the suffix tree: the requirement that no two edges from a given node in the suffix tree can start with the same symbol. This key modification leads to a major distinction between the VST and the suffix tree, and results in extra space saving. However, whenever needed, the length for any arbitrary edge in the VST can be obtained in constant time using a simple computation. A further space reduction leads to a more compact representation of the VST, but at the expense of an increased search time, from O(m log |Σ|) to O(m|Σ|).
The space requirement depends on the topology of the suffix tree, in particular, the branching factor. For the compact structure, the worst case space requirement (including the suffix array) is 11.5n bytes without suffix links, and 15.5n bytes with suffix links, where n is the length of the string. However, in practice, the branching factor is typically less than 0.7. For the compact structure, this gives less than 9.25n bytes on average without the suffix links, or 12.05n bytes with suffix links.
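The figures quoted here and in Table 3.4 are consistent with a simple linear model in the branching factor, sketched below in Python. The model is an assumption fitted to the tabulated values, not something the text states: 4 bytes per symbol for the suffix array, 7.5 (compact) or 10 (regular) bytes per VST node, and 4 bytes per node for a suffix link. The function name is hypothetical.

```python
def vst_space_bytes_per_symbol(ratio, compact=True, suffix_links=False):
    """Estimated VST size in multiples of n, as a function of branching factor.

    Assumed model, fitted to the tabulated figures rather than taken from the
    text: 4 bytes per symbol for the suffix array, plus 7.5 (compact) or
    10 (regular) bytes per VST node, plus 4 bytes per node for a suffix link.
    """
    per_node = 7.5 if compact else 10.0
    if suffix_links:
        per_node += 4.0
    return 4.0 + per_node * ratio


for r in (1.0, 0.7):
    print(r,
          vst_space_bytes_per_symbol(r),
          vst_space_bytes_per_symbol(r, suffix_links=True))
```

Under this model a branching factor of 1 gives 11.5n and 15.5n bytes for the compact structure, and 0.7 gives 9.25n and 12.05n bytes, matching the values above.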
In this work, we started from efficient storage of the suffix tree and suffix array after they have been constructed. Thus, we constructed the VST from the suffix tree, which in turn was constructed from the suffix array. To reduce the space requirement at construction time, we introduced another algorithm that constructs the VST directly from the suffix array. An interesting question is whether one can construct compressed versions of the VST, in a way that is analogous to compressed suffix trees and compressed suffix arrays. This could lead to further space saving for the VST.
Figure 3.1. Suffix tree and virtual suffix tree for the string T = missississippi$. (a) suffix tree; (b) virtual suffix tree. The number at each leaf node indicates the position in SA. The number at each internal node indicates the node ID in the VST.

Figure 3.2. Example VST (solid nodes) showing left SA index (lSA) and right SA index (rSA) for sample nodes.
Figure 3.3. Edge-length adjustment procedure. (a) original tree; (b) improved tree after adjusting the edge lengths.

Figure 3.4. Improved VST for the string T = missississippi$. (a) modified suffix tree; (b) improved virtual suffix tree.
Algorithm 3.1: VST Construction Algorithm

CONSTRUCT-VST(T, n)
 1  SA ← COMPUTE-SUFFIXARRAY(T, n)
 2  ST ← SUFFIXTREE-FROM-SUFFIXARRAY(SA)
 3  ST ← ADJUST-EDGELENGTHS(ST)
 4  Initialize VST[], Q[], pTop = 0, pBottom = 0, curNode = root, Q[pTop] = root
 5  while (pBottom >= pTop)
 6      for (each childnode in curNode) do
 7          if (childnode is internal node in ST) then
 8              pBottom ← pBottom + 1; Q[pBottom] ← childNode
 9              if childnode is first internal node then
10                  VST[pTop].fchild ← pBottom
11              end if
12          else
13              Update VST[pTop].nfleaf and VST[pBottom].nnleaf
14          end if
15      end for
16      pTop ← pTop + 1; curNode ← Q[pTop]
17  end while
18  for (pb ← pBottom down to 0) do
19      if (VST[pb] is leaf node) then
20          VST[pb].asa_index ← Q[pb].fchild
21      else if (Q[pb].elength = 1) then
22          VST[pb].asa_index ← VST[pb].fchild.asa_index + VST[pb].nfleaf - Q[pb].elength
23      else
24          VST[pb].asa_index ← VST[pb].fchild.asa_index + VST[pb].nfleaf - Q[pb].elength + Q[pb].elength
25      end if
26      end if
27  end for
Table 3.3. Branching factor and maximum space requirement for various sample files.

File     |Σ|   Max Ratio  Compact  Regular  Description
Bible    63    0.61       8.60n    10.13n   King James bible
Chr22    5     0.73       9.50n    11.33n   Human chromosome 22
E.coli   4     0.65       8.89n    10.52n   Escherichia coli genome
Etext    146   0.54       8.02n    9.36n    Texts from Gutenberg project
Howto    197   0.55       8.13n    9.51n    Linux Howto files
Jdk13c   113   0.76       9.69n    11.59n   JDK 1.3 documentation
Rctail   93    0.66       8.95n    10.60n   Reuters news in XML format
Rfc      120   0.64       8.77n    10.36n   Concatenated IETF RFC files
Sprot    94    0.61       8.54n    10.05n
World    94    0.54       8.06n    9.41n    CIA world fact book
Average        0.63       8.71n    10.29n
Algorithm 3.2: VST construction with suffix links

CONSTRUCT-VST-WITH-SUFFIXLINKS(T, n)
 4  Initialize VST[], Q[], ISA[], M[], pTop ← 0, pBottom ← 0, curNode ← root, Q[pTop] ← root
    ...
18  for (pb ← pBottom down to 0) do
19      if (VST[pb] is leaf node) then
20          Update array M to map SA index and node VST[pb]
    ...
26      end if
27  end for
28  for (pb ← pBottom down to 0) do
29      if (VST[pb] is leaf node) then
30          VST[pb].slink ← M[ISA[SA[VST[pb].sa_index] + 1]]
31      else
32          Find ancestor w of VST[pb].fchild.slink s.t. |label(w, VST[pb].fchild.slink)| = |label(VST[pb], VST[pb].fchild)|
33          Set VST[pb].slink ← w
34      end if
35  end for
Figure 3.5. Suffix links on the VST for the sample string T = missississippi$. (a) suffix links on the VST, but showing the leaf nodes of the suffix tree; (b) suffix links on VST (no ST leaf nodes). The suffix links are labeled SL1, SL2, ..., SL9, indicating the order in which they were constructed.
Table 3.4. Storage requirement for the VST, including suffix links.

              Ratio  Compact  Regular
Worst Case    1      15.50n   18.00n
Average Case  0.75   12.63n   14.50n
              0.7    12.05n   13.80n
              0.65   11.48n   13.10n
              0.6    10.90n   12.40n
Table 3.5. Detailed attributes for nodes in the TA data structure using the sample sequence, T = missississippi$.

           TA[0]  TA[1]  TA[2]  TA[3]  TA[4]  TA[5]  TA[6]  TA[7]  TA[8]  TA[9]
label      P0     P1     P2     P3     P4     P5     P6     P7     P8     P9
sa_index   4      3      1      0      7      10     9      13     12     9
fchild     null   TA[0]  TA[1]  TA[2]  null   null   TA[5]  null   TA[7]  TA[6]
next       null   null   TA[4]  null   TA[9]  null   TA[8]  null   null   null
rsa_index  5      5      5      5      8      11     11     14     14     14
nfleaf     2      1      2      1      2      2      1      2      1      0
nnleaf     0      0      1      0      0      0      0      0      0      0
Figure 3.6. Constructing VST from the suffix array. Nodes are labeled based on their labels in the temporary array, TA. The mapping of the TA node labels to the corresponding VST node labels is shown in Table 3.6.
Table 3.6. Node mapping table from TA to VST.

TA nodes   P0  P1  P2  P3    P4  P5  P6  P7  P8  P9
VST nodes  N7  N4  N1  root  N2  N8  N5  N9  N6  N3
Algorithm 3.3: Constructing VST From Suffix Array

CONSTRUCT-VST-FROM-SA(LCP[], SA[])
 1  Stack ← buildStack(); TA[n]; k ← 0; st.value = 0; st.type = 0; push(st)
 2  for (i ← 1 to n) do
 3      if (LCP[i] ≥ LCP[Stack.top.index]) then
 4          st.value = i; st.type = 0; push(st)
 5      else
 6          curLCP ← LCP[i]; curNode.value ← i; curNode.type ← 0; numleaf ← 0
 7          do while (Stack is not empty & curLCP ≤ LCP[Stack.top.index])
 8              if (Stack.top.type = 0 & LCP[Stack.top.index] > curLCP) then
 9                  if (LCP[Stack.top.index] ≠ LCP[curNode.sa_index]) then
10                      TA[k] ← merge1A1(Stack.top.index, curNode)
11                      curNode.value ← k; curNode.type ← 1; k ← k + 1
12                  else
13                      TA[curNode.value].sa_index ← Stack.top.index; TA[curNode.value].nfleaf ← TA[curNode.value].nfleaf + 1
14                  end if
15              else if (Stack.top.type = 0 & LCP[Stack.top.index] = curLCP) then
16                  numleaf ← numleaf + 1
17              else if (Stack.top.type = 1 & LCP[Stack.top.index] > curLCP) then
18                  TA[k] ← merge2A(TA[Stack.top.value], curNode)
19                  curNode.value ← k; curNode.type ← 1; k ← k + 1
20              else
21                  merge2B(TA[Stack.top.value], curNode)
22                  pop(Stack); break
23              end if
24              pop(Stack)
25          end while
26          exitFunction()
27      end if
28  end for
29  root ← the value of last element of Stack
30  return MAP-TA-TO-VST(TA[], k, ROOT)
Algorithm 3.4: Mapping TA nodes to VST nodes

MAP-TA-TO-VST(TA[], k, root)
 1  MAP[0..k-1]; W[0..k-1]; pTop ← -1; curNode ← root
 2  TA[curNode].fchild.elength ← 1
 3  for (j ← 0 to k-1) do
 4      MAP[j] ← curNode
 5      computeChildren_elengths(curNode)
 6      if (curNode.next ≠ NULL) then
 7          curNode ← curNode.next
 8      else
 9          pTop ← pTop + 1
10          do while (TA[MAP[pTop]].fchild = NULL)
11              pTop ← pTop + 1
12          end while
13          curNode ← TA[MAP[pTop]].fchild
14          if (TA[curNode].elength > 1) then
15              TA[curNode].sa_index ← TA[curNode].sa_index + TA[curNode].elength
16          end if
17          TA[MAP[pTop]].fchild ← j+1
18      end if
19  end for
20  for (j ← 0 to k-1) do
21      W[MAP[j]] ← j
22  end for
23  for (j ← 0 to k-1) do
24      if (MAP[j] ≥ 0) then
25          index ← j
26          SWAP1 ← TA[MAP[index]]
27          do while (MAP[index] ≥ 0)
28              SWAP2 ← TA[index]
29              TA[index] ← SWAP1
30              SWAP1 ← SWAP2
31              MAP[index] ← -1
32              index ← W[index]
33          end while
34      end if
35  end for
36  Remove next, rsa_index, elength from TA
37  Return TA as VST
Chapter 4
The Probabilistic Suffix Array
4.1 Introduction
It has earlier been shown [109] that the probabilistic suffix tree (PST) is equivalent to the probabilistic suffix automaton, which is a type of probabilistic finite automaton (PFA). The PFA, on the other hand, can be viewed as a variable length Markov model (VLMM), whose memory length constraint is determined by the observed data. We present the probabilistic suffix array (PSA), a new data structure for representing information in variable length Markov chains. The PSA essentially encodes information in a VLMM by providing a space-efficient representation of the probabilistic suffix tree (PST). Our PSA provides the same functionality as the PST, but at a reduced space requirement. The equivalence between the PST and a class of PFAs implies that our PSA is also equivalent to this class of PFAs.
Main Results. We present algorithms to construct the PSA and to perform sequence prediction based on the constructed PSA. Our algorithms are based on the notion of empirical probabilities, modeled using the information retrieval notions of term frequency (TF) and document frequency (DF). We present a linear time algorithm for computing the document frequency. We state our main results in the form of two theorems about the PSA.
Theorem 4.1: Given a sequence T = T[1...n], with symbols from an alphabet Σ, and the memory constraint L on the variable length Markov model, the probabilistic suffix array (PSA) for T can be constructed in O(n) time, and O(n) space, independent of the Markov order, L.
Theorem 4.2: Given a sequence T = T[1...n], with symbols from an alphabet Σ, where Σ = {σ1, σ2, ..., σ|Σ|}, and the probabilistic suffix array (PSA) for T, we can decide whether a pattern P = P[1..m] is generated by the same variable length Markov chain that generated T in O(m log(n/|Σ|)) time.
In previous work by Ron et al. [109] and Apostolico and Bejerano [9], the probabilistic finite automaton (PFA) was represented using the probabilistic suffix tree (PST). Here, we present a space-efficient data structure to simulate such finite state machines when represented as a PST. Since a variable length Markov model is a finite state machine, and can be represented as a PST, our proposed structure can be used to capture the information in a variable length Markov model. We call our data structure the probabilistic suffix array (PSA), since it is built on suffix arrays rather than the suffix tree data structure. The PSA uses an array of nodes to capture the branching structure in the suffix tree, and other auxiliary arrays to maintain information needed for learning from the observed data. Learning in the PSA is performed by computing conditional probabilities at each node in the PSA using empirical probabilities computed via the TF and DF.
Organization. In the next section, we briefly describe the PST using an example sequence. In Section 4.3, we present our PSA data structure. We also give an example for the PSA to explain how it works. The construction algorithm is presented in Section 4.4. We analyze its practical space requirement in Section 4.5. Section 4.6 presents experimental results on protein family classification and phylogenetic tree construction. The last section provides a summary and conclusion.
4.2 Probabilistic Suffix Tree
Given a sequence T with n observations, the Markov model needed to represent T will require space that is exponential in L, the order of the Markov model. In practice, the transition
matrix for the Markov model to represent such a sequence will be sparse as the order L increases. Suffix tree data structures represent all substrings of a sequence seq in O(n) internal nodes and leaf nodes. Ron et al. [109] presented a space-efficient data structure called the probabilistic suffix tree (PST) to represent the order L transition matrix. The PST encodes only the non-zero transition probabilities. Therefore, the PST is a suffix tree which contains the transition probabilities for the Markov model with any order L, where 0 < L ≤ n. However, in practice, the space requirement for the suffix tree is still a problem.
Consider the sample sequence T = accactact$. Its first order transition matrix and the corresponding state diagram are shown in Figure 4.1. Figure 4.2 shows an example suffix tree and the corresponding PST for this example sequence. The PST is shown for the case of order L = 3. In this PST, we label each symbol with its transition probability. The transition probability for a given symbol is calculated using the conditioning context.
[Figure 4.1: (a) State Diagram; (b) Transition Matrix]

Figure 4.1. State diagram and transition matrix for a first order Markov model for an example sequence T = accactact$.
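The first order transition probabilities for the example sequence can be estimated by counting adjacent symbol pairs. The following Python sketch is only an illustration: the function name `first_order_transitions` is hypothetical, and it assumes that the terminal symbol $ is left out of the counts, which appears to agree with the values in Figure 4.1.

```python
from collections import Counter, defaultdict

def first_order_transitions(seq):
    """Estimate P(next | prev) from adjacent symbol pairs in seq."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {sym: c / sum(row.values()) for sym, c in row.items()}
        for prev, row in counts.items()
    }

# Example sequence from the text, with the terminal '$' omitted (assumption).
probs = first_order_transitions("accactact")
for prev in sorted(probs):
    print(prev, probs[prev])
```

Only the non-zero entries are stored, which mirrors the sparsity argument made above for higher order models.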
4.3 Proposed Data Structure
We propose the probabilistic suffix array (PSA) as a way to simulate the probabilistic suffix tree. Each node in the PSA has a corresponding node in the PST. The basic PSA structure has three types of attributes. The first category of attributes are the foundation attributes, which consist of the original text and its suffix array. Construction of the suffix array is in O(n) time
Figure 4.2. Example suffix tree and probabilistic suffix tree for the string T = accactact$. (a) Suffix tree; (b) probabilistic suffix tree. The trees are shown without the suffix links. The numbers at each node of the PST are based on the count of the number of times the symbols are observed after observing the sequence corresponding to the node label. Essentially, at a given node, say u, these encode the conditional probabilities P(σ|C) of observing the symbol σ following the sequence L(u).
and space, using any of the various linear-time linear-space algorithms; see, for example, [4, 57, 97, 104]. The second category of attributes are the internal node attributes. These are derived from the interval array, which is determined following [131]. These are used to represent the internal nodes in the suffix tree, including the suffix links. The suffix link is the link from an internal node to its suffix node. The third type of attributes are measurement attributes. These record measurement information, such as term frequency, document frequency, and conditional probabilities, which are needed to compute probabilities in the Markov model. Table 4.1 shows the three categories of attributes used in the PSA. We use the term length of PSA to refer to the number of nodes in the PSA. The term length emphasizes the fact that our PSA nodes are stored as arrays. In this work, we use M to denote the PSA length.
4.3.1 Internal Node Attributes
The internal node attributes are derived from the interval array. The pair <Start, End> represents the interval position of this node in the suffix array. Length denotes the length
of the longest common prefix between the substrings represented by the node. This essentially corresponds to the length of the path from the root to the current node. The PSA internal node attributes are used to simulate the internal nodes in a suffix tree. The attribute Suffix Link is a regular suffix link from the current internal node to its suffix node. We use this link to continue the searching process when a mismatch occurs at the time of prediction. The internal node attributes, including the suffix link, are constructed using Algorithm BUILDPSA.
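To make the internal node attributes concrete, one PSA node can be pictured as a small record. The Python sketch below is illustrative only: the class name `PSANode` and its field names are hypothetical, and the thesis stores these attributes as parallel arrays rather than as objects.

```python
from dataclasses import dataclass

@dataclass
class PSANode:
    """One internal node: a non-trivial lcp-delimited interval of the suffix array."""
    start: int             # first suffix array position of the interval
    end: int               # last suffix array position of the interval
    length: int            # LCP shared by all suffixes in the interval
    suffix_link: int = -1  # index of the suffix node, -1 if none (assumption)

    @property
    def tf(self) -> int:
        # The term frequency need not be stored; it follows from the interval.
        return self.end - self.start + 1

# The node for "ac" in the example sequence accactact$ (values from Table 4.2).
node = PSANode(start=1, end=3, length=2, suffix_link=3)
print(node.tf)
```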
4.3.2 Measurement Attributes
These attributes are dependent on the application. For example, for applications in document feature selection, or in protein sequence classification, we only compute and store TF and DF, and the conditional probabilities. In other applications, such as document clustering, we may need to compute document frequency for the classes, rather than just the document frequency. The attributes are also dependent on the methods used in calculating the probabilities in the Markov model. Thus, we focus on the method of computing the conditional probabilities, and the probability of a node in the VLMM. For a given node with node label, say (S1...Sn), its probability, P_VLMM, is given by:
\[
P_{VLMM} = P(S_1 \ldots S_n) = P(S_n \mid S_1 \ldots S_{n-1}) \cdots P(S_2 \mid S_1)\, P(S_1) \tag{4.1}
\]
Table 4.1. Attributes of PSA

Type                      Attribute                           Space
Fundamental Attributes    Original Text (text)                char
                          Suffix Array (SA)                   integer
Internal Node Attributes  Start                               integer
                          End                                 integer
                          Length                              integer
                          Suffix Link                         integer
Measurement Attributes    Term Frequency (TF)                 integer
                          Document Frequency (DF)             integer
                          cProbability P(St|Sk...St−1) (CP)   float

If S = S1...St−1St (where 1 ≤ t ≤ n) does not occur in the training data, we find the longest suffix of S which occurred in the training data. Assume that Sk...St−1St (where 1 ≤ k ≤ t) is the longest such suffix of S; then P(St|S1...St−1) = P(St|Sk...St−1). Thus,
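\[
P(S_t \mid S_1 \ldots S_{t-1}) = P(S_t \mid S_k \ldots S_{t-1}) =
\begin{cases}
\dfrac{TF_{S_t}}{n} & : k = t \\[6pt]
\dfrac{TF_{S_k \ldots S_t}}{TF_{S_k \ldots S_{t-1}}} & : k < t
\end{cases}
\tag{4.2}
\]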
Here, TF_u is the term frequency of the node with node label u. We make two observations. First, if the terminal symbol St of a path Sk...St−1St is the first symbol on an edge of the suffix tree, then the conditional probability P(St|Sk...St−1) can be computed as the frequency of the current node divided by the frequency of the parent node. Second, if the terminal symbol St of a path Sk...St−1St is a non-first symbol on an edge of the suffix tree, then the conditional probability P(St|Sk...St−1) is 1. We call this a trivial conditional probability. Thus, we need to store the conditional probabilities for only the first symbol on each edge. When we determine that the terminal symbol of a path is a non-first symbol on an edge, we simply return 1 for the conditional probability of the symbol.
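Equation (4.2) can be checked on the example sequence with a direct, brute-force count of substring occurrences. The Python sketch below is not the PSA-based computation: the helper names `tf` and `conditional_probability` are hypothetical, and the terminal symbol $ is assumed to be excluded so that n = 9.

```python
from fractions import Fraction

def tf(text, pattern):
    """Term frequency: number of (possibly overlapping) occurrences of pattern."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, i))

def conditional_probability(text, context, symbol):
    """P(symbol | context) following Equation (4.2)."""
    # Shorten the context from the left until context+symbol occurs in the text.
    while context and tf(text, context + symbol) == 0:
        context = context[1:]
    if not context:                     # case k = t
        return Fraction(tf(text, symbol), len(text))
    return Fraction(tf(text, context + symbol), tf(text, context))  # case k < t

T = "accactact"
print(conditional_probability(T, "ac", "t"))   # first symbol on an edge
print(conditional_probability(T, "a", "c"))    # trivial conditional probability
```

The second call illustrates the trivial case: every occurrence of "a" is followed by "c", so the count ratio is 1 and nothing needs to be stored for that symbol.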
4.3.3 Example PSA
Table 4.2 and Table 4.3 show the nature of the PSA nodes and the order-3 conditional probabilities in a PSA, using the sequence T = accactact$ used in Figures 4.1 and 4.2. The entries in Table 4.2 are directly calculated from the interval array. We notice that P(c|a) is 1. This indicates that the substring "ac" is represented on one edge and that the terminal symbol "c" is the second symbol on this edge. This is an example of a trivial conditional probability. P(c|ta) is another example whose probability is 1.
Entries in Table 4.3 are computed from the PSA leaf nodes. We show only the non-trivial conditional probabilities on the leaf nodes. These conditional probabilities are calculated based on the first observation described in Section 4.3.2. The numerator is 1 since this is a leaf node, which must have a frequency of 1. The denominator is the frequency of the substring corresponding to the node label of the parent of the current leaf node. This is easily obtained as (End − Start + 1) using the elements in the interval array. The PSA nodes can be compared with nodes in the example PST shown in Figure 4.2. The transition matrix in Figure 4.1(b) can be constructed from the PSA.
PSA node Index  Interval  Length  Suffix Link  Probability Expression  Conditional Probability
1               <2,3>     3       2            P(t|ac)                 2/3
2               <1,3>     2       3            P(a)                    1/3
2               <1,3>     2       3            P(c|a)                  1
3               <6,7>     2       4            P(t|c)                  1/2
4               <4,7>     1       -1           P(c)                    4/9
5               <8,9>     1       -1           P(t)                    2/9

Table 4.2. Example PSA internal nodes, using the PSA of the sequence T = accactact$
SA index  Probability Expression  Conditional Probability
1         P(c|ac)                 1/3
3         P(a|act)                1
4         P(a|c)                  1/4
5         P(c|c)                  1/4
7         P(a|ct)                 1/2
9         P(a|t)                  1

Table 4.3. Example PSA leaf nodes, using the PSA of the sequence T = accactact$
4.3.4 Interval Array and Document Frequency in Linear Time
The interval nodes represent the basic structure of the PSA. Since other attributes are based on the structure of these interval nodes, we need to compute these interval nodes first. The term frequency (TF) and document frequency (DF) are computed as by-products as we build the interval nodes. In this section, we give a new algorithm to compute DF in linear time. Compared with the original algorithm [131], which runs in O(n log n) time, our algorithm is more efficient and applicable to large document collections. The interval nodes are stored as an array using the interval array data structure. To build the interval array, we use Yamamoto and Church's TF algorithm [131]. In our structure, the non-trivial lcp-delimited intervals represent the internal nodes of the suffix tree. Therefore, we modify the algorithm to output only the non-trivial lcp-delimited intervals. After this procedure, we obtain the interval array, which includes the interval attributes <Start, End> and Length for each node. Length is simply the LCP of the interval represented by <Start, End>. The attribute TF is also computed at this stage.
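The enumeration of non-trivial lcp-delimited intervals can be sketched with a single stack-based pass over the LCP array. The Python code below is an illustration under stated assumptions, not the routine of [131]: the suffix array is built naively by sorting (not in linear time), positions are 0-based, and the function names are hypothetical.

```python
def suffix_array(text):
    # Naive construction for illustration only; not a linear-time algorithm.
    return sorted(range(len(text)), key=lambda i: text[i:])

def lcp_array(text, sa):
    # lcp[i] = length of the longest common prefix of suffixes sa[i-1] and sa[i].
    def common(a, b):
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        return k
    return [0] + [common(sa[i - 1], sa[i]) for i in range(1, len(sa))]

def nontrivial_intervals(lcp):
    """Return (start, end, length) for every interval with length > 0."""
    n = len(lcp)
    out = []
    stack = [(0, 0)]                      # (lcp value, left boundary); root at bottom
    for i in range(1, n + 1):
        cur = lcp[i] if i < n else 0      # sentinel 0 closes all open intervals
        left = i - 1
        while stack[-1][0] > cur:         # close intervals deeper than cur
            length, start = stack.pop()
            out.append((start, i - 1, length))
            left = start
        if stack[-1][0] < cur:            # open a new, deeper interval
            stack.append((cur, left))
    return out

text = "accactact$"
sa = suffix_array(text)
intervals = nontrivial_intervals(lcp_array(text, sa))
tf = [end - start + 1 for start, end, _ in intervals]
print(intervals, tf)
```

For the example sequence, the intervals come out in the same order as the internal nodes listed in Table 4.2, and TF is read off each interval as End − Start + 1.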
For applications that involve multiple sequences or documents, for instance, in text clustering or in protein sequence classification, we may require the document frequency. In this work, the input sequence in such applications will be a concatenation of all the sequences, with a special end of document symbol ($) delimiting each individual sequence.
The original algorithm proposed in [131] for computing document frequency runs in O(n log n) time. Here, we modify the algorithm to improve its running time to O(n). The original algorithm was reproduced in Figure 2.1 for easy reference.
In lines 4-6, the algorithm searches for the largest x. The worst case for this search will be O(log n). We add a new array docsp that maps the document id to a stack. The length of this array is the number of documents, Z. When we calculate a new interval <Start, End>, the algorithm will check whether the element has been observed previously. The algorithm searches docsp with the document id of the new element. If the document id is found, the document frequency (DF) is changed. The process performs a simple look-up using docsp in constant time, and hence the modified algorithm runs in O(n) time. Algorithm COMPUTEDOCUMENTFREQUENCY implements the proposed modifications.
Algorithm 4.1: Computing Document Frequency in Linear Time
COMPUTEDOCUMENTFREQUENCY
(3)   doc ← getdocnum(s[j]), docsp[doc] ← sp
(4)   if doclink[doc] ≠ -1, do
(5)     if docsp[doc] > sp or stack[docsp[doc]].i > doclink[doc] do
(5.5)     stack_df[docsp[doc]] ← stack_df[docsp[doc]]-1
(6)   doclink[doc] ← j, docsp[doc] ← sp
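The difference between TF and DF on a concatenated input can be illustrated with a brute-force reference computation. The Python sketch below is deliberately not the linear-time algorithm above; it is a simple check that such an implementation could be compared against. The function name `tf_and_df`, the naive suffix sort, and the example documents are all assumptions made for illustration.

```python
def tf_and_df(docs, pattern, sep="$"):
    """Return (TF, DF) of pattern over docs concatenated with sep delimiters."""
    text = sep.join(docs) + sep
    # doc_id[i] = index of the document that owns text position i.
    doc_id = []
    for d, doc in enumerate(docs):
        doc_id.extend([d] * (len(doc) + 1))   # +1 covers the delimiter
    # Naive suffix array; the matching suffixes form one contiguous interval.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    hits = [pos for pos in sa if text.startswith(pattern, pos)]
    return len(hits), len({doc_id[pos] for pos in hits})

docs = ["accact", "act", "ccc"]   # hypothetical documents
print(tf_and_df(docs, "ac"))
print(tf_and_df(docs, "c"))
```

Here the pattern is assumed not to contain the delimiter, so no match spans two documents.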
4.4 Constructing the PSA
Having described the building blocks for the probabilistic suffix array (PSA), we are now ready to describe how we put them together to construct the PSA. Algorithm BUILDPSA (Algorithm 4.2) uses five major steps or procedures in constructing the PSA data structure for a given input sequence. The first step is the construction of the suffix array from the original sequence. We use standard linear-time linear-space algorithms for this step. Using Yamamoto
and Church's PRINT_LDIS_STACK() function [131], we construct the interval array. We then simulate the tree-like structure from the interval array in the third step. The third step maps each position in the input sequence to its interval. In the fourth step, the routine BUILDSUFFIXLINK starts by first constructing the inverse suffix array W. From W, the algorithm determines a link that allows the interval array to point to the next position. Thus, the suffix link is easy to construct.
The final procedure constructs a ranked list of the elements in the interval array (the nodes in the PSA) in non-decreasing order. This process uses counting sort in O(M) time and 2M integers of space, where M is the number of non-trivial lcp-delimited intervals. Since M ≤ n, this will take O(n) time.
Algorithm 4.2: Building the Probabilistic Suffix Array
BUILDPSA(Text)
1 SA ← BUILDSA(Text)
2 IntervalArray ← PRINT_LDIS_STACK(SA)
3 <posInterval, parent> ← BUILDINTERVALTREE(SA, IntervalArray)
4 PSA ← BUILDSUFFIXLINK(posInterval, parent)
5 PSA ← SORTPSA(PSA)
4.4.1 Building the Interval Tree
Algorithm BUILDINTERVALTREE (Algorithm 4.3) uses the interval pairs <Start, End> to construct a tree-like structure that encodes the parent-child relationships between the intervals. The main idea is to set the interval ID into a position between positions Start and End, such that the position has not been previously set. The problem is that the pairs <Start, End> can overlap (nesting). A naïve algorithm for this task will require O(n^2) time.
We use a stack to store the free position in the current pair <Start, End>. If the position has not been earlier set to some interval ID, the algorithm sets the interval ID one by one from Start to End. If the position has earlier been set to some interval ID, the algorithm will set the interval ID in the last part of the pair <Start, End> which has not earlier been set
to an interval ID. This process guarantees that the position will be set to one interval ID, and that the position is included in only one <Start, End> pair.
Some positions which could not be set to an interval ID in the last pass will be set to an interval ID in the next loop. The following step pops positions from the stack and sets interval IDs into these positions. This process guarantees that every position will be set to one interval ID, even the positions that belong to the next higher level interval, or to a lower level interval.
After determining the interval IDs, the algorithm computes the parent array. The parent array stores the link from one interval to the higher interval position which shares the same value for Start. The variable pN is a stack. This stack stores the intervals which have been computed, but whose parent's interval ID has not yet been set. Line 16 pushes the interval onto the stack.
When the current interval includes the interval at the top of the stack pN, the current interval is the parent of the interval at the top of pN. Thus, we set the value for parent of the interval at the top of pN to the current interval. The loop in lines 13 to 15 repeats this step to find all the child intervals of the current interval.
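The parent computation with the stack pN can be sketched as follows. This Python fragment is a minimal illustration, assuming that the intervals arrive with children before their parents (the order in which a stack-based interval enumeration emits them) and that -1 marks a node whose parent is the root; the function name `parent_array` is hypothetical.

```python
def parent_array(intervals):
    """intervals: list of (start, end, length), children listed before parents."""
    parent = [-1] * len(intervals)        # -1: parent is the root (assumption)
    pending = []                          # plays the role of the stack pN
    for i, (start, _end, _length) in enumerate(intervals):
        # Every pending interval that starts at or after start is enclosed
        # by interval i, so i becomes its parent.
        while pending and start <= intervals[pending[-1]][0]:
            parent[pending.pop()] = i
        pending.append(i)                 # parent of i is not yet known
    return parent

# Internal nodes of accactact$ as 0-based (start, end, length) triples.
intervals = [(2, 3, 3), (1, 3, 2), (6, 7, 2), (4, 7, 1), (8, 9, 1)]
print(parent_array(intervals))
```

Each interval is pushed once and popped at most once, so the pass is linear in the number of intervals.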
4.4.2 Building the Suffix Link
Algorithm BUILDSUFFIXLINK (Algorithm 4.4) uses the suffix array (SA) to compute the inverse suffix array W. To compute the suffix link, the algorithm sets the position of the parent in W to be the current position. Thus we obtain the suffix link for each leaf node. To compute the suffix link of an internal node, we simply use the parent array to find the suffix link of the internal node.
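The role of the inverse suffix array in finding suffix links can be illustrated with a brute-force version. The Python sketch below is not Algorithm 4.4: it scans all intervals for each node and therefore takes quadratic time in the number of nodes. It assumes 0-based positions, uses -1 for nodes of length 1 (whose suffix node is the root), and the function name `suffix_links` is hypothetical.

```python
def suffix_links(sa, intervals):
    """Return, for each (start, end, length) interval, the index of its suffix node."""
    isa = [0] * len(sa)                   # inverse suffix array (W in the text)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    links = []
    for start, _end, length in intervals:
        if length == 1:
            links.append(-1)              # dropping one symbol leaves the empty label
            continue
        rank = isa[sa[start] + 1]         # rank of the suffix with first symbol removed
        # The suffix node is the interval of length-1 depth containing that rank.
        links.append(next(j for j, (s, e, l) in enumerate(intervals)
                          if l == length - 1 and s <= rank <= e))
    return links

sa = [9, 0, 6, 3, 2, 1, 7, 4, 8, 5]       # suffix array of accactact$
intervals = [(2, 3, 3), (1, 3, 2), (6, 7, 2), (4, 7, 1), (8, 9, 1)]
print(suffix_links(sa, intervals))
```

For the example, the node for "act" links to the node for "ct", "ac" links to "c", and "ct" links to "t".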
4.4.3 Sorting the PSA Structure
Algorithm 4.3: Building the Interval Tree

BUILDINTERVALTREE(SA[],Start[],End[],Length[])
1  posInterval[] ← -1; pt ← 0
2  for (i ← 1 to M) do
3    if posInterval[i]=-1 do //first time to observe Start[i]
4      for (j ← Start[i] to End[i]) do
5        if posInterval[j]=-1 do posInterval[j] ← i else break end if
6      end for
7      for (j ← End[i] down to Start[i]) do
8        if posInterval[j]=-1 do posInterval[j] ← i else break end if
9      end for
10   end if
11   pop position pos which is between Start[i] and End[i] in Stack
12   posInterval[pos] ← i; push the position between Start[i] and End[i-1]
13   while pt > 0 and Start[i] ≤ Start[pN[pt-1]]
14     parent[pN[pt-1]] ← i; pt ← pt-1
15   end while
16   pN[pt] ← i; pt ← pt+1
17 end for
18 return <posInterval[], parent[]>

Algorithm 4.4: Building Suffix Link

BUILDSUFFIXLINK(SA[],Start[],Length[],posInterval[],parent[])
1  for (i ← 1 to n) do W[SA[i]-1] ← i end for
2  for (i ← 1 to n) do W[i] ← posInterval[W[i]] end for
3  for (i ← 1 to M) do
4    if Length[i] ≠ 1 and sLink[i] = NULL do
5      sLink[i] ← W[SA[Start[i]]]
6      k ← parent[i]; pi ← i
7      do while (k ≠ root and sLink[k] = NULL)
8        sLink[k] ← parent[sLink[pi]]
9        do while (Length[sLink[k]] ≠ Length[k]-1)
10         sLink[k] ← parent[sLink[k]]
11       end while
12       pi ← k; k ← parent[k]
13     end while
14   end if
15 end for

After obtaining the suffix link, we re-sort the PSA structure to make it suitable for efficient searching during the VLMM prediction stage. The prediction procedure performs frequent searches using the interval array (see Section 4.4.5). After sorting using the <Start, Length> attributes of the PSA, searching will be done in O(log M) time, where M is the PSA length, the number of the interval pairs in the PSA.
The algorithm SORTPSA (Algorithm 4.5) uses counting sort to sort the PSA structure. In the interval array, the attribute Length is already in decreasing order for intervals that share the same Start. Therefore, we only need to sort the structure based on the order of the Start attribute. The time complexity of this algorithm is linear with respect to the number of nodes (number of intervals). The additional space is at most 2n integers; see the analysis in Section 4.5.
Algorithm 4.5: Sorting the PSA Structure
SORTPSA(Start[],Length[],sLink[])
1  Count[] ← 0; Order[] ← 0; wOrder[] ← 0; NewStart[] ← 0; NewEnd[] ← 0; NewLength[] ← 0; NewsLink[] ← 0; sum ← 0; NewPattern[] ← 0
2  for (i ← 1 to M) do Count[Start[i]] ← Count[Start[i]]+1 end for
3  for (i ← 1 to n) do
4    if Count[i] ≠ 0 do sum ← sum+Count[i]; Count[i] ← sum-Count[i]; end if
5  end for
6  for (i ← 1 to M) do
7    Order[Count[Start[i]]] ← i; Count[Start[i]] ← Count[Start[i]] + 1
8  end for
9  for (i ← 1 to M) do wOrder[Order[i]] ← i end for
10 for (i ← 1 to M) do sLink[i] ← wOrder[sLink[i]] end for
11 for (i ← 1 to M) do
12   NewStart[i] ← Start[Order[i]]; NewEnd[i] ← End[Order[i]]; NewLength[i] ← Length[Order[i]]
13   NewsLink[i] ← sLink[Order[i]]; NewPattern[i] ← pattern[Order[i]]
14 end for
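The counting sort on Start, together with the renumbering of suffix links, can be sketched in a few lines of Python. This is an illustration rather than a transcription of Algorithm 4.5: nodes are held as tuples instead of parallel arrays, positions are 0-based, -1 denotes a missing suffix link, and the function name `sort_psa` is hypothetical.

```python
def sort_psa(nodes, n):
    """Stable counting sort of (start, end, length, suffix_link) nodes by start."""
    count = [0] * (n + 1)
    for start, _, _, _ in nodes:
        count[start + 1] += 1
    for i in range(n):
        count[i + 1] += count[i]          # count[s] = first output slot for Start s
    order = [0] * len(nodes)              # order[new] = old index
    for old, (start, _, _, _) in enumerate(nodes):
        order[count[start]] = old
        count[start] += 1
    new_index = [0] * len(nodes)          # inverse permutation (wOrder in the text)
    for new, old in enumerate(order):
        new_index[old] = new
    # Suffix links are node indices, so they must be renumbered as well.
    return [(s, e, l, new_index[link] if link >= 0 else -1)
            for s, e, l, link in (nodes[old] for old in order)]

nodes = [(2, 3, 3, 2), (1, 3, 2, 3), (6, 7, 2, 4), (4, 7, 1, -1), (8, 9, 1, -1)]
print(sort_psa(nodes, n=10))
```

Because the sort is stable, intervals sharing the same Start keep their existing relative order, which is what allows sorting on Start alone.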
4.4.4 Computing Conditional Probabilities Using the PSA
Algorithm COMPUTEPROBABILITY (Algorithm 4.6) is a simple routine which is based on the PSA structure. It computes the conditional probability at each internal node by using Equation (4.2). The algorithm uses the temporary array parent, which was generated by algorithm BUILDINTERVALTREE (Algorithm 4.3) and used by algorithm BUILDSUFFIXLINK (Algorithm 4.4). The array parent contains a record of the parent for each given interval node. The algorithm is linear in time and is in place. After computing the conditional probabilities, the space used by array pattern can be released, since the array is no longer needed.
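The per-node computation can be illustrated on the internal nodes of the example sequence. The Python sketch below follows the TF ratio of Equation (4.2) under two assumptions that are ours: a node whose parent is the root (marked -1) is divided by n, and n = 9 counts the symbols of T without the terminal $. The function name `node_probabilities` is hypothetical.

```python
from fractions import Fraction

def node_probabilities(intervals, parent, n):
    """Conditional probability stored at each internal node (start, end, length)."""
    tf = [end - start + 1 for start, end, _ in intervals]
    probs = []
    for i, (_, _, length) in enumerate(intervals):
        p = parent[i]
        # Length-1 nodes and children of the root are normalized by n (assumption).
        denom = n if (length == 1 or p < 0) else tf[p]
        probs.append(Fraction(tf[i], denom))
    return probs

intervals = [(2, 3, 3), (1, 3, 2), (6, 7, 2), (4, 7, 1), (8, 9, 1)]
parent = [1, -1, 3, -1, -1]
print(node_probabilities(intervals, parent, n=9))
```

The resulting values agree with the non-trivial entries of Table 4.2 for P(t|ac), P(a), P(t|c), P(c) and P(t).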
We summarize the above results in Theorem 4.1, our first main result on the PSA:
Theorem 4.1: Given a sequence T = T[1...n], with symbols from an alphabet Σ, and the memory constraint L on the variable length Markov model, the probabilistic suffix array (PSA) for T can be constructed in O(n) time, and O(n) space, independent of the Markov order, L.
Proof: Algorithms BUILDSA and PRINT_LDIS_STACK each run in O(n) time, as in [104] and [131] respectively. Algorithm BUILDINTERVALTREE computes two arrays, posInterval and parent. The array posInterval stores the relationship between leaf nodes and interval nodes, while parent represents the parent nodes for the interval nodes. There are n leaf nodes and M interval nodes. Lines 2-17 in algorithm BUILDINTERVALTREE visit each interval node. Lines 4-6 set the relationships between the current interval node and the leaf nodes before the first child interval node of the current interval node. Lines 7-9 set the relationships between the current interval node and the leaf nodes after the last child interval node of the current interval node. Line 11 computes the relationships between the current interval node and the leaf nodes which are not included in other interval nodes. Each leaf node will be computed one time in algorithm BUILDINTERVALTREE. Lines 13-15 compute the child nodes of the current interval node. The time cost is O(M) over the whole algorithm, even with the loop in Line 2. Algorithm BUILDSUFFIXLINK calculates the suffix link via the inverse suffix array W. It calculates suffix links starting at the lowest level interval nodes, which do not include other interval nodes. Then it calculates the suffix link for the parent node of the current interval node following the parent array. So the time is O(M) for computing the suffix links for all interval nodes. There are 6 loops in algorithm SORTPSA. Each loop runs either n times or M times. So the time complexity is O(n). Therefore, overall, the probabilistic suffix array (PSA) for a sequence T of length n can be constructed in O(n) time.
Linear space requirement follows from the space analysis in Section 4.5. □
4.4.5 Prediction with VLMM via the PSA
Algorithm 4.6: Calculating Conditional Probability

COMPUTEPROBABILITY(PSA,M,n)
1 for (i ← 1 to M) do
2   if PSA.Length[i]=1 do
3     PSA.cProbability[i] ← PSA.TF[i]/n
4   else
5     PSA.cProbability[i] ← PSA.TF[i]/PSA.TF[PSA.parent[i]]
6   end if
7 end for

An important procedure in Markov models is to compute the probability that a given test pattern is generated by the model. For the variable length Markov model (VLMM), we denote this probability as the P_VLMM of the input pattern. Algorithm VLMM-PREDICTION (Algorithm 4.7) calculates the P_VLMM of a test sequence. This algorithm uses Equation (4.1) to compute P(S1), P(S2|S1), ... by searching for the sub-patterns S1, S1S2, ... When there is a mismatch while searching with the sub-pattern Sk...St, the algorithm jumps to the node pointed to by the suffix link attribute of the current node. Thus, matching proceeds from the node representing the suffix Sk+1...St, after accounting for the prefix of this suffix, which has already been matched in the previous step.
The algorithm scans positions in the pattern from left to right. While matching the sub-pattern, it uses the function LeftmostMatchedPosition, which uses standard suffix array search algorithms [3, 84] to find the leftmost position (leftPosn) that matched the sub-pattern. The function LeftmostMatchedPosition searches for the pattern Pattern[s...i]#, where # is a symbol that never occurs in the alphabet. That is, # ∉ Σ, # < σ for all σ ∈ Σ, and $ < #. The search will result in a mismatch. Since Pattern[s...i] has already matched up to position i−1 using the SA, we can easily determine the leftmost position of Pattern[s...i] in the suffix array. We use leftPosn to denote this leftmost position in the algorithm. The function SearchPSA uses the determined leftmost position to search in the PSA. This function also uses standard suffix array search algorithms [3, 84]. The function determines the index of the PSA node such that PSA.index.Start = leftPosn and PSA.index.Length is the minimum value that is larger than i−s, the length of the matching prefix of the pattern. Since the PSA is already sorted by Start and Length, the search for the index will be done in O(log M) time, where M is the length of the PSA (i.e., the number of nodes in the PSA).
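The lookup performed by SearchPSA can be sketched as a binary search over the sorted nodes. The Python fragment below is a hypothetical illustration, not the thesis implementation: it assumes nodes sorted by Start in increasing order and, within equal Start, by Length in decreasing order, and the name `search_psa` is ours. The key list is rebuilt on every call only to keep the example short; in practice it would be prepared once.

```python
from bisect import bisect_left

def search_psa(nodes, left_posn, matched_len):
    """Index of the node with Start == left_posn and the smallest Length > matched_len.

    nodes: list of (start, end, length) sorted by (start ascending, length descending).
    Returns None when no such node exists.
    """
    # Negating Length turns the mixed order into one ascending tuple order.
    keys = [(start, -length) for start, _end, length in nodes]
    idx = bisect_left(keys, (left_posn, -matched_len)) - 1
    if idx >= 0 and keys[idx][0] == left_posn:
        return idx
    return None

# Sorted internal nodes of accactact$ (0-based suffix array positions).
nodes = [(1, 3, 2), (2, 3, 3), (4, 7, 1), (6, 7, 2), (8, 9, 1)]
print(search_psa(nodes, left_posn=1, matched_len=1))
print(search_psa(nodes, left_posn=1, matched_len=2))
```

The first call finds the node for "ac" after one matched symbol; the second returns None because no node starting at position 1 is deeper than 2.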
When a mismatch occurs, the algorithm uses the previously computed index to determine
the suffix link. Thus the search will be redirected to the new branch following the suffix link. The algorithm now uses the longest suffix of the sub-pattern that has so far matched as the new sub-pattern, and re-starts matching from the symbol that mismatched. Determining this point where matching should re-start is a constant time operation.
The foregoing implies that, given a sequence T = T[1...n], with symbols from an alphabet Σ, and the PSA for T, we can decide whether a pattern P = P[1...m] is generated by the same variable length Markov chain that generated T in O(m log(n/|Σ|)) time.
We can use the predicted probability above to perform protein sequence classification. Suppose we have F protein families and we have computed the PSA for each family. Let PSA_k be the model constructed using the k-th protein family. Further, let P_VLMM(P, PSA_k) be the probability that protein sequence P is generated by PSA_k, as returned by Algorithm VLMM-PREDICTION. Then, we classify P to protein family f, where f is given by:
\[
f = \arg\max_{k = 1 \ldots F} \left\{ P_{VLMM}(P, PSA_k) \right\}. \tag{4.3}
\]
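Equations (4.1) and (4.3) can be exercised end to end with a brute-force stand-in for the PSA. In the Python sketch below, substring counts replace the PSA lookups, so it reproduces the probabilities but none of the time or space bounds. The function names, the toy "families", and the choice of order 3 are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction

def substring_counts(text, max_len):
    """Count all substrings of text with length 1..max_len."""
    counts = Counter()
    for i in range(len(text)):
        for k in range(1, max_len + 1):
            if i + k <= len(text):
                counts[text[i:i + k]] += 1
    return counts

def vlmm_probability(train, pattern, order):
    """P_VLMM(pattern) under a model trained on train, per Equations (4.1)-(4.2)."""
    counts = substring_counts(train, order + 1)
    prob = Fraction(1)
    for t, sym in enumerate(pattern):
        ctx = pattern[max(0, t - order):t]
        while ctx and counts[ctx + sym] == 0:   # fall back to a shorter suffix
            ctx = ctx[1:]
        if ctx:
            prob *= Fraction(counts[ctx + sym], counts[ctx])
        else:
            prob *= Fraction(counts[sym], len(train))
    return prob

def classify(families, pattern, order):
    """Equation (4.3): pick the family whose model gives the highest probability."""
    return max(families, key=lambda name: vlmm_probability(families[name], pattern, order))

families = {"F1": "accactact", "F2": "ttgttgttg"}   # hypothetical training data
print(vlmm_probability(families["F1"], "act", order=3))
print(classify(families, "act", order=3))
```

For long sequences the product of many small probabilities is usually accumulated as a sum of logarithms; exact fractions are used here only to keep the example easy to verify by hand.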
We summarize the foregoing in the following theorem, the second major contribution of this work on the PSA:
Theorem 4.2: Given a sequence T = T[1...n], with symbols from an alphabet Σ, where Σ = {σ1, σ2, ..., σ|Σ|}, and the probabilistic suffix array (PSA) for T, we can decide whether a pattern P = P[1..m] is generated by the same variable length Markov chain that generated T in O(m log(n/|Σ|)) time.
Proof: We observe that whenever a mismatch occurs, the start position s of the sub-pattern will be increased by one. The number of mismatches is at most m. Similarly, the number of matching positions will be at most m. Thus, the loop in Lines 2-17 will run at most 2m times. Line 12 calls the function LeftmostMatchedPosition, which uses standard suffix array search algorithms [3, 84] that run in O(log n) time per call. The function SearchPSA in Line 13 searches the PSA data structure with the parameters determined in the previous step in O(log M) time, where M is the length of the PSA. Thus, the total running time will be in O(m log n). We improve this time to O(m log(n/|Σ|)) using an extra |Σ| space to record the starting position of each symbol in the suffix array. □
Algorithm 4.7: Prediction with VLMM via the PSA
VLMM-PREDICTION(Pattern,PSA)
1  s ← 1, Prob ← 1, L ← 1, R ← n, index ← 0, m ← ‖Pattern‖
2  for (i ← 1 to m) do
3    Search Pattern[s...i] in PSA.SA with parameter L,R
4    if mismatch do
5      index ← PSA.index.sLink
6      L ← PSA.index.Start, R ← PSA.index.End
7      s ← s+1
8      if i ≤ s−1 do
9        i ← i+1
10     end if
11   else
12     position ← LeftmostMatchedPosition(Pattern[s...i],PSA.SA,L,R)
13     index ← SearchPSA(PSA,position, i−s+1)
14     Prob ← Prob × PSA.index.Conditional Probability
15     L ← PSA.index.Start, R ← PSA.index.End
16   end if
17 end for
18 return Prob
4.5 Space Analysis
In this section, we analyze the space requirement for the PSA structure. We consider space required during its construction (work space), and for its storage and use.
4.5.1 Storage Space
The basic PSA structure has three types of attributes. The measurement attributes are dependent on the application, and may not be needed every time the PSA is used. We also observe that, though the TF is needed for computing the empirical probabilities required for the later stage of determining the conditional probabilities, we do not need to store the TFs directly.
They can be obtained easily using the <Start, End> pairs stored at each node. Thus, we focus on the other attributes. From Table 4.1, we see that the worst case space for the PSA structure is 6n integers plus n characters, or 25n bytes, assuming n ≤ 2^32 and |Σ| = 256. On average, we could save at least n integers by storing the <Start, End> pair as <Start, (End − Start)>, and by using the fact that the maximum value in the Length array will be the maximum LCP value for the sequence, which is known to be of length in O(log_|Σ| n) [58].
Further, with M = PSA length (number of PSA nodes or intervals), the space for the internal node attributes and suffix link attributes will be 4M integers. In this case, M is a sub-linear function of n. Thus, the ratio γ = M/n, where 0 ≤ γ ≤ 1, is an important measure of the node branching structure of the PSA, and hence the complexity of the original sequence. With larger γ, we need more practical space to store the PSA. At γ < 0.7, which is our observation for example sequences tested (see Table 3.3), this will result in an average space requirement of (5n + 12n·γ), or 13.4n bytes for the PSA.
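The two storage figures quoted above can be reproduced with a short calculation. The Python sketch below encodes one reading of the text, which is an assumption on our part: the worst case is six n-integer arrays plus the text, and the average case is the text plus the suffix array (5n bytes) plus three 4-byte integers per PSA node (12M bytes). The function name `psa_storage_bytes` is hypothetical.

```python
def psa_storage_bytes(n, m, int_bytes=4, char_bytes=1):
    """Return (worst_case, average_case) storage in bytes for text length n, M = m nodes."""
    worst_case = 6 * n * int_bytes + n * char_bytes
    # Text and suffix array, plus three integers per node (assumed reading of 5n + 12M).
    average_case = n * char_bytes + n * int_bytes + 3 * m * int_bytes
    return worst_case, average_case

# With gamma = M/n = 0.7, the average case is 13.4 bytes per input symbol.
print(psa_storage_bytes(n=1000, m=700))
```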
4.5.2 Construction Space
There are five steps to build the PSA and compute the VLMM probabilities. We list the work space in each step and analyze the worst case space over all the construction steps.
1. Building Interval Array and Computing TF, DF
The space requirement will depend on the following:
(a) Input: Text of size n, suffix array (n integers) and LCP Array (n integers).
(b) Work space (in integers): Stack (2n), DF (n), doclink (Z) and docsp (Z), where Z is the number of documents.
(c) Output: interval array with < Start,End > and Length, n integers each.
The total space will be (7n+2Z) integers plus n characters for the text.
Memory reuse: The interval <Start, End> and Stack can share the same space. Stack is the space which stores the yet-to-be-computed non-trivial lcp-delimited intervals. In the extreme case, the length of Stack is at most n, since there are at most n non-trivial
CHAPTER 4. THE PROBABILISTIC SUFFIX ARRAY 76
lcp-delimited intervals. So when a non-trivial lcp-delimited interval has been processed,it will be stored in the interval < Start,End >. The length of the interval < Start,End >
is at most n. Thus, the interval < Start,End > records the calculated non-trivial lcp-delimited intervals. However, the sum of length of the interval < Start,End > and Stackis at most n. Thus, the interval < Start,End > and Stack can share the same space. Themaximum space required will then be (5n+2Z) integers plus n characters.
2. Building the Interval Tree
The space requirement will depend on the following:
(a) Input: Text of size n, suffix array, interval array < Start,End > and Length.
(b) Work space and output (in integers): posInterval (n), parent (n), pN (n) and Stack(n).
The total space needed for this step is 8n integers plus n characters.
Memory reuse: The arrays Stack and pN can share the same n integer space, since pN is a stack that indicates the nodes that have been processed, while Stack is the space that indicates those not yet processed. These are mutually exclusive, with a combined length of n. The maximum space will be 7n integers plus n characters.
3. Calculating Suffix Links
The space requirement will depend on the following:
(a) Input: Text of size n; one array of n integers each for SA, Start, Length, posInterval, and parent.
(b) Work space and output (in integers): inverse suffix array W (n) and suffix link(n).
The total space for this step is 7n integers plus n characters.
Memory reuse: Suffix link and posInterval share the same n integer space. In the algorithm BUILDSUFFIXLINK, after line 7, the array posInterval is never used again, and the space can be reused for the suffix link. The maximum space is thus 6n integers plus n characters.
4. Sort PSA
The space requirement will depend on the following:
(a) Input: Text of size n; one array of n integers each for Start, Length, and SuffixLink.
(b) Work space and output: 8 counting arrays (n integers each), namely: Count, Order, wOrder, NewStart, NewEnd, NewLength, NewsLink, NewPattern.
Memory reuse: At most two additional arrays are needed at any given time. Thus, the maximum work space is 5n integers plus n characters.
5. Computing the Conditional Probabilities
This is an in-place algorithm, requiring no extra space.
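Step 1 above relies on a stack of pending lcp-delimited intervals. A minimal sketch of the standard stack-based enumeration of lcp-intervals from an LCP array is given below; it is assumed to correspond to what Step 1 describes, and it does not compute TF or DF. The function names are hypothetical, and the naive suffix array and LCP construction are only stand-ins suitable for small examples.

```python
def suffix_array(text):
    """Naive suffix array by sorting suffixes; suitable for small examples only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[i] = longest common prefix of suffixes sa[i-1] and sa[i]; lcp[0] = 0."""
    def common(a, b):
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        return k
    return [0] + [common(sa[i - 1], sa[i]) for i in range(1, len(sa))]


def lcp_intervals(lcp):
    """Enumerate (lcp_value, start, end) intervals in one pass using a stack."""
    n = len(lcp)
    out = []
    stack = [(0, 0)]                   # pending intervals: (lcp value, left boundary)
    for i in range(1, n + 1):
        cur = lcp[i] if i < n else -1  # sentinel flushes the stack at the end
        left = i - 1
        while stack and stack[-1][0] > cur:
            value, left = stack.pop()  # interval is complete: it ends at i - 1
            out.append((value, left, i - 1))
        if not stack or stack[-1][0] < cur:
            stack.append((cur, left))  # open a new, deeper interval
    return out


text = "banana"
sa = suffix_array(text)
print(lcp_intervals(lcp_array(text, sa)))
```

Each interval is pushed once and popped once, so the stack never holds more entries than there are intervals, which is the property the memory-reuse argument in Step 1 depends on.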
From the above analysis, the maximum space required during PSA construction will be 7n integers plus n characters, where we have assumed that n ≫ Z. We can use an argument similar to that made for the storage space to save n integers, and apply the γ ratio to get an average case construction space requirement of 5.15n integers, or 20.6n bytes.
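The maximum over the construction steps can be tabulated directly. The sketch below simply takes the per-step figures after memory reuse as stated in the five steps (Step 5 is in-place and adds nothing); the function name and dictionary keys are illustrative assumptions.

```python
def construction_peak_integers(n, z):
    """Return the step with the largest integer work space, and that space."""
    per_step = {
        "1. interval array": 5 * n + 2 * z,
        "2. interval tree": 7 * n,
        "3. suffix links": 6 * n,
        "4. sort PSA": 5 * n,
    }
    step = max(per_step, key=per_step.get)
    return step, per_step[step]


# With n much larger than Z, building the interval tree dominates.
print(construction_peak_integers(n=1_000_000, z=100))
```

The interval-array step would only dominate if 2Z exceeded 2n, which is excluded by the assumption that n is much larger than Z.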
The above PSA space requirement can be compared with the space requirement using the PST. The use of reverse suffix links [9] implies that the PST will require at least 37n bytes on average (assuming Ukkonen’s suffix tree construction algorithm), without counting other auxiliary structures needed for the PST construction.
4.6 Experiments
We performed experiments on protein sequences to test the proposed data structure. The experiments were performed using a DELL PC, with 4 × 2.67GHz CPU and 8 GB of memory, running the Ubuntu 10.10 Linux operating system. All programs were compiled using gcc.
4.6.1 Predicting Protein Families
Our major objective was to develop a time- and space-efficient alternative to the PST. However, to place our results in the correct context, we must first verify that the proposed PSA produces an equivalent performance in protein sequence modeling and prediction, when compared with the original PST. Thus, to be able to compare the PSA results with previous approaches using the PST [17, 71], we used the same protein sequence dataset that was used in [17] and in [71]. We downloaded the Pfam database [15, 39], release 1.0. The database contains 175 families originally derived from SWISSPROT 33. We use family members from the unaligned sequences to generate the PSA, one PSA for each family. To test the performance in modeling the protein sequences and in predicting the family for unknown sequences, we used leave-one-out cross validation. For each sequence in the SWISSPROT 33 database, we compute P_VLMM using the PSA structure for each family. We then assign the protein sequence to the family with the maximum probability. When the maximum probability is obtained using the model of the correct family (whose PSA is generated without the test sequence), we say we have a correct classification (true positive); otherwise, there is a classification error. This simple approach avoids the difficult problem of setting thresholds for correct classification.
Table 4.4 shows the classification performance using the PSA. For comparison, we have also included the results obtained using the PST on the same dataset, as reported in [17]. Table 4.5 shows the summary classification performance. As expected, both the PST and the PSA produce comparable results with respect to modeling and classification of protein families. The PST had an average true positive rate of 90.8%, while the PSA had 90.2%. We note the significant differences in the family sizes for the PST and PSA results. Although we used the same general Pfam dataset, there have been various additions to the Pfam database since the original publication of the PST results. On average, the size of the families in our dataset was 128.21, while the size used for the PST was 97.82. The total size of the current dataset used for the PSA was more than 1500 sequences larger than that of the PST (6539 versus 4891). As can be seen from Table 4.4, most of the cases where the PST performed significantly better than the PSA could be due to this difference in family sizes (see, for example, ank, C2, efhand).
One advantage of performing classification using Eqn. (4.3) is that, when there is an error in the classification, we can consider the family with the next highest predicted probability. For instance, we can consider the top-k classification rate, which shows the probability of finding the correct protein family within the first k families, as ordered based on their generated probabilities, using the test sequence. Fig. 4.3 shows the classification performance of the PSA on some sample families, using the top-k classification rate. We can observe how the classification performance rapidly approaches 100% after the first few k values.
[Figure 4.3 plot: x-axis Top k (0 to 10); y-axis True Positive Percentage (0 to 100); series: 7tm_1, adh_short, cytochrome_b_C, efhand.]
Figure 4.3. Top-k classification rate for sample protein families using the PSA.
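The top-k classification rate can be computed from per-family scores as in the sketch below. The function name, the family labels and the score values are made up for illustration; in the experiments the scores would be the P_VLMM values (or their logarithms) produced by each family's PSA.

```python
def top_k_rate(scores, true_families, k):
    """Fraction of test sequences whose true family is among the k best-scoring families.

    scores[i] maps a family name to the (log-)probability assigned to test sequence i.
    """
    hits = 0
    for per_family, truth in zip(scores, true_families):
        ranked = sorted(per_family, key=per_family.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_families)


# Toy log-probabilities for three test sequences (illustrative values only).
toy_scores = [
    {"famA": -10.0, "famB": -12.5, "famC": -30.0},
    {"famA": -8.0, "famB": -9.0, "famC": -7.5},
    {"famA": -20.0, "famB": -15.0, "famC": -18.0},
]
toy_truth = ["famA", "famB", "famC"]

for k in (1, 2, 3):
    print(k, top_k_rate(toy_scores, toy_truth, k))
```

With k = 1 this reduces to the maximum-probability rule used for the true positive rates in Table 4.4.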
4.6.2 Space Consideration
A major problem with suffix trees is their practical memory space requirement. Although they have the same theoretical linear space requirement as suffix arrays, in practice, suffix trees consume much more space [3, 46]. This was our primary motivation for developing the PSA. Table 4.6 shows the summary data on the protein families in Pfam used in our experiments. Table 4.7 compares the memory space required to construct the PST [17] and SPST [71] with that required for the PSA.
We have included results for PST-20, PST-FULL, and SPST [71]. PST-FULL corresponds to the complete PST with no pruning, i.e. with the full string depth for each leaf node. PST-20 corresponds to the order-20 PST, i.e. the PST with a maximum string depth of 20 symbols. This was the variant used in [17]. The SPST proposed in [71] also involved some pruning of the suffixes. First, we can observe the nature of the protein sequences (Table 4.6). The maximum branching factor (γ = M/N) observed was 0.75, while the minimum was 0.46. The average was 0.62. As a key performance measure, we used the memory consumption factor (MC Factor), defined as the ratio of the required memory to the total sequence length (N) of the family. We compared the MC Factor for PSA, PST, and SPST. The table shows that the PSA ratio was steady at about 33.16N bytes. The maximum memory required by the PSA for any of the families was 53.46N bytes. This can be compared with the (mean and maximum) memory needed for PST-20 (41.67N, 222.12N), PST-FULL (167.47N, 1216N) and SPST (67.17N, 111.53N). Fig. 4.4 shows more detailed information on the memory consumption needed to construct the data structures, using the Pfam protein families. Perhaps more significantly, while the PSA memory is relatively constant, independent of the sequence or family, we can observe the huge fluctuation in the memory needed for the PST and SPST, as captured by the range and standard deviation of the memory consumption factor.
[Figure 4.4 plot: x-axis File Size (K) (0 to 140); y-axis MC Factor (0 to 1200); series: PSA, PST-20, PST-FULL, SPST.]
Figure 4.4. Memory consumption factor (MC Factor) needed to construct the PSA and PST data structures for the first 51 protein families in Pfam.
4.6.3 Computational Time Requirement
Table 4.8 shows the summary of the time required for constructing the PSA and the PST data structures. Table 4.9 shows the corresponding summary of the time needed for prediction using the models. The tables show that for prediction, on average, the PSA is about 2.5 times faster than PST-20. The PSA was much faster at the construction stage. For instance, the PSA was about 3 times faster to build than PST-20, and about 250 times faster to build than PST-FULL. The speedup could be related to the fact that the PSA requires a much smaller construction space, and thus less time is spent on moving data in memory.
4.6.4 PSA in Phylogenetic Tree Construction
Ron et al. [109] and Bejerano and Yona [17] have enumerated various applications of the PST. As earlier indicated, the PSA is simply a more efficient alternative to the PST. Thus, the PSA can be used anywhere the PST is used. To study this versatility of the PSA/PST further, we considered a new application of the PSA: specifically, its use in the problem of phylogenetic tree construction, using mtRNA or mtDNA sequences. We used the mtDNA sequences from 20 species, namely, human (Homo sapiens, V00662), common chimpanzee (Pan troglodytes, D38116), pygmy chimpanzee (Pan paniscus, D38113), gorilla (Gorilla gorilla, D38114), orangutan (Pongo pygmaeus, D38115), gibbon (Hylobates lar, X99256), baboon (Papio hamadryas, Y18001), horse (Equus caballus, X79547), white rhinoceros (Ceratotherium simum, Y07726), harbor seal (Phoca vitulina, X63726), gray seal (Halichoerus grypus, X72004), cat (Felis catus, U20753), fin whale (Balaenoptera physalus, X61145), blue whale (Balaenoptera musculus, X72204), cow (Bos taurus, V00654), rat (Rattus norvegicus, X14848), mouse (Mus musculus, V00711), opossum (Didelphis virginiana, Z29573), wallaroo (Macropus robustus, Y10524) and platypus (Ornithorhynchus anatinus, X83427). This is the same dataset previously used in constructing phylogenetic trees by Otu et al. [99] and Li et al. [73]. This is a challenging dataset, and there has been some debate on the position of some of the species [24, 105].
To construct the phylogenetic tree, we use the PSA to compute a dissimilarity measure between every pair of species in the dataset. First we construct the PSA for the mtDNA sequence for each species. Then for a given species, we compute P_VLMM, the probability that the given species is generated by the PSA constructed from each of the other species. After getting the probabilities, we use the quantity λ(A, B) = −log P_VLMM(A, PSA_B) as the dissimilarity measure between the sequences from two species A and B, where P_VLMM(A, PSA_B) is the predicted probability that sequence A is generated by the model represented by the PSA of sequence B. We then use the measurements λ(A, B) for all pairs of species to construct the phylogenetic tree.
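The pairwise computation of λ(A, B) can be sketched as follows. Since the PSA itself is not reproduced here, a first-order Markov model with add-one smoothing is used as a stand-in for the VLMM; that substitution, the DNA alphabet constant, the function names and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet for the toy example


def train_order1(seq):
    """Stand-in model: first-order Markov chain with add-one smoothing."""
    pair_counts = Counter(zip(seq, seq[1:]))
    context_counts = Counter(seq[:-1])

    def prob(prev, cur):
        return (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + len(ALPHABET))

    return prob


def neg_log_prob(seq_a, model_b):
    """lambda(A, B) = -log P(A | model of B); the first symbol is scored uniformly."""
    total = math.log(len(ALPHABET))
    for prev, cur in zip(seq_a, seq_a[1:]):
        total -= math.log(model_b(prev, cur))
    return total


def dissimilarity_matrix(seqs):
    """Compute lambda(A, B) for every ordered pair of distinct species."""
    models = {name: train_order1(s) for name, s in seqs.items()}
    return {
        (a, b): neg_log_prob(seqs[a], models[b])
        for a in seqs
        for b in seqs
        if a != b
    }


toy = {"s1": "ACACACACAC", "s2": "ACACACACAT", "s3": "GGGGTTTTGG"}
lam = dissimilarity_matrix(toy)
print(lam[("s1", "s2")], lam[("s1", "s3")])
```

In this sketch λ(A, B) and λ(B, A) generally differ, so a tree-building method that needs a symmetric matrix would require an additional symmetrization step, which is not specified here.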
Fig. 4.5 shows the constructed phylogenetic tree using the PSA. The results are generally in agreement with earlier work on this dataset (see [73, 99] for example). The only major difference is in the placement of cow, which is supposed to be closer to blue whale and fin whale than to mouse and rat. This is a very encouraging result, especially given that the PSA approach is completely alignment free. We believe a more detailed study of the use of the PSA in phylogenetic tree construction (for instance, deriving other features or measures of similarity) could further improve the results.
[Figure 4.5 tree: leaf labels in order: horse, w rhino, cat, g seal, h seal, baboon, gorilla, human, com chim, pig chim, gibbon, orangutan, b whale, f whale, cow, mouse, rat, platypus, opossum, wallaroo.]
Figure 4.5. Phylogenetic tree for 20 species constructed using the predicted probabilities obtained using the PSA.
4.7 Summary
We have presented the probabilistic suffix array (PSA), a data structure for representing information in variable length Markov models. The PSA provides the same functionality as the probabilistic suffix tree (PST), but at a significantly reduced time and space requirement. Given a sequence of length N, construction and learning in the PSA is done in O(N) time and space, independent of the Markov order. Prediction using the PSA is performed in O(m log(N/|Σ|)) time, where m is the pattern length, and Σ is the symbol alphabet. The specific memory requirement for PSA construction is 33N bytes in the worst case, and 26N bytes on average, including space for the suffix array and the input sequence. This can be compared with the 41N bytes needed using the PST.
We have shown experiments in computational biology. The first experiment compares the PSA with the PST [17] and SPST [71] on the same data set. The PSA is more space-efficient than PST-FULL and SPST, and its space is close to that of PST-20, which only stores paths up to depth L = 20. The construction time of the PSA is significantly faster than that of the PST and SPST, and the prediction times of the three methods (PSA, PST and SPST) are similar. The other experiment is phylogenetic tree construction using the PSA. This experiment shows a very encouraging result, especially given that the PSA approach is completely alignment free.
Table 4.4. Performance of the PSA in modeling and prediction of protein families. Families correspond to the first 51 protein families with 12 or more members in the Pfam database, ordered alphabetically based on their abbreviated names in Pfam. For comparison, we have included the results obtained using the PST [17] on the same data set. (TP stands for true positive rate, given as a fraction, while MD stands for missed detection.) **The family apple was not in the dataset used in [17].

Family              Size    # MD      TP    Size    # MD      TP
                          by PSA  by PSA          by PST  by PST
7tm_1                530      26   0.951     515      36   0.930
7tm_2                 36       2   0.944      36       2   0.944
7tm_3                 12       1   0.917      12       2   0.833
AAA                   79       9   0.886      66       8   0.879
ABC_tran             330      46   0.861     269      44   0.836
actin                160      22   0.863     142       4   0.972
adh_short            186      48   0.742     180      20   0.889
adh_zinc             129      22   0.829     129       6   0.953
aldedh                69      11   0.841      69       9   0.870
alpha-amylase        114       0   1.000     114      14   0.877
aminotran             63      14   0.778      63       7   0.889
ank                  305      60   0.803      83      10   0.880
apple**               16       1   0.938
arf                   43       2   0.953      43       4   0.907
asp                   72       5   0.931      72      12   0.833
ATP-synt_A            79       1   0.987      79       6   0.924
ATP-synt_ab          183       1   0.995     180       6   0.967
ATP-synt_C            62       1   0.984      62       5   0.919
beta-lactamase        51       2   0.961      51       7   0.863
bZIP                  95      14   0.853      95      10   0.895
C2                   101      21   0.792      78       6   0.923
cadherin             168      12   0.929      31       4   0.871
cellulase             40       3   0.925      40       6   0.850
cNMP_binding          69       4   0.942      42       3   0.929
COesterase            62       3   0.952      61       5   0.918
connexin              40       3   0.925      40       1   0.975
copper-bind           61       1   0.984      61       3   0.951
COX1                  80       4   0.950      80      13   0.838
COX2                 114      10   0.912     109       2   0.982
cpn10                 58       1   0.983      57       4   0.930
cpn60                 84       1   0.988      84       5   0.940
crystall             103       6   0.942      53       1   0.981
cyclin                80      19   0.763      80       9   0.888
Cys-protease          95       1   0.989      91      11   0.879
cystatin              88      11   0.875      53       4   0.925
Cy_knot               61       6   0.902      61       4   0.934
cytochrome_b_C       133       4   0.970     130      27   0.792
cytochrome_b_N       170       4   0.976     170       3   0.982
cytochrome_c         175      10   0.943     175      11   0.937
DAG_PE-bind          108       5   0.954      68       7   0.897
DNA_methylase         57       2   0.965      48       8   0.833
DNA_pol               51      12   0.765      46       9   0.804
dsrm                  22      14   0.364      14       2   0.857
E1-E2_ATPase         117       2   0.983     102       7   0.931
efhand               739      96   0.870     320      25   0.922
EGF                  676      33   0.951     169      18   0.893
enolase               41       2   0.951      40       0   1.000
fer2                  88      15   0.830      88       5   0.943
fer4                 156      34   0.782     152      18   0.882
fer4_NifH             49       2   0.959      49       2   0.959
FGF                   39       2   0.949      39       1   0.974
Table 4.5. Summary performance in protein family classification using the PSA and PST

              Size      # MD        TP      Size      # MD        TP
                      by PSA    by PSA              by PST    by PST
Mean       128.216    12.373     0.902    97.820     8.720     0.908
Std        145.798    17.711     0.103    84.778     8.690     0.050
Min         12.000     0.000     0.364    12.000     0.000     0.792
Max        739.000    96.000     1.000   515.000    44.000     1.000
Total         6539       631                4891       436
Table 4.6. Summary data on the first 51 families in Pfam, as described in Table 4.4.

        Protein Family      Length    # of Internal    γ = M/N        Min        Max
                  Size         (N)        nodes (M)                   LCP        LCP
Mean           128.216   22173.451          252.353      0.616    190.608     14.659
Std            145.798   23359.451          136.862      0.079    139.531      9.198
Min             12.000    1360.000           68.000      0.460     26.000      3.987
Max            739.000  140744.000          753.000      0.750    719.000     49.912
Table 4.7. Construction memory needed for the PSA and PST. Results are based on the first 51 families in Pfam, as described in Table 4.4.

          PSA        MC    PST-20        MC  PST-FULL        MC      SPST        MC
                 Factor              Factor              Factor              Factor
Mean      616     33.16       425     41.67      1577    167.47      1068     67.17
Std       655      8.23       116     42.68       125    211.22       541     23.77
Min        71     16.74       263      4.33      1455     12.77        79     18.53
Max      4523     53.46      1023    222.12      2235      1216      2547    111.53
Table 4.8. Construction time comparison for PSA and PST. Results are based on the first 51 families in Pfam, as described in Table 4.4. Recorded time is time needed per family (in seconds). Speedup is computed as the ratio with respect to PSA time.

           PSA    PST-20   Speedup  PST-FULL   Speedup      SPST   Speedup
Mean     0.092     0.244     3.150    12.648   250.967     1.825     1.740
Std      0.106     0.231     1.297     9.984   238.806     2.701     2.617
Min      0.005     0.016     0.854     3.780    20.917     0.047     0.036
Max      0.545     1.376     6.895    56.540  1076.000    17.121    16.658
Table 4.9. Prediction time comparison between PSA and PST. Results are based on the first 51 families in Pfam, as described in Table 4.4. Recorded time (in seconds) is prediction time per family, i.e. total time needed to predict all members in the family against all the other families.

           PSA    PST-20   Speedup  PST-FULL   Speedup      SPST   Speedup
Mean     87.71     98.83      2.52    113.25      2.79    207.03       3.4
Std      93.89      66.4      3.28     74.47      3.46    130.93      2.39
Min       5.00     59.23      0.25     21.47      0.32     33.50      1.03
Max        576       528     20.13     579.3     18.86       600      17.6
Chapter 5
Circular Pattern Matching
5.1 Introduction
The CPM problem was first introduced in 1980 [19]. Since then, variations of the circular pattern matching problem have been studied. The first variant [46, 115] is the exact circular pattern matching (ECPM) problem. This problem is to find all occurrences of a circular pattern P in a text T without any error. The second variant [44, 81] is the approximate circular pattern matching problem. This problem allows some error between the circular pattern P and the text T. According to the definitions in Chapter 2 on related work, most approaches have focused on the existential query for the ACPM1 problem.
The ACPM2 problem is as follows: given a text T, a circular pattern P and a maximum error k, return all positions where the circular string [P] matches the text T with at most k errors. [P] is said to be a k-approximate match with text T at position j ∈ [1...n−m−k+1] if ED_c(P, T[j...j+m]) ≤ k, where 0 ≤ t ≤ m−1, −k ≤ n−m. This is clearly more difficult than the ECPM problem or the ACPM1 problem for existential and counting queries.
Main Results. Our main goal is to solve the ECPM and ACPM2 problems for existential queries which were introduced under related work. We then define and solve the ECPD and ACPD problems to find “interesting” circular patterns, as defined using specified constraints.
In this chapter, we present a new algorithm to solve the ECPM problem. This algorithm runs in linear time and space. To date this is the best ECPM algorithm with respect to time and space complexity. We also present four algorithms to solve the ACPM2 problem. Three of the algorithms report complete results and one is a greedy (suboptimal) algorithm reporting an incomplete result. We compare our algorithms with other algorithms in the literature, such as those reported by Maes [81], Gregor [44], and Uliel [123], which were introduced in related work. On average, our ACPM2 algorithm provides the best result for the ACPM2 problem with respect to time and space complexity.
The following three theorems represent our main contributions on the CPM problem.
Theorem 5.1: Given a text T = T[1..n] and a circular pattern P = P[1..m], with symbols from an alphabet Σ, the ECPM algorithm can solve the ECPM problem in O(m log |Σ|) worst case time after constructing the suffix tree and suffix links in O(n) time and space complexity.
Theorem 5.2: Given a text T = T[1...n] and a circular pattern P = P[1...m], with symbols from an alphabet Σ, Algorithm ACPM2 solves the ACPM2 problem in O(k m^2 n) time, and O(n) space.
Theorem 5.3: Given a sequence database SeqDB, with Z sequences and N total symbols (|Σ| could be O(N)), Algorithm ACPM2 solves the all-against-all ACPM problem in O(k m_a N) time on average, and O(k m_m N^2) time in the worst case, using O(N) worst case space, where m_a = N/Z and m_m is the length of the longest sequence in SeqDB.
Organization. In the next section, we present a linear-time linear-space algorithm for the ECPM problem. Algorithms for the ACPM2 problem are presented and analyzed in Section 3. In Section 4, we show experiments on analyzing circular permutations in multidomain proteins using our algorithms. Based on the results, we perform protein function prediction for multidomain proteins. In Section 5, we summarize our work on circular pattern matching problems.
5.2 Exact Circular Pattern Matching Problem
In this section, we present an algorithm to solve the ECPM problem in linear time and linear space. Our method is index-based and can be built on the space-efficient virtual suffix tree proposed earlier in Chapter 3. The key to our method is the suffix link provided by the suffix tree and the VST. To our knowledge, this is the first time that the suffix link has been exploited to solve the circular pattern matching problem.
Notation. Before describing the algorithm, we assume there is a sequence database SeqDB with Z sequences. The total number of symbols in SeqDB is N. Let SeqDB[i] be the i-th sequence in SeqDB, where 0 < i ≤ Z. Let m_i be the length of SeqDB[i]. The average number of symbols per sequence in SeqDB is m_a = N/Z. Let k be the allowed error in a match. In this database SeqDB, the alphabet size is |Σ|.
5.2.1 Linear Time ECPM Algorithm
The index-based approach was first introduced by Iliopoulos and Rahman [51]. However, their methods were quite complex and difficult to implement. The best time complexity of Iliopoulos and Rahman’s algorithms is O(N log N). Our algorithm is relatively simple, with lower time and space complexity. First we give an example of ECPM and use this to explain our algorithm. Then we present a formal algorithm for the ECPM problem with input pattern P and ST, the suffix tree constructed from T. We build on this algorithm to solve the all-against-all version of the ECPM problem, given a database of sequences.
Examples. In Chapter 3, we showed the suffix tree for the string T = missississippi$. We utilize this example again here. Figure 5.1 shows the suffix tree for T with a few suffix links. Let the circular pattern be P = iss, so [P] = {iss, ssi, sis}, where f^0(P) = iss, f^1(P) = ssi, f^2(P) = sis. f^(i+1)(P) is obtained by removing the first symbol from f^i(P) and then appending this symbol at the end, where 0 ≤ i < m−1. So in our algorithm, when we match f^i(P), we use the suffix link to find the first (m−1) symbols of f^(i+1)(P). This operation is done in constant time. Then we only need to compare the last symbol of f^(i+1)(P) to the corresponding position in the suffix tree and, if the symbols match, we report a match for f^(i+1)(P).
In our algorithm, we use the suffix link to match the circular pattern in an incremental manner. For example, we search f^0(P) = iss first. In this case, we find iss is in the edge between node N1 and node N4, thus there is a match. We then match f^1(P) = ssi by following the suffix link from node N4 to node N6. Thus we get the next matching edge in internal node N6 by using the “skip/count” method [46]. This operation takes constant time. To look for the matches to f^2(P) = sis, we use the suffix link from node N6 to node N5. The prefix si has matched up to node N5, thus we only check the outgoing edges from node N5 to its children, and select the one whose first symbol matches symbol s. We find the edge between node N5 and N8 is matched by symbol s. So all of the circular matches of [P] have been found, and their positions start from the leaf nodes of N4, N6 and N5 respectively. From position 2 to position 9 in T, there are eight substrings that match the circular pattern P = iss, one starting at each position.
We give another, more complex example, searching for f^0(P) = ism. First, we find the prefix is in the edge between node N1 and node N4, but there is a mismatch at the last symbol m. To search for f^1(P) = smi, we follow the suffix link from node N4 to node N6. The previous iteration only matched two symbols, so this iteration starts from the second symbol of f^1(P) = smi, which is m. However, the length of the path from the root to N6 is 3, which is larger than 1, so this search starts from node N3, which is the parent node of N6. Then we check the second symbol, but it is still a mismatch. Thus we know that f^1(P) = smi does not occur in T. Now to search for f^2(P) = mis, we again follow the suffix link from node N3 to the root, and continue the match from the root. Thus, we find a match of f^2(P) = mis on the path to leaf node 6. Therefore, the system will report one occurrence in leaf node 6 of the suffix tree, which corresponds to position 1 in T.
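The two walk-throughs can be cross-checked with a brute-force search that needs no suffix tree. The sketch below is only a reference check, not the linear-time algorithm; the function names are illustrative, and positions are reported 1-based as in the text.

```python
def rotations(pattern):
    """All circular shifts f^0(P), f^1(P), ..., f^(m-1)(P)."""
    return [pattern[i:] + pattern[:i] for i in range(len(pattern))]


def circular_occurrences(text, pattern):
    """Return (1-based position, matched rotation) for every exact circular match."""
    m = len(pattern)
    rots = set(rotations(pattern))
    return [
        (j + 1, text[j:j + m])
        for j in range(len(text) - m + 1)
        if text[j:j + m] in rots
    ]


T = "missississippi$"
print(rotations("iss"))
print(circular_occurrences(T, "iss"))
print(circular_occurrences(T, "ism"))
```

For P = iss the matches start at positions 2 through 9, and for P = ism the only match is the rotation mis at position 1, in agreement with the suffix-tree traversal described above.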
ECPM Algorithm Description
Algorithm ECPM (Algorithm 5.1) shows the pseudo code for our exact circular pattern matching algorithm. In this algorithm, the input ST is the suffix tree for the text T, where T denotes a given text (string, or sequence) to be searched. For a given pattern P of length m, the algorithm derives a new pattern, called PP, by repeating P and removing the last character of the second copy. That is, PP = P[1...m]◦P[1...m−1]. Therefore, the new pattern PP has a length of 2m−1.
In the suffix tree, ST, the algorithm searches PP starting from the root of ST, from the leftmost branch to the rightmost. The iteration variable i indicates the starting position in PP.
The variable len indicates the position in the node label. When len is equal to 1, a new child node is created for the current node. Line 4 in the algorithm represents the process of finding the right child for the current node. This operation requires O(log |Σ|) time for comparison. If len is larger than 1, the matching operation takes place in the current node.
The variable top is the pointer which points to the current circular pattern. If the length of the matched pattern PP[top...i] is m, it means that a circular pattern occurs. Since the number of possible circular patterns is m, pointer top will never be larger than m. If pointer top is larger than i, it indicates that the symbol pattern[top] never occurred in the text. Thus, this set of circular patterns cannot be in the text. The pointer top is increased by one in the following two cases.
The first case is when a mismatch occurs in the current node. In this situation, the current node is replaced by the node which is pointed to by its suffix link. At the same time, the string depth for the current node decreases by one, and pointer top increases by one. After this, the next iteration of the string matching process starts. The other case is when the path length from the root to the current node is m. This indicates that one of the circular patterns occurs inside the text. Before matching the next position of PP, pointer top increases by one to keep the length of the pattern equal to m.
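The interplay of the pointers top and i over PP can be illustrated with a small emulation. In the sketch below, Python's substring test stands in for walking down the suffix tree, and incrementing top stands in for following a suffix link, so the code reproduces the control flow only and not the linear running time. The function name is hypothetical and indices are 0-based.

```python
def ecpm_rotations_found(text, pattern):
    """Return the rotations of `pattern` that occur in `text`, in discovery order."""
    m = len(pattern)
    pp = pattern + pattern[:-1]          # PP = P[1..m] followed by P[1..m-1]
    found = []
    top = 0                              # start of the current window in PP
    i = 0                                # end of the current window in PP
    while i < len(pp) and top < m:
        window = pp[top:i + 1]
        if window in text:               # stand-in for matching along the suffix tree
            if len(window) == m:
                found.append(window)     # a full circular shift occurs in the text
                top += 1                 # case 2: keep the window length at most m
            i += 1
        else:
            top += 1                     # case 1: mismatch, drop the first symbol
    return found


T = "missississippi$"
print(ecpm_rotations_found(T, "iss"))
print(ecpm_rotations_found(T, "ism"))
```

Each iteration advances either top or i, so the loop body runs at most 3m−1 times, which mirrors the bound used in the analysis of Algorithm 5.1.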
Based on the ECPM algorithm, we develop an algorithm to solve the all-against-all problem. That is, we compute ECPM(SeqDB[i], SeqDB[j]), ∀ i, j, i ≠ j. Algorithm ALL-VS-ALL-ECPM (Algorithm 5.2) takes each sequence in the database in turn as a pattern and searches for circular relations using the ECPM algorithm above. It constructs the suffix tree ST for the entire database first. Then each sequence is used as a pattern to search in ST.
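For small databases, the expected output of the all-against-all computation can be produced by a quadratic baseline, which is useful for testing an index-based implementation. The sketch below is such a baseline under the assumption that a pair (i, j) is reported when some circular shift of SeqDB[i] occurs in SeqDB[j]; it does not build a suffix tree, and the function name and toy database are illustrative.

```python
def all_vs_all_ecpm_baseline(seqdb):
    """Return pairs (i, j), i != j, where a rotation of seqdb[i] occurs in seqdb[j]."""
    hits = []
    for i, pattern in enumerate(seqdb):
        rots = {pattern[s:] + pattern[:s] for s in range(len(pattern))}
        for j, text in enumerate(seqdb):
            if i != j and any(r in text for r in rots):
                hits.append((i, j))
    return hits


toy_db = ["abc", "xbcay", "cab", "zzz"]
print(all_vs_all_ecpm_baseline(toy_db))
```

Indices here are 0-based, whereas the text numbers the sequences from 1.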
Algorithm Analysis
In the ECPM algorithm (Algorithm 5.1), when the same position in PP is compared again (Line 18), pointer top increases by one. top remains less than or equal to m. Thus, Line 18 runs at most m times.
The “for” loop from Line 3 to Line 23 runs at most 2m−1 times. Inside this loop, the cost of Line 4 is at most O(log |Σ|). The other lines inside this “for” loop have a constant running time. Therefore, the time cost of this algorithm is at most log |Σ| × (3m−1) = O(m log |Σ|). Given the relation between the ST and VST as described in Chapter 3, the proposed algorithm can easily be modified to use the VST.
The space cost of the implementation using the suffix tree is 32n + 2m−1, where m ≤ n. The space cost if we implement it using the VST is 13.8n + 2m−1.
We summarize the above in Theorem 5.1.
Theorem 5.1: Given a text T = T[1..n] and a circular pattern P = P[1..m], with symbols from an alphabet Σ, the ECPM algorithm can solve the ECPM problem in O(m log |Σ|) worst case time after constructing the suffix tree and suffix links in O(n) time and space complexity.
Algorithm ALL-VS-ALL-ECPM (Algorithm 5.2) builds the ST in O(N) time. The time cost of line 4 is O(m log |Σ|), so the total cost of the loop from line 2 to line 5 is O(N). The worst case time complexity of this algorithm is O(N log |Σ|). The space complexity of this algorithm is O(N).
5.2.2 Comparison of ECPM algorithms
Table 5.1 shows the comparison of our ECPM algorithm with Iliopoulos and Rahman’s algorithms [51], CPI-I and CPI-II. The table shows that the ECPM algorithm is the best algorithm with respect to time complexity, even when |Σ| → O(N). In terms of space, the ECPM algorithm has the same space complexity as the CPI-II algorithm implemented using the suffix array. The CPI-II algorithm implemented using the compressed suffix array has the best space complexity among these algorithms, but its time complexity is not as good. We note that, for practical space complexity, CPI-I needs two suffix trees and various auxiliary arrays, and hence will require a significantly larger practical space.
5.3 Approximate Circular Pattern Matching Problem
In this section, we present our algorithms for the ACPM problem. We start with a simple greedy algorithm and then consider a suffix-array based q-gram algorithm for the ACPM problem. First, we introduce a basic LIS algorithm, APM-VIA-LIS (Algorithm 5.3), to find an approximate match of a pattern P in text T. The algorithm does not handle circular pattern matching. Next, we propose our algorithms for the ACPM problem and analyze their complexity. The LIS method for pattern matching will be used in these algorithms. When we use this algorithm to solve the ACPM problem, we have to use all circular shifts f^t(P) to match the text T.
The LIS method uses the LIS algorithm [46, 50] to compute the longest common subsequence (LCS) [31, 46] between two sequences. A verification step then checks whether the edit distance between these two sequences is less than k. When we calculate the LIS and the LCS, each matched symbol occurs in the LCS, and we obtain the positions of the matched symbols in both sequences. We can use these positions to count the edit operations between two consecutive matched symbols. Thus the algorithm reports the edit distance between the two sequences. The time complexity of this algorithm is $O(\frac{mn}{|\Sigma|} \log m)$. When $|\Sigma|$ is close to $O(m)$, as is the case for multidomain proteins, the time complexity becomes $O(n \log m)$.
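The reduction from LCS to LIS can be illustrated with a short Python sketch. It is not Algorithm 5.3 itself: the function name `lcs_length_via_lis` is hypothetical, and the sketch only returns the LCS length, without the edit-distance verification step. For each symbol of the pattern it lists the matching text positions in decreasing order and keeps a strictly increasing subsequence of positions with binary search. The number of matching position pairs is about $\frac{mn}{|\Sigma|}$ for a uniformly distributed alphabet, which is where that factor in the complexity comes from.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length_via_lis(p, t):
    """Length of the longest common subsequence of p and t, computed as an LIS."""
    # Record where each symbol occurs in the text.
    occurrences = defaultdict(list)
    for j, symbol in enumerate(t):
        occurrences[symbol].append(j)

    # tails[i] = smallest possible last text position of a strictly
    # increasing sequence of matched positions of length i + 1.
    tails = []
    for symbol in p:
        # Visit positions in decreasing order so that one pattern symbol
        # contributes at most one element to any increasing sequence.
        for j in reversed(occurrences.get(symbol, ())):
            i = bisect_left(tails, j)
            if i == len(tails):
                tails.append(j)
            else:
                tails[i] = j
    return len(tails)
```

For example, `lcs_length_via_lis("ABCBDAB", "BDCABA")` evaluates the textbook pair whose LCS is `BCBA`.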
5.3.1 Greedy ACPM Algorithm
Algorithm ACPM-GREEDY (Algorithm 5.4) compares any two sequences, with one as the text and the other as the circular pattern, in two main steps. The first step is the generation of the LCS. The second step verifies whether the LCS generated in step 1 represents a part of a valid subsequence. These two steps were presented in Algorithm APM-VIA-LIS (Algorithm 5.3).
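Table 5.1. Comparison of ECPM algorithms

| | ECPM algorithm | CPI-I [51] | CPI-II [51] | CPI-II (Compressed Suffix Array) [51] |
|---|---|---|---|---|
| Time Complexity | $O(N \log \lvert\Sigma\rvert)$ | $O(N \log^{1+\varepsilon} N + N \log\log N)$ | $O(N \log N)$ | $O(N \log N \log N)$ |
| Space Complexity | $O(N)$ bytes | $O(N \log^{1+\varepsilon} N)$ bytes | $O(N)$ bytes | $O(N \log N)$ bits |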
First, Algorithm ACPM-GREEDY (Algorithm 5.4) chooses two sequences, one as the text and the other as the circular pattern. Given the text T and the circular pattern P, the algorithm works in two steps. The first step creates a new pattern PP by concatenating P with itself. The second step calculates the LCS between PP and T and returns the LCS string lcs. This procedure is performed in line 5. This step also verifies the approximate pattern matching with parameter k.

This method is greedy (suboptimal): it finds only one occurrence of the pattern, so it may not detect all the existing circular patterns in the text. If there is more than one LCS in T, this method may miss some matches.
Time Complexity Analysis
For the time complexity analysis, we need to consider three cases.
1. For the case of using one sequence as the pattern P and the other sequence as the text T, the time complexity of computing the LCS (line 5) is $O(\frac{mn}{|\Sigma|} \log m)$. When $|\Sigma|$ is close to $O(m)$, as in multidomain proteins, the time complexity is $O(n \log m)$.

2. For the case of searching for one sequence against a group of sequences (loop from line 2 to line 6), the time complexity is $O(\sum_{i=1}^{Z} n_i \log m) = O(N \log m)$, where $N = \sum_{i=1}^{Z} n_i$ is the total length of all sequences used, Z is the number of sequences, and $n_i$ is the length of the i-th sequence in SeqDB.

3. For the case of searching for a CP among a group of sequences (loop from line 1 to line 7), the time complexity is $O(ZN \log m)$, where m is the length of the longest sequence. The final time complexity is $O(N^2 \log m)$, since $Z = O(N)$. In our experiment with multidomain proteins, $N \approx 6Z$.
5.3.2 ACPM with LIS
Here we present a second algorithm (Algorithm 5.5) for the ACPM problem. This method also uses the LIS algorithm [46, 50] to calculate the LCS [31, 46]. However, the way the pattern P and the text T are constructed is changed. More importantly, unlike the greedy algorithm described earlier, this algorithm can detect all the circular patterns. The pseudocode is listed in Algorithm ACPM-LIS. One sequence is used to construct the circular pattern P. Another sequence is used as the text T. All possible circular shifts of the pattern are enumerated. The substring subT is extracted from the sequence T by a sliding window of size m + k. Finally, each enumerated circular shift of the pattern is searched against the sliding window separately. Assuming the sequence to be searched using a sliding window has length n, there are $\max\{1, n-(m+k)+1\} = O(n)$ windows to be constructed.
During the search, if a common subsequence of length m − k is found, then a circular pattern occurs in the text T. This method reveals all the circular patterns in each sequence. Thus it finds the optimal solution with respect to completeness of the results.
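The enumeration described above can be sketched in Python as follows. This is an illustrative sketch, not Algorithm 5.5: the function names are hypothetical, a plain dynamic-programming LCS replaces the LIS-based computation for brevity, and a window is reported as soon as its LCS with a shift reaches length m − k, which is the test stated above.

```python
def lcs_length(a, b):
    """Classic dynamic-programming LCS length (stand-in for the LIS method)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def acpm_sliding_windows(p, t, k):
    """Return (shift, window_start) pairs whose LCS has length >= m - k."""
    m, n = len(p), len(t)
    width = m + k                                  # sliding window size
    window_starts = range(max(1, n - width + 1))   # O(n) windows
    hits = []
    for shift in range(m):                         # all m circular shifts
        shifted = p[shift:] + p[:shift]
        for start in window_starts:
            if lcs_length(shifted, t[start:start + width]) >= m - k:
                hits.append((shift, start))
    return hits
```

With k = 0 the test reduces to an exact match of a circular shift, so `acpm_sliding_windows("abc", "xxbcaxx", 0)` reports only the shift `bca` at window start 2 (0-based).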
Time Complexity Analysis
For one sliding window, the time complexity of finding one circular pattern is $O(\frac{m(m+k)}{|\Sigma|} \log m)$ (line 7 to line 9). There are O(n) sliding windows and O(m) circular patterns for one query pattern and one text (line 5 to line 10). Therefore, the time complexity of detecting a circular pattern between one pattern and one text is $O(\frac{m(m+k)}{|\Sigma|} \log m \times mn)$. Examining these terms, we find that k is at most $O(m)$ and $|\Sigma| = O(m)$. Thus, the time complexity simplifies to $O(m \log m \times mn) = O(m^2 n \log m)$.
For each pattern, the algorithm compares with the other sequences (line 2 to line 11). The time complexity of comparing each pattern with all other sequences is $O(m^2 \log m \times \sum_{i=1}^{Z} n_i) = O(m^2 N \log m)$, where N is the total length of all sequences and $n_i$ is the length of the i-th sequence in SeqDB.
After considering all patterns (line 1 to line 12), the time complexity becomes $O(\sum_{i=1}^{Z} m_i^2 N \log m)$, where m is the length of the longest sequence. In fact, $\sum_{i=1}^{Z} m_i^2 \le (\sum_{i=1}^{Z} m_i)^2 = N^2$. The final time complexity is therefore $O(N^3 \log m)$. This is the worst case complexity.
5.3.3 ACPM with q-grams and Suffix Array
The q-gram approach [3] is a two-phase method to reveal all approximate patterns. The first phase is the Hypothesis Phase, which determines all potential matching positions using only q-gram substrings of P and T. In the second phase, the Verification Phase, the algorithm verifies each potential matching position to report the correct matches. First we introduce an ACPM algorithm with q-grams, then we present a hybrid algorithm that combines ECPM and ACPM with q-grams. The latter algorithm is more time efficient in practice, but its theoretical time complexity is the same as that of the ACPM algorithm with q-grams.
Figure 5.2 shows the number of hypotheses for different values of q in the ProDom database of multidomain protein sequences [32]. Here we used $N = 10^6$. We notice that as q increases, the number of hypotheses decreases quickly. So when q is not very small, e.g. $q \ge 3$, the number of hypotheses will typically reduce to $O(N)$.
Algorithm Description
Algorithm ACPM-QGRAM (Algorithm 5.6) shows the process. Lines 1-7 are the preprocessing stage. This stage constructs a long concatenated sequence, seq, using all the sequences so far encountered in SeqDB. It also builds an auxiliary array pos, which maintains the relationship between positions in seq and in SeqDB. Line 8 constructs the suffix array for the concatenated sequence.

Lines 9-24 form a loop that generates all of the hypotheses for the q-gram method using the LCP array. Lines 11-13 determine candidate matching positions that have the same q-gram prefix. Line 14 considers each pair of candidate positions obtained with the current q-gram for verification.
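The hypothesis stage groups positions that share the same q-gram and then pairs them up. The Python sketch below shows that grouping under a simplifying assumption: a hash map keyed by the q-gram stands in for the suffix array and LCP array of Algorithm 5.6, and only positions from different sequences are paired. The function name `qgram_hypotheses` and the `(sequence index, offset)` representation are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def qgram_hypotheses(seqs, q):
    """Pair up positions in different sequences that start with the same q-gram."""
    # Group every q-gram occurrence by its content.
    groups = defaultdict(list)
    for seq_index, seq in enumerate(seqs):
        for offset in range(len(seq) - q + 1):
            groups[tuple(seq[offset:offset + q])].append((seq_index, offset))

    # Each pair inside a group is one hypothesis to be verified later.
    pairs = []
    for positions in groups.values():
        for a, b in combinations(positions, 2):
            if a[0] != b[0]:  # assumption: skip pairs within one sequence
                pairs.append((a, b))
    return pairs
```

A group with $n_i$ members yields $O(n_i^2)$ pairs, which matches the pair count used in the time complexity analysis of this algorithm.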
Lines 15-22 are the verification algorithm. We use the LIS algorithm to verify the approximate patterns. Constructing the circular pattern is the same as in the previous algorithm. We enumerate the m circular patterns from a sequence one by one. We construct subT from the second sequence T as follows. Assume the q-gram occurs at position y, and let subT be a substring of T of length (m + k) that includes $T[y \ldots y+q-1]$. So the text windows will be $T[y-m-k+q \ldots y+q-1]$, $T[y-m-k+q+1 \ldots y+q]$, ..., $T[y \ldots y+m+k-1]$. There are (m + k − q) such substrings.
Time Complexity Analysis
The time complexity of LIS to verify a pattern against the substrings of the text that include one matched q-gram is $O(m \log m \times (m+k-q))$. Since $k \le O(m)$ and $q \le O(m)$, the time complexity is $O(m^2 \log m)$. Each pair in the same group requires $O(m)$ circular pattern operations, thus the time complexity for verifying each pair is $O(m^2 \log m \times m) = O(m^3 \log m)$. There are r groups, group i has $n_i$ elements, and there are $\sum_{i=1}^{r} n_i^2$ pairs. The total complexity is $O(m^3 \log m \times \sum_{i=1}^{r} n_i^2)$. The worst case occurs when r = 1, with time complexity $O(N^2 m^3 \log m)$. For the average case, m is the average length of the sequences. Then the time complexity will be $O(N m_a^3 \log m_a)$, where $m_a = \frac{N}{Z}$.
Hybrid Algorithm
We can combine the ECPM algorithm (ECPM) and the ACPM algorithm with q-grams (ACPM-QGRAM) for a possible improvement in practical time. The ECPM algorithm reports the exact circular pattern matches, which should not be computed again when we search for approximate matches. First, the hybrid algorithm uses the ECPM algorithm to look for exact circular pattern matches and stores them. Next, the hybrid algorithm uses the q-gram method to generate hypotheses. In the verification phase, the algorithm checks whether a hypothesis has already occurred within the ECPM results. If it has, the verification stage is skipped.

Thus this approach reduces the practical running time, given the reduced number of verifications. Checking each hypothesis takes $O(\log N)$ time. The time complexity is $O(N^2(m^3 \log m + \log N))$. However, this algorithm needs $O(N^2)$ space to maintain the pair-wise results from the ECPM algorithm.
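The skip logic of the hybrid algorithm can be sketched in a few lines of Python. The sketch assumes hypotheses are reduced to hypothetical `(pattern_id, text_id)` pairs and that `verify` is a caller-supplied function. It uses a hash set for the membership test, which gives expected constant-time lookups instead of the $O(\log N)$ check stated above.

```python
def hybrid_verify(hypotheses, exact_pairs, verify):
    """Verify only those hypotheses not already confirmed by the exact stage.

    Returns the set of confirmed pairs and the number of verify() calls made.
    """
    confirmed = set(exact_pairs)   # results stored from the ECPM stage
    calls = 0
    for pair in hypotheses:
        if pair in confirmed:
            continue               # already known: skip the expensive check
        calls += 1
        if verify(*pair):
            confirmed.add(pair)
    return confirmed, calls
```

The more relations the exact stage already covers, the fewer verifications remain, which is the source of the practical speed-up.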
5.3.4 Improved Algorithm: ACPM with Bidirectional Edit Distance
In this subsection, we propose an algorithm to solve the all-against-all ACPM2 problem. The algorithm uses a two-stage paradigm of hypothesis generation followed by hypothesis verification. After generating the hypotheses using the q-gram filtration method, the algorithm verifies each hypothesis in $O(km)$ time, where k is the maximum error allowed and m is the length of the pattern. This algorithm follows the same general paradigm as the previous algorithm (Section 5.3.3); however, there are significant differences in both the hypothesis generation and the verification phases. In practice, this algorithm uses the suffix tree rather than the suffix array.
Filtration via q-grams

The q-gram approach [53] is a filtration method based on the fact that for any two strings that are approximate matches, there must be some exact matching sub-region between them. The problem is how to determine such sub-regions and their length(s). Lemma 5.1 states this fact and shows how to choose the value q, the minimum length of the matching regions.
Lemma 5.1 [14]: Given a text T, a pattern P of length m, and an integer k ($0 \le k < m$), for a k-approximate match of P to occur in T, there must exist at least one q-length block of symbols in P that forms an exact match to some q-length substring in T, where $q = \lfloor \frac{m}{k+1} \rfloor$.
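The intuition behind Lemma 5.1 is a pigeonhole argument: k edit operations can damage at most k of k + 1 disjoint blocks of P, so at least one block of length $\lfloor \frac{m}{k+1} \rfloor$ survives intact. The Python sketch below computes q and applies the resulting filter to strings. The function names are hypothetical, and the substring test `in` stands in for a suffix structure lookup.

```python
def qgram_length(m, k):
    """q = floor(m / (k + 1)) for a pattern of length m and error bound k."""
    if not 0 <= k < m:
        raise ValueError("Lemma 5.1 requires 0 <= k < m")
    return m // (k + 1)


def passes_qgram_filter(p, t, k):
    """True if some q-gram of p occurs exactly in t (necessary for a k-match)."""
    q = qgram_length(len(p), k)
    return any(p[i:i + q] in t for i in range(len(p) - q + 1))
```

A text that fails this filter cannot contain a k-approximate match of the pattern, so no verification is needed for it; a text that passes still has to be verified.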
Approximate pattern matching based on q-grams uses two phases. The first phase is the hypothesis phase, which identifies all potential matches using q-gram filtering operations. Based on partial exact matching, the algorithm can find $O(N^2)$ potential matches. The second phase is the verification phase. Here the hypothesized potential matches from the first phase are verified to determine whether they are true k-approximate matches. Our ACPM2 algorithm is based on q-gram filtration. First we generate hypotheses for potential approximate circular pattern matches by q-gram filter operations. Then we verify each hypothesis to find the true circular patterns.
ACPM Hypothesis Generation
Algorithm 5.7 presents our hypothesis generation phase using suffix trees. First, it builds the generalized suffix tree ST for the sequence database. Then, the algorithm treats each sequence as a pattern. For each pattern, q is calculated as $q = \lfloor \frac{m}{k+1} \rfloor$, where m is the length of the current pattern. There exist $O(m)$ overlapping q-grams in a given m-length pattern. We search each overlapping q-gram of the pattern from left to right in the suffix tree ST. Each exact match is a hypothesis with the current q-gram. The use of the suffix tree implies that for each given distinct q-gram in P, all the occurrences in T will be found as the leaf nodes under the same parent node in the suffix tree. Finding this parent node requires only one q-gram match in $O(q \log |\Sigma|)$ time. Similar to the ECPM algorithm, after searching the first q-gram, say $q_i = P[i \ldots i+q-1]$, in ST, the next q-gram $q_{i+1} = P[i+1 \ldots i+q]$ can be matched incrementally from $q_i$ by using suffix links in constant time (using the function Search-in-ST). Thus, counting all the matching q-grams or locating all the parent nodes for each unique matching q-gram can be done in $O(m \log |\Sigma|)$ time, independent of n or N, where n is the length of the current sequence (the text), and N is the total length of all the sequences in the database (the number of leaf nodes in the generalized suffix tree).
Given pattern $P = P[1 \ldots m]$ and text $T = T[1 \ldots n]$, there are $O(m)$ q-grams in P and $O(n)$ q-grams in T. So each q-gram of P can produce at most $(n-q+1)$ exact matches in T. The number of hypotheses is $O(m(n-q+1)) = O(mn)$. We notice the added difficulty introduced by the ACPM2 problem. Unlike the ACPM1 problem, where only one occurrence of P in T is required, here we have to verify each of the potential $O(mn)$ hypotheses in order to identify all the occurrences. In the sequence database seqDB, the total length of all sequences is $N = \sum_{i=1}^{Z} m_i$. Each q-gram can potentially produce $O(N)$ exact matches. Thus, the number of hypotheses will be in $O(N^2)$. In the worst case, there exist $O(|\Sigma|^q)$ unique q-grams. The number of hypotheses will be $O(|\Sigma|^q \times (\frac{N}{|\Sigma|^q})^2) = O(\frac{N^2}{|\Sigma|^q})$ on average. When q increases, the number of hypotheses decreases exponentially. It will be $O(N)$ when $|\Sigma|$ is $O(N)$.
ACPM Verification Algorithm Description
Our verification algorithm makes use of a novel bidirectional edit distance that uses both the direct sequence and its reverse. Before we introduce our ACPM verification algorithm, we first describe some important characteristics of edit distances.
Lemma 5.2: Suppose there is an exact matching q-gram common to both the text $T = T[1 \ldots n]$ and the pattern $P = P[1 \ldots m]$. Let (i, j) be the positions of occurrence of the q-gram in T and P respectively. If this q-gram is part of a true approximate match between P and T, then the edit distance between T and P is given by $ED(P,T) = ED(P[1 \ldots j-1], T[1 \ldots i-1]) + ED(P[j+q \ldots m], T[i+q \ldots n])$, where $1 \le j \le m$, $1 \le i \le n$, $q \ge 1$.

Proof: There exists one q-length block in exact match between the pattern and the text (Figure 5.3). Since this q-gram is involved in a true approximate match between P and T, it must be involved in the edit distance computation between P and T. Thus the optimal edit path must contain the subpath between these regions of the pattern and the text. $P[j \ldots j+q-1]$ is compared only with $T[i \ldots i+q-1]$, so the edit cost of the subpath is zero. We do not need to compare $P[1 \ldots j-1]$ with $T[i \ldots n]$, nor do we need to compare $P[j \ldots m]$ with $T[1 \ldots i-1]$, because the optimal edit paths do not cross these areas. Figure 5.3 illustrates this using a q-gram exact match (the solid line). Three optimal paths (the dashed lines) pass through point (i, j), but no optimal path can pass through point (i, L) or point (H, j), where L > j and H > i. $P[1 \ldots j-1]$ is compared only with $T[1 \ldots i-1]$, and $P[j+q \ldots m]$ is compared with $T[i+q \ldots n]$. Hence the edit distance between pattern P and text T is the sum of three components: $ED(P[1 \ldots j-1], T[1 \ldots i-1])$, $ED(P[j \ldots j+q-1], T[i \ldots i+q-1])$ and $ED(P[j+q \ldots m], T[i+q \ldots n])$. Since $ED(P[j \ldots j+q-1], T[i \ldots i+q-1]) = 0$, the Lemma holds. □
Lemma 5.3: For unit cost edit operations, $ED(P,T) = ED(P^R, T^R)$, where $P^R$ and $T^R$ are the reversed versions of P and T respectively.

Proof: This Lemma has been proved in the proof of Lemma 2 in [118]. □
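Lemma 5.3 is easy to check numerically. The Python sketch below uses the standard unit-cost dynamic program for edit distance (the function name `edit_distance` is a hypothetical helper, not part of the thesis algorithms) and compares the distance of a pair of strings with the distance of their reversals.

```python
def edit_distance(a, b):
    """Unit-cost edit distance (insert, delete, substitute) via row-wise DP."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def reversal_invariant(a, b):
    """Check ED(a, b) == ED(a reversed, b reversed) for one pair of strings."""
    return edit_distance(a, b) == edit_distance(a[::-1], b[::-1])
```

This invariance is what allows the part of the pattern to the left of the matching q-gram to be processed on reversed strings, moving outwards from the q-gram.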
We need to verify each hypothesis generated by Algorithm 5.7. Assume $Q_T$ and $Q_P$ are the q-grams from the text T and the pattern P respectively. Let $Q_T = Q_P = Q$, so that there is an exact matching q-gram between the text and the pattern. We present an $O(km)$ verification algorithm to determine whether the q-gram Q is part of a true k-approximate circular match between P and T.
We now describe the idea of bidirectional edit distance, on which our verification of a given hypothesis is based. Suppose $Q_P$ is a substring of pattern P that starts at position j and $Q_T$ is a substring of text T that starts at position i, where $0 \le j \le m$ and $0 \le i \le n$. That is, $Q_P = P[j \ldots j+q-1]$ and $Q_T = T[i \ldots i+q-1]$. Our goal is to compute the circular edit distance between pattern P and the substring of T denoted $subT_i$, where $subT_i = T[i+q-1-m-k \ldots i+m+k]$. First, we construct two strings from the pattern: $P_1 = P[j+q \ldots m] \circ P[1 \ldots j-1]$ and $P_2 = P_1^R$. We also construct two strings from the text: $T_1 = T[i+q \ldots i+m+k]$ and $T_2 = T[i-1-m-k+q \ldots i-1]^R$. We compute the edit distance between $P_1$ and $T_1$ using Ukkonen's algorithm [121]. This can be done in $O(km)$ time and returns an array $ED_1$ that contains the minimum edit distance of each row. The value $ED_1[h]$ is the minimum distance between $P_1[1 \ldots h]$ and $T_1[1 \ldots r]$, where $h-k \le r \le h+k$. Similarly, we use the same algorithm to calculate the minimum edit distance array $ED_2$ between $P_2$ and $T_2$. $ED_2[h]$ is the minimum distance between $P_2[1 \ldots h]$ and $T_2[1 \ldots r]$, where $h-k \le r \le h+k$.
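The row-minimum arrays described here can be illustrated with a simple banded dynamic program in Python. This is a sketch under stated assumptions, not Ukkonen's algorithm [121] itself: it uses the plain diagonal-band cutoff $|h - r| \le k$, caps every value at k + 1, and allocates full rows for brevity (a tuned version would store only the O(k) cells of the band). The function name `band_row_minima` is hypothetical.

```python
def band_row_minima(p, t, k):
    """mins[h] = min edit distance between p[:h] and t[:r] for |h - r| <= k.

    Values larger than k are reported as k + 1.
    """
    m, n = len(p), len(t)
    cap = k + 1
    # Row 0: distance between the empty prefix of p and t[:r] is r.
    prev = [r if r <= k else cap for r in range(n + 1)]
    mins = [0]
    for h in range(1, m + 1):
        cur = [cap] * (n + 1)
        lo, hi = max(0, h - k), min(n, h + k)
        start = lo
        if lo == 0:
            cur[0] = h            # p[:h] against the empty prefix of t
            start = 1
        for r in range(start, hi + 1):
            cur[r] = min(prev[r] + 1,
                         cur[r - 1] + 1,
                         prev[r - 1] + (p[h - 1] != t[r - 1]),
                         cap)
        mins.append(min(cur[lo:hi + 1], default=cap))
        prev = cur
    return mins
```

Only the cells inside the band are computed in each row, which is the reason a k-bounded comparison of an m-length string costs on the order of km cell updates.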
Figure 5.4 shows the comparisons made by the algorithm. Here we have $P_1 = P[j+q \ldots m] \circ P[1 \ldots j-1]$ and $P_2 = P_1^R = (P[j+q \ldots m] \circ P[1 \ldots j-1])^R = P[1 \ldots j-1]^R \circ P[j+q \ldots m]^R$. $P_2$ is matched against $T_2$ in the reverse direction, while $P_1$ is matched against $T_1$ in the regular (forward) direction.
We construct an array ED from $ED_1$ and $ED_2$ as follows.

$$
ED[h] =
\begin{cases}
ED_2[m-q-h] & : 1 \le h \le m-q \\
0 & : m-q+1 \le h \le m \\
ED_1[h-m] & : m+1 \le h \le 2m-q
\end{cases}
\qquad (5.1)
$$
Lemma 5.4: For a given hypothesis occurring at positions i and j in T and P respectively, $ED(f^{h+j+q-2}(P), subT_i) = ED[h] + ED[h+m-1]$, where $1 \le h \le m$.

Proof: According to Figure 5.4, we construct a new string PP as $P_2^R \circ P[j \ldots j+q-1] \circ P_1$, where $P_2^R = P_1$. Then $PP = P[j+q \ldots m] \circ P[1 \ldots j-1] \circ P[j \ldots j+q-1] \circ P[j+q \ldots m] \circ P[1 \ldots j-1]$.
There are three cases in representing $f^h(P)$. These cases are indicated as numbers 1, 2, and 3 respectively on the double-headed arrows in Figure 5.4.

CASE 1: h = 1, where $f^{h+j+q-2}(P) = PP[1 \ldots m]$. $PP[1 \ldots m]$ is constructed from two parts. One is $P_2^R$ and the other is $P[j \ldots j+q-1]$. The edit distance in this case is $ED[1] + ED[m]$ by Lemma 5.2, where $ED[1]$ is $ED_2[m-q]$ and $ED[m] = 0$.

CASE 2: $2 \le h \le m-q$, where $f^{h+j+q-2}(P) = PP[h \ldots h+m-1]$. $PP[h \ldots h+m-1]$ is constructed from three parts. The first part is taken from $P_2^R$, the second part is $P[j \ldots j+q-1]$, and the last part is taken from $P_1$. From Lemma 5.2, we know the edit distance is $ED[h] + ED[h+m-1]$.

CASE 3: h = j. This case is similar to the first case, so the edit distance is $ED[h] + ED[h+m-1]$.

In these three cases, we only compute $1 + (m-q-1) + 1 = m-q+1$ circular shifts of the pattern. We do not calculate the circular shifts $f^r(P)$, where $j+1 \le r \le j+q-1$. There exists an exact match involving $P[j \ldots j+q-1]$ (the matching q-gram), and $f^r(P)$ does not contain this substring when $j+1 \le r \le j+q-1$. Therefore, it is not necessary to calculate $f^r(P)$ when $j+1 \le r \le j+q-1$. □
Lemma 5.5: For a given hypothesis occurring at positions i and j in T and P respectively, $ED_c(P, subT_i) = \min_{0 \le h \le m-1} \{ED[h] + ED[h+m-1]\}$, where $subT_i = T[i+q-1-m-k \ldots i+m+k]$.

Proof: There are m circular shifts of the pattern P. From Lemma 5.4, we calculate all of the minimum edit distances between the possible circular shifts $f^h(P)$ and the text $subT_i$ in this hypothesis, where $h \in [0, j] \cup [j+q, m]$. The circular edit distance of P against $subT_i$ is the minimum edit distance of all possible circular shifts against the text $subT_i$ in the current hypothesis. The lemma holds. □
Lemma 5.6: For a given hypothesis, the time complexity of the verification algorithm is $O(km)$.

Proof: Algorithm 5.8 presents the verification process following the above method. Clearly, reversing the strings can be done in linear time, and computing edit distances on the reversed strings does not change the time required. Similarly, $ED_c()$ in Lemma 5.5 can be computed in $O(km)$ time by maintaining an $O(m)$ array to record intermediate results during the computation of the standard edit distance using dynamic programming. Line 4 calls the k-approximate pattern matching algorithm, using Ukkonen's algorithm [121] in both directions, in $O(km)$ time. Line 5 implements equation (5.1). Lines 6 to 9 run in $O(m)$ time to check the match by Lemma 5.4. Thus the time complexity is $O(km)$. □
Theorem 5.2: Given a text $T = T[1 \ldots n]$ and a circular pattern $P = P[1 \ldots m]$, with symbols from an alphabet $\Sigma$, Algorithm ACPM2 solves the ACPM2 problem in $O(km^2 n)$ time and $O(n)$ space.

Proof: Suffix tree construction (including suffix links) can be done in linear time and linear space. After constructing the suffix tree, the hypothesis generation phase is performed in $O(m \log |\Sigma|)$ time, independent of n or N, since at this stage we only need a count of the number of hypotheses and the parent nodes of the leaf nodes in the ST that correspond to the start positions of the matching q-grams in the text or database. Given pattern P and text T, there exist $O(mn)$ possible hypotheses. From Lemma 5.6, the time complexity of the verification algorithm is $O(km)$ for each hypothesis. This means that the algorithm solves the ACPM2 problem in $O(km \times mn) = O(km^2 n)$ time. □
Theorem 5.3: Given a sequence database SeqDB, with Z sequences and N total symbols ($|\Sigma|$ could be $O(N)$), Algorithm ACPM2 solves the all-against-all ACPM problem in $O(k m_a N)$ time on average, and $O(k m_m N^2)$ time in the worst case, using $O(N)$ worst-case space, where $m_a = \frac{N}{Z}$ and $m_m$ is the length of the longest sequence in SeqDB.

Proof: The result follows essentially from Theorem 5.2. For a sequence database seqDB of length N, the hypothesis generation phase is performed in $O(N \log |\Sigma|)$ time. The number of hypotheses will be $O(N^2)$, so the time complexity is $O(k m_m N^2)$ in the worst case, where $m_m$ is the length of the longest sequence. On average, the number of hypotheses is $O(\frac{N^2}{|\Sigma|^q})$, so the time complexity is $O(\frac{k m_a N^2}{|\Sigma|^q})$, where $m_a$ is the average length of the sequences in seqDB. When q increases or $|\Sigma| \to N$, the time complexity will be $O(k m_a N)$. The space requirement is $O(N)$ to maintain the suffix tree data structure. □
5.3.5 Comparison with Other ACPM Algorithms
In Table 5.2, we compare our ACPM-QGRAM algorithm and ACPM-BIDIRECTIONAL algorithm with the related algorithms introduced in Chapter 2, namely Maes' algorithm [81], Gregor and Thomason's algorithm [44] and Uliel et al.'s algorithm [123]. Weiner et al.'s algorithm [126, 127] is a greedy algorithm which may miss some important circular relations. We also compare this algorithm with the other algorithms.

Our goal is to develop an algorithm to solve the ACPM problems and to apply it to study circular permutations in multidomain proteins. We make minor changes to Maes' algorithm [81], Gregor and Thomason's algorithm [44], Uliel et al.'s algorithm [123] and Weiner et al.'s algorithm [126, 127] to adapt them to the ACPM problems. Because these algorithms all focus on computing the circular edit distance, we extend them to match the pattern against all the substrings of T. This takes (n − m) steps, and hence the total time complexity increases by a factor of $n - m = O(n)$.
The time complexity of our ACPM-QGRAM algorithm is $O(m_a^3 N^2)$ in the worst case. On average, the time complexity is $O(m_a^3 N^2 / |\Sigma|^q)$, where $m_a$ is the average length of the sequences, N is the total length of the sequences, and $q = \lfloor \frac{m}{k+1} \rfloor$. When q increases, $O(N^2 / |\Sigma|^q)$ will be reduced to $O(N)$, since $|\Sigma|^q \le O(N)$.
The time complexity of our ACPM-BIDIRECTIONAL algorithm is $O(k m_m N^2)$ in the worst case. On average, the time complexity is $O(k m_a N^2 / |\Sigma|^q)$, where $m_a$ is the average length of the sequences, N is the total length of the sequences, and $q = \lfloor \frac{m}{k+1} \rfloor$. When q increases, $O(N^2 / |\Sigma|^q)$ will be reduced to $O(N)$, since $|\Sigma|^q \le O(N)$.
Table 5.2 shows the worst-case time complexity of these algorithms. The last row shows the average case for the most challenging problem, all-against-all approximate circular pattern matching. For the ACPM-QGRAM algorithm, when m is large ($m = O(N)$), our q-gram algorithm performs worse. In this case, Maes' algorithm [81] is the best algorithm in the worst case. But when $\frac{m}{k+1}$ increases and m is not very large, the ACPM-QGRAM algorithm runs in $O(m^3 N)$, where $m = \frac{N}{Z}$. This can be treated as a constant ($\frac{N}{Z} \approx 6$ for the case of multidomain proteins). Therefore, under such conditions, the ACPM-QGRAM algorithm is a linear time algorithm on average. The proposed ACPM-QGRAM algorithm is better than the other four related algorithms introduced in related work.

Table 5.2. Comparison with other proposed ACPM Algorithms

| | Maes [81] | Gregor et al.'s [44] | Uliel et al.'s [123] | Weiner et al.'s¹ [126, 127] | ACPM-qgram | ACPM-BIDIRECTIONAL |
|---|---|---|---|---|---|---|
| One circular pattern against one text window | $O(m^2 \log m)$ | $O(m^3)$ | $O(m^3)$ | $O(m^2)$ | $O(m^2 \log m / \lvert\Sigma\rvert) = O(m \log m)$ | $O(km)$ |
| One-against-One | $O(m^2 n \log m)$ | $O(m^3 n)$ | $O(m^3 n)$ | $O(m^2 n)$ | $O(m^3 \log m)$ | $O(kmn)$ |
| One-against-All | $O(m^2 N \log m)$ | $O(m^3 N)$ | $O(m^3 N)$ | $O(m^2 N)$ | $O(m^2 N \log m)$ | $O(kmN)$ |
| All-against-All | $O(\sum_1^Z m^2 N \log m) = O(N^3 \log m)$ | $O(\sum_1^Z m^3 N) = O(N^4)$ | $O(\sum_1^Z m^3 N) = O(N^4)$ | $O(\sum_1^Z m^2 N) = O(N^3)$ | $O(m^3 N^2 \log m)$ | $O(kmN^2)$ |
| Average Case All-against-All ($m = m_a = \frac{N}{Z}$) | $O(N^2 m_a \log m_a)$ | $O(N^2 m_a)$ | $O(N^2 m_a^2)$ | $O(m_a N^2)$ | $O(m_a^3 N \log m_a)$ | $O(kmN)$ |

Comparing the ACPM-BIDIRECTIONAL algorithm with Maes' algorithm [81], which is the best available algorithm for the ACPM1 problem, we can see that applying Maes' algorithm to the all-against-all ACPM problem requires $\Theta(N^2 m_a \log m_a)$ time on average and $O(N^3 \log m_m)$ time in the worst case. These are still worse than our proposed algorithm, which runs in $O(k m_a N)$ time on average and $O(k m_m N^2)$ time in the worst case. Landau et al.'s algorithm [67] runs in $O(kmn)$ time to solve the ACPM2 problem (for one-against-one). In the all-against-all ACPM2 case, its time complexity will be $\Theta(kN^2)$. This is worse than our algorithm, which runs in $O(k m_a N)$ time on average.
5.4 Experiments
As discussed in Chapter 1, circular permutations and cyclic patterns have been used in various studies in biology. We performed some experiments using the results of the proposed algorithms to study circular permutations in molecular biology. In our experiments, we apply our algorithms to multidomain proteins to look for potential circular permutation relationships between them. We also use the results of the proposed algorithms to predict potential functions for uncharacterized or unknown proteins. In these experiments, the alphabet size is the number of domains, which is a large number, close to $10^6$.
¹ Algorithm could produce incomplete results.
5.4.1 Data Set
Protein Domain Database (ProDom)
A protein domain is a section of a protein sequence whose structure can evolve, function, and exist independently of the rest of the protein chain [32]. Most proteins consist of several domains. The same protein domain may occur in related proteins. ProDom is a database of known protein domains. The ProDom web site (http://ProDom.prabi.fr) provides a tool to search for a protein domain in the protein database. The results are the proteins which contain the given protein domain. Each domain is represented as a unique symbol, thus a multidomain protein is viewed as a sequence of such symbols. The domain representation is generally much shorter than the original protein sequence, but the alphabet size increases drastically.
Gene Ontology Database (GO)
The Gene Ontology (GO) project (http://www.geneontology.org/) provides a description of genes and protein products in different databases, including the known functions of the genes. Currently the GO Consortium includes many databases such as GeneDB (http://www.genedb.org/), UniProtKB-Gene Ontology Annotation @ EBI (UniProtKB-GOA) (http://www.ebi.ac.uk/GOA/) and FlyBase (http://flybase.bio.indiana.edu/). More details on the GO Consortium are available at http://www.geneontology.org/GO.consortiumlist.shtml.
The ProDom database provides the Accession Number for the parent protein of each domain. The Accession Number is also provided for UniProtKB-GOA. This establishes a connection between entities in ProDom and their corresponding entities in the GO database. In our experiments, we used this relation to obtain the GO terms used to describe the protein function. Based on this, we can predict functions for multidomain proteins using our CPM results obtained using the proteins in ProDom.
5.4.2 CPM Experimental Design
We implemented the exact circular pattern matching algorithm and three approximate circular pattern matching algorithms and applied them to detect circular patterns in the ProDom database. We downloaded data from the ProDom web site (http://ProDom.prabi.fr) on March 12, 2009 (ProDom version 2006.1 as released on November 6th, 2008). There were 1,997,497 proteins in this database. We removed proteins with fewer than three domains, and also removed redundant proteins with more than 90% similarity to some other protein. The result is a reduced database with 973,686 proteins. This means that ECPM and ACPM will only apply to proteins that contain an entire copy of another protein.
Results: Speed and Completeness
We ran the four algorithms on the reduced database and used the results to analyze the relationships between multidomain proteins. The ACPM-QGRAM algorithm was executed with two different parameter values, namely q = 1 and q = 2. When q = 1, the result is complete. When q = 2, the result is suboptimal (incomplete). We use the complete result as a benchmark against which to compare the results of the other algorithms.

The exact algorithm is the fastest algorithm. It needed only six minutes to build the suffix tree and perform searches for all circular patterns. The ACPM-LIS algorithm is the slowest algorithm. The ACPM-GREEDY algorithm is faster than the other ACPM algorithms, but its result has low accuracy (around 50%). Figure 5.5 shows the practical time required by these three algorithms, where q-gram has two instances, q = 1 and q = 2.
A comparison of the outputs of the algorithms provides some insight into their overall performance. There are 29,625,738 relations in the complete result. The ECPM algorithm can be viewed as a greedy algorithm when the objective is approximate matching. The number of relations found using ECPM was 28,096,046, which is close to the complete result. ACPM-GREEDY identified only 15,075,729 relations. The ACPM-QGRAM algorithm with parameter q = 2 found 29,345,380 relations. When we run the ECPM algorithm on ProDom, we get almost 95% of the complete relations. When we run the ACPM-QGRAM algorithm with q = 2, we get more than 99% of the complete relations.
We also ran the hybrid algorithm, in which the ECPM algorithm was applied first, followed by the ACPM-QGRAM algorithm with parameter q = 1. We obtained the complete result, and the running time was reduced from 41 hours to 14 hours.
Analysis of Results
Based on the results, we built a relationship network among the multidomain proteins. This is a directed graph. The proteins are represented as the vertices, while the relations are represented by the edges. The In-edges and Out-edges are defined as follows. If a protein sequence P1 is a circular pattern in protein sequence P2, then there is an Out-edge from P1 to P2. Conversely, there is an In-edge from P2 to P1. Figure 5.6 shows the degree distribution of the network. Panel (a) of Figure 5.6 is the degree distribution of all vertices and panel (b) shows the degree distribution of the Top-100 highest degree nodes. Panels (c) and (d) are log-log plots of panels (a) and (b) respectively.

Each protein sequence is used not only as a pattern to search against the other protein sequences, but also as a text to be searched using the other protein sequences in the database. 424,888 protein sequences were found to be a pattern in some other protein sequence. 799,044 protein sequences contain at least one other protein sequence as a circular pattern. 374,279 protein sequences have both out-edges and in-edges. 50,609 protein sequences have only out-edges, while 424,765 protein sequences have only in-edges. The average degree of this graph was 23 with an average out-degree of 46 and an average in-degree of 24.5.
Figure 5.7(a) shows the number of directly connected pairs in the Top-K highest degree proteins, where K is 10, 20, ..., 1000. If the Top-K highest degree proteins are taken as the vertices of a subgraph, the number of directly connected pairs is the number of edges in that subgraph. We define a ratio $\rho_K$ as follows:

$$\rho_K = \frac{\#\text{ of total edges}}{\#\text{ of edges in Top-}K\text{ complete subgraph}} = \frac{\#\text{ of observed edges}}{\frac{1}{2} \times K \times (K-1)}.$$

Figure 5.7(b) shows the ratio $\rho_K$ in the Top-K proteins. When K is less than 460, the ratio $\rho_K$ stays stable at around 0.5. When K is larger than 460, the ratio $\rho_K$ starts to decrease. Thus, in this graph, the Top-460 highest degree proteins are more densely related to one another than the rest.
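As an illustration of the ratio, the following sketch computes it for a small edge list. It assumes that a directly connected pair is counted once regardless of edge direction and that the Top-K vertices are ranked by total degree; `top_k_ratio` is a hypothetical helper name, not part of our implementation.

```python
from collections import Counter

def top_k_ratio(edges, k):
    """Observed pairs among the k highest-degree vertices over k*(k-1)/2.

    Assumptions: k >= 2, degree means total (in plus out) degree, and
    ties in degree are broken arbitrarily.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    top = {vertex for vertex, _ in degree.most_common(k)}
    # Each directly connected pair is counted once, ignoring direction.
    observed = {
        frozenset((u, v))
        for u, v in edges
        if u != v and u in top and v in top
    }
    return len(observed) / (k * (k - 1) / 2)

# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
ratio_top3 = top_k_ratio(toy_edges, 3)
ratio_top4 = top_k_ratio(toy_edges, 4)
```

For the toy graph, the three highest-degree vertices form a complete subgraph, while adding the pendant vertex lowers the ratio, which is the same qualitative behavior as the decrease observed for large K.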
Table 5.3. Top 15 highest degree proteins with GO function

Rank  Count  AC Number  GO Description
1     23353  Q7VMZ1     nucleotide binding; ATP binding; ATPase activity; nucleoside-triphosphatase activity
2     23344  Q9CPC5     nucleotide binding; ATP binding; ATPase activity; nucleoside-triphosphatase activity
3     23338  Q3EG14     Protein not found in GO
4     20508  Q33HH1     Protein not found in GO
5     20446  Q47AY9     nucleotide binding; ATP binding; ATPase activity; nucleoside-triphosphatase activity
6     20446  Q4UQ62     nucleotide binding; ATP binding; ATPase activity; nucleoside-triphosphatase activity
7     20446  Q8P4K7     nucleotide binding; ATP binding; ATPase activity; nucleoside-triphosphatase activity
8     20446  Q8PG73     nucleotide binding; ATP binding; ATPase activity; nucleoside-triphosphatase activity
9     20415  Q426Q5     Protein not found in GO
10    20398  Q3BNR9     nucleotide binding; ATP binding; ATPase activity; nucleoside-triphosphatase activity
11    20393  Q73PA3     nucleotide binding; ATP binding; ATPase activity; nucleoside-triphosphatase activity
12    20273  Q50XK7     Protein not found in GO
13    20246  Q66C16     nucleotide binding; ATP binding; ATPase activity; nucleoside-triphosphatase activity
14    20244  Q5NU40     No function in GO
15    20244  O32748     nucleotide binding; ATP binding; ATPase activity; nucleoside-triphosphatase activity
Protein Function Prediction
Table 5.3 shows the protein function for the Top-15 highest degree proteins. We notice that 10 of the 15 proteins have exactly the same functions. Four proteins (ranks 3, 4, 9, and 12) do not have entries in the GO database. Protein Q5NU40 (rank 14) has a record in the GO database, but no function is assigned to it there. It is therefore likely that the four proteins with no known function have the same function as the other 10 proteins. We plan to verify these functions by searching the biology literature in the future.
We use the z-score as a measure of significance of the relationship between two proteins. For a given random variable x, the z-score is defined as $z = \frac{x - \mu_x}{\sigma_x}$, where $\mu_x$ is the mean and $\sigma_x$ is the standard deviation.
To predict the function for a protein, say PA, we compute the z-scores for the number of occurrences of a given function among the proteins in the respective In-edge and Out-edge sets of PA. We then assign to the protein each function whose z-score is above a threshold.
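A minimal sketch of this prediction step is given below. The text does not fix the population over which the mean and standard deviation are taken, so the sketch assumes they are computed over the occurrence counts of all functions seen in one neighbor set; that choice, and the names `significant_functions` and `predict_functions`, are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean, pstdev

def significant_functions(neighbor_functions, threshold):
    """Functions whose occurrence count has a z-score >= threshold.

    `neighbor_functions` holds one set of function labels per neighbor.
    Assumption: mean and standard deviation are taken over the counts
    of all functions observed in this neighbor set.
    """
    counts = Counter(f for funcs in neighbor_functions for f in funcs)
    values = list(counts.values())
    if not values:
        return set()
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return set()  # every function is equally frequent; nothing stands out
    return {f for f, c in counts.items() if (c - mu) / sigma >= threshold}

def predict_functions(in_funcs, out_funcs, threshold, mode="union"):
    """Combine the In-edge and Out-edge predictions by union or intersection."""
    from_in = significant_functions(in_funcs, threshold)
    from_out = significant_functions(out_funcs, threshold)
    return from_in | from_out if mode == "union" else from_in & from_out

# Toy neighbor sets with hypothetical function labels F1..F5.
in_set = [{"F1"}] * 10 + [{"F2"}, {"F3"}, {"F4"}, {"F5"}]
out_set = [{"F2"}] * 10 + [{"F1"}, {"F3"}, {"F4"}, {"F5"}]
union_prediction = predict_functions(in_set, out_set, 1.5, "union")
intersection_prediction = predict_functions(in_set, out_set, 1.5, "intersection")
```

In the toy data, F1 dominates the In-edge set and F2 dominates the Out-edge set, so the union predicts both functions while the intersection predicts none. This illustrates why the intersection is the more conservative of the two combinations.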
Table 5.4 shows the prediction results on 9 multidomain protein sequences using the union of the functions of the In-edge and Out-edge proteins at different thresholds on the z-scores. Table 5.5 shows the equivalent results using the intersection.
We also conducted an experiment to predict the protein functions in the Top-500 highest degree proteins. Of these, 156 proteins were not found in the GO database. Table 5.6 shows the prediction performance in terms of precision, recall and the F-measure, where FP is the number of false positives, FN is the number of false negatives, and TP is the number of true positives. The recall is calculated as $\frac{TP}{TP+FN}$ and the precision is calculated as $\frac{TP}{TP+FP}$. The F-measure is calculated as $\frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$.
From the F-measure column of Table 5.6, we notice that the union method at z ≥ 3 has the highest F-measure, 0.84. This indicates that the union method at z ≥ 3 provides the best result of all the combinations.
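The three measures can be recomputed directly from the TP, FP and FN counts. The short sketch below does so for two rows of Table 5.6; the function name `prf` is a hypothetical helper introduced only for this illustration.

```python
def prf(tp, fp, fn):
    """Return (recall, precision, F-measure) from raw counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure

# Counts taken from Table 5.6.
union_z3 = tuple(round(x, 2) for x in prf(1349, 186, 317))
intersection_z3 = tuple(round(x, 2) for x in prf(162, 522, 1504))
```

Rounded to two decimals, the union method at z ≥ 3 gives a recall of 0.81, a precision of 0.88 and an F-measure of 0.84, and the intersection method at z ≥ 3 gives 0.1, 0.24 and 0.14, in agreement with the table.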
5.4.3 Multidomain Protein Networks using Circular Patterns
Introduction
Philipp et al. [100] introduced a tool to discover potential relationships between proteins using a protein domain network. This network was based on protein domain interaction networks. They built a web resource to explore the Protein Domain Interaction MAp (DIMA). In this network, the nodes are the protein domains and the edges are the interactions between two protein domains. In our work, network formation is based primarily on cyclic relationships between multidomain proteins.
Dataset
In this experiment, we further studied the use of our proposed ECPM and ACPM algorithms on the problem of analyzing multidomain protein sequences. Based on the patterns found by our algorithms, we constructed multidomain protein networks by connecting different multidomain proteins that are associated through some matching circular or non-circular pattern. We use the Pfam database [15, 39] to identify the families for the multidomain proteins in ProDom. (We had introduced the Pfam database earlier in Chapter
Table 5.4. The predicted protein functions using union for In-edge and Out-edge

Protein AC Number  Function    Predicted Function (z≥3)  Predicted Function (z≥2)  Predicted Function (z≥1)
Q7VMZ1             GO:0000166  GO:0000166                GO:0000166                GO:0000166
                   GO:0005524  GO:0005524                GO:0005524                GO:0005215
                   GO:0016887  GO:0016887                GO:0016887                GO:0005524
                   GO:0017111  GO:0017111                GO:0017111                GO:0016787
                   -           -                         GO:0042626                GO:0016887
                   -           -                         -                         GO:0017111
                   -           -                         -                         GO:0042626
O32184             GO:0003824  GO:0003824                GO:0003824                GO:0003824
                   GO:0005488  GO:0005488                GO:0005488                GO:0004316
                   GO:0016491  GO:0016491                GO:0016491                GO:0005488
                   -           -                         -                         GO:0016491
Q2Y7W6             GO:0000156  GO:0000155                GO:0000155                GO:0000155
                   -           GO:0004871                GO:0004871                GO:0004871
Q33CH5             GO:0003723  GO:0003723                GO:0003723                GO:0003723
                   GO:0003968  GO:0003968                GO:0003968                GO:0003968
Q30U32             GO:0000156  GO:0000156                GO:0000155                GO:0000155
                   -           GO:0004871                GO:0000156                GO:0000156
                   -           -                         GO:0004871                GO:0004871
O93828             GO:0004585  GO:0004585                GO:0004585                GO:0004585
                   GO:0016597  GO:0016597                GO:0016597                GO:0016597
                   GO:0016740  GO:0016740                GO:0016740                GO:0016740
                   GO:0016743  GO:0016743                GO:0016743                GO:0016743
Q30SN9             GO:0003824  -                         -                         GO:0003824
                   GO:0004252  -                         -                         GO:0004252
                   -           -                         -                         GO:0005515
Q2YTY7             GO:0003723  -                         -                         GO:0003723
                   GO:0009982  -                         -                         GO:0009982
O78911             GO:0008137  GO:0008137                GO:0008137                GO:0008137
                   GO:0016491  GO:0016491                GO:0016491                GO:0016491
Table 5.5. The predicted protein functions using intersection for In-edge and Out-edge

Protein AC Number  Function    Predicted Function (z≥3)  Predicted Function (z≥2)  Predicted Function (z≥1)
Q7VMZ1             GO:0000166  GO:0000166                GO:0000166                GO:0000166
                   GO:0005524  GO:0005524                GO:0005524                GO:0005524
                   GO:0016887  GO:0016887                GO:0016887                GO:0016887
                   GO:0017111  GO:0017111                GO:0017111                GO:0017111
O32184             GO:0003824  -                         -                         -
                   GO:0005488  -                         -                         -
                   GO:0016491  -                         -                         -
Q2Y7W6             GO:0000156  GO:0004871                GO:0004871                GO:0000155
                   -           -                         -                         GO:0004871
Q33CH5             GO:0003723  -                         GO:0003723                GO:0003723
                   GO:0003968  -                         GO:0003968                GO:0003968
Q30U32             GO:0000156  -                         GO:0004871                GO:0000155
                   -           -                         -                         GO:0004871
O93828             GO:0004585  -                         -                         GO:0016597
                   GO:0016597  -                         -                         GO:0016740
                   GO:0016740  -                         -                         GO:0016743
                   GO:0016743  -                         -                         -
Q30SN9             GO:0003824  -                         -                         -
                   GO:0004252  -                         -                         -
Q2YTY7             GO:0003723  -                         -                         -
                   GO:0009982  -                         -                         -
O78911             GO:0008137  GO:0008137                GO:0008137                GO:0008137
                   GO:0016491  GO:0016491                GO:0016491                GO:0016491
Table 5.6. Performance in Protein Function Prediction using the Top-500 Proteins

Method        Parameter  TP    FP    FN    Recall  Precision  F-measure
Union         z≥3        1349  186   317   0.81    0.88       0.84
Union         z≥2        1353  1302  313   0.81    0.51       0.63
Union         z≥1        1464  1950  202   0.88    0.43       0.58
Intersection  z≥3        162   522   1504  0.1     0.24       0.14
Intersection  z≥2        1269  3891  397   0.76    0.25       0.37
Intersection  z≥1        1345  6857  321   0.81    0.16       0.27
4). There are 40,807 protein sequences in the ProDom database that are also in the Pfam database. There are 12,104 families in the Pfam database. In our experiment, proteins in ProDom that do not have corresponding families in Pfam were not included in the analysis. Similar to Chapter 4, for function prediction based on the circular pattern networks, we use the protein functions as maintained in the GO database.
To ensure diversity and reduce the problem of redundancy in the protein families, each family is represented by only two member proteins. We select two proteins from each family in the Pfam database to construct a new network. For each family, we select the proteins with the maximum and minimum number of circular pattern relationships, respectively. These often correspond to the longest and shortest protein sequences in the family. Some proteins belong to multiple families, so some proteins are chosen several times; we keep only one copy of each in our network. This network presents non-redundant relationships across all protein families. The resulting data set contains 3,659 proteins from 4,725 families.
Network Formation
We construct two types of network. One is based on the circular permutation relationship between proteins. We call this the "Protein" network. The other is based on the families. Any circular relationship between two proteins is treated as a circular permutation relationship between the two families to which the two proteins belong. We call this the "Family" network. Further, we construct three networks for each of the two types. The first is a network using only non-circular patterns (non-CPs) found between the proteins, that is, it is constructed from the non-circular matching relationships between proteins. The second is the circular pattern network, constructed using only the circular matching relationships (excluding non-circular matches) between the multidomain proteins. The last is the combined network, constructed from all matching relationships (including both circular and non-circular matches). Thus, we construct six networks from our data. Table 5.7 shows the statistics of these six networks. The networks are shown in Figures 5.8-5.13.
Table 5.7. Network statistics for multidomain protein networks

Parameters                   Protein       Protein   Protein   Family        Family    Family
                             non-circular  circular  combined  non-circular  circular  combined
                             network       network   network   network       network   network
Clustering Coefficient       0.157         0.102     0.174     0.181         0.084     0.249
Connected Components         756           407       786       600           369       608
Network Diameter             7             9         9         13            15        11
Shortest paths               23515         17104     25593     103683        58164     123380
Characteristic Path Length   2.079         2.099     2.083     3.525         3.814     3.294
Average number of neighbors  2.612         2.496     2.723     5.325         4.073     6.844
Number of nodes              3458          2299      3659      4416          3140      4725
Number of edges              4517          2869      7386      13031         6741      19772
Significance of Circular Pattern Networks
The networks in Figures 5.4.3 and 5.11 use two colors for the network nodes. The red nodes are the nodes in the non-circular pattern networks. The green nodes are the nodes found only in the circular pattern networks. We notice that there are very few green nodes in the networks. There are also two colors for the edges. The blue edges indicate edges that occur in the non-circular networks. The pink edges indicate edges that occur only in the circular networks. From these two figures, we see that the differences between the non-circular pattern network and the circular pattern network show up mainly in the pink edges rather than in the nodes.
For example, in the combined Protein network (with CPs and non-CPs), we can observe pink edges between some major clusters. This shows an important relationship between these clusters that is only exposed using the circular permutations. For function prediction work based on this network, one can expect that these edges will provide more information about the clusters, potentially leading to improved prediction results.
In the Family networks, there are some interesting differences between the combined network and the non-circular network. The part labeled "G" in the combined network (Figure 5.4.3) is connected to the main component, but in the non-circular pattern network, the part G is a large component that is disconnected from the main component. This indicates that there exist some important edges that occur only in the circular network. Such relationships cannot be found using direct pattern matching, and require methods for circular pattern matching, as proposed in this chapter.
Table 5.8 shows the Top 25 proteins that exhibited the highest difference (in terms of node degree) between the protein circular-pattern network and the non-circular pattern network. The significance of the CP network is perhaps clearer in this table, which shows the quantitative differences between selected nodes in the two networks. We can observe that, in some cases, more than half of the associations to a node in the Protein networks are due to the circular pattern relationships. We can also see that some small networks are formed exclusively from circular patterns, with no associations using direct patterns (i.e., non-CPs).
Table 5.9 shows the proteins found on the longest path in the Protein network. Given the nature of the network, this means that consecutive multidomain proteins on the path must share some common circular patterns (and perhaps some non-circular ones). Table 5.10 shows the corresponding longest chain of families in the Family network.
5.5 Summary
In this chapter, we presented four ACPM algorithms to solve the ACPM2 problem and one algorithm to solve the ECPM problem. The ECPM algorithm is not only the best algorithm in theory, but also the fastest in practice. The ACPM-BIDIRECTIONAL algorithm is the best of our ACPM algorithms. Compared with other algorithms in the literature, the ACPM-BIDIRECTIONAL algorithm also has the best average-case time complexity.
Based on the results, we analyzed circular permutations in multidomain proteins and used them to perform protein function prediction. Our results show a precision of 0.88 and a recall of 0.81 at z ≥ 3.0, using the union of the functions of the In-edge and Out-edge proteins.
Table 5.8. Top 25 proteins with the highest node degree differences between protein networks using the circular and non-circular patterns.

Protein  Degree in         Degree in         Degree in             Degree in circular network /
         circular network  combined network  non-circular network  Degree in combined network (%)
P03304   266               704               438                   37.78%
P19525   221               592               371                   37.33%
P35409   199               567               368                   35.10%
Q00962   175               470               295                   37.23%
P17546   175               464               289                   37.72%
P42684   122               390               268                   31.28%
P43699   147               402               255                   36.57%
P48633   174               403               229                   43.18%
P29476   123               348               225                   35.34%
Q04610   122               320               198                   38.13%
Q99323   113               306               193                   36.93%
P22009   95                276               181                   34.42%
Q06889   94                260               166                   36.15%
Q03351   431               588               157                   73.30%
P10272   82                238               156                   34.45%
P03336   83                225               142                   36.89%
P16112   101               240               139                   42.08%
P38939   77                216               139                   35.65%
P41381   81                203               122                   39.90%
P19560   59                175               116                   33.71%
P30963   68                176               108                   38.64%
P26762   64                168               104                   38.10%
P27742   72                169               97                    42.60%
P19559   62                150               88                    41.33%
P08049   65                150               85                    43.33%
Table 5.9. The longest path in the Protein network

Order  Protein  Family
1      P25930   Pfam-B 8089
1      P25930   Pfam-B 8090
2      P34540   Pfam-B 2386
3      P43141   Pfam-B 2823
4      Q03351   ig
4      Q03351   Pfam-B 11753
4      Q03351   Pfam-B 3706
4      Q03351   Pfam-B 3707
5      P29276   Pfam-B 10807
6      P13677   Pfam-B 135
6      P13677   Pfam-B 735
7      P11461   Pfam-B 8910
8      P03967   ras
9      P31133   Pfam-B 3874
10     P46870   Pfam-B 210
10     P46870   Pfam-B 229
10     P46870   Pfam-B 461
10     P46870   Pfam-B 547
10     P46870   Pfam-B 548
10     P46870   Pfam-B 616
11     P13068   Pfam-B 5383
12     P42686   Pfam-B 7008
12     P42686   Pfam-B 7009
13     P44768   Pfam-B 10433
14     P32745   Pfam-B 4919
14     P32745   Pfam-B 4920
15     P23678   Pfam-B 7252
15     P23678   PH
16     P25892   Pfam-B 3816
17     P31134   Pfam-B 1832
18     P35409   7tm 1
18     P35409   Pfam-B 3952
18     P35409   Pfam-B 725
19     P21838   Pfam-B 8095
20     P07199   Pfam-B 6062
21     P09803   Pfam-B 4482
21     P09803   Pfam-B 6146
22     P24710   Pfam-B 1620
22     P24710   Pfam-B 1621
Table 5.10. The longest path in the Family network

Order  Family
1      Pfam-B 6544
2      Pfam-B 11160
3      ins
4      vwc
5      Pfam-B 5115
6      Pfam-B 1677
7      Pfam-B 4686
8      Pfam-B 7481
9      Pfam-B 1063
10     Pfam-B 2682
11     Pfam-B 3197
12     Pfam-B 8974
13     Pfam-B 604
14     Pfam-B 818
15     adh zinc
16     Pfam-B 11159
17     Pfam-B 842
18     Pfam-B 2068
Figure 5.1. Suffix tree for the string T = missississippi$ with some suffix links. (This figure is the same as Figure 3.1, with some suffix links added.)
Algorithm 5.1: ECPM Algorithm

ECPM(ST, Pattern)
1  PP ← Pattern ◦ Pattern[1...m-1]
2  Current ← ST.root, top ← 1, len ← 1
3  for (i ← 1 to 2m-1) do
4      if (len = 1) and Current.child.label[1] = PP[i] then
5          Current ← Current.child
6      end if
7      if Current.child.label[1] = PP[i] then
8          len ← len + 1
9          if len > label.length then
10             len ← 1
11         end if
12         if i-top = m-1 then
13             output "(matched, Current)"
14             Current ← Current.SuffixLink, top ← top + 1
15         end if
16     else
17         Current ← Current.SuffixLink
18         top ← top + 1, i ← i - 1
19     end if
20     if i < top or top > m then
21         break
22     end if
23 end for
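The role of the doubled pattern PP can be illustrated without a suffix tree. The sketch below is not the suffix-tree procedure of Algorithm 5.1: it swaps in a plain set lookup over text windows, and only demonstrates that every length-m window of PP = Pattern ◦ Pattern[1...m-1] is one rotation of the pattern. The function name `circular_occurrences` is hypothetical.

```python
def circular_occurrences(text, pattern):
    """Start positions (0-based) where some rotation of `pattern` occurs.

    Naive illustration only: the text is scanned window by window instead
    of being indexed by a suffix tree with suffix links.
    """
    m = len(pattern)
    pp = pattern + pattern[:m - 1]            # PP = P followed by P[1...m-1]
    rotations = {pp[s:s + m] for s in range(m)}
    return [
        i for i in range(len(text) - m + 1)
        if text[i:i + m] in rotations
    ]

hits_single = circular_occurrences("xxcabyy", "abc")
hits_repeated = circular_occurrences("abcabc", "abc")
```

In the first call, the rotation "cab" of "abc" is found at position 2. In the second call, every window of the text is a rotation of the pattern, so all four start positions are reported.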
Algorithm 5.2: All vs. All ECPM Algorithm

ALL-VS-ALL-ECPM(SeqDB, Z)
1  ST ← Get Suffix Tree(SeqDB)
2  for (i ← 1 to Z) do
3      Pattern ← SeqDB[i]
4      ECPM(ST, Pattern)
5  end for
Algorithm 5.3: Pattern Matching Using LIS

APM-VIA-LIS(T, P, k)
1  Build the mapping table mapTable which stores the positions in P of each symbol in decreasing order
2  seq ← NULL
3  for (i ← 1 to n) do
4      seq ← seq ◦ mapTable[T[i]]
5  end for
6  Generate LIS from seq
7  Calculate LCS between T and P from LIS
8  if verify(LCS, k) is true then
9      return matched
10 else
11     return mismatched
12 end if
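Steps 1 to 7 of Algorithm 5.3 can be sketched as follows. The sketch computes only the length of the LCS through a strictly increasing subsequence of the mapped positions; the verify(LCS, k) step is not specified here and is left out. The function name `lcs_length_via_lis` is hypothetical, and the patience-style LIS with binary search is one standard way to realize step 6, not necessarily the one used in our implementation.

```python
from bisect import bisect_left
from collections import defaultdict

def lcs_length_via_lis(t, p):
    """Length of the LCS of t and p, via an LIS over mapped positions."""
    # Step 1: positions in p of each symbol, in decreasing order.
    map_table = defaultdict(list)
    for pos in range(len(p) - 1, -1, -1):
        map_table[p[pos]].append(pos)
    # Steps 2-5: concatenate the position lists in the order of t.
    seq = [pos for symbol in t for pos in map_table.get(symbol, [])]
    # Step 6: strictly increasing LIS; tails[i] is the smallest possible
    # last element of an increasing subsequence of length i + 1.
    tails = []
    for value in seq:
        i = bisect_left(tails, value)
        if i == len(tails):
            tails.append(value)
        else:
            tails[i] = value
    # Step 7: the LIS length equals the LCS length.
    return len(tails)

lcs_example = lcs_length_via_lis("ABCBDAB", "BDCABA")
```

Listing the positions of each symbol in decreasing order prevents two positions contributed by the same text symbol from both entering a strictly increasing subsequence, so each text symbol is matched at most once. For the example strings the LCS has length 4 (for instance "BCBA").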
Algorithm 5.4: ACPM2 with Greedy Algorithm

ACPM-GREEDY(SeqDB, Z, k)
1  for (i ← 1 to Z) do
2      for (j ← 1 to Z) do
3          P ← SeqDB[i], m ← |P|, PP ← P[1...m] ◦ P[1...m−1]
4          T ← SeqDB[j], n ← |T|
5          APM-via-LIS(T, PP, k)
6      end for
7  end for
Algorithm 5.5: ACPM2 Algorithm with LIS

ACPM-LIS(SeqDB, Z, k)
1  for (i ← 1 to Z) do
2      for (j ← i+1 to Z) do
3          P ← SeqDB[i], m ← |P|
4          T ← SeqDB[j], n ← |T|
5          for (v ← 1 to n−m+k) do
6              subT ← T[v...v+m+k]
7              for (h ← 1 to m) do
8                  APM-via-LIS(subT, f^h(P), k)
9              end for
10         end for
11     end for
12 end for
Figure 5.2. The number of hypotheses with q-gram
Algorithm 5.6: ACPM2 Algorithm with q-gram and Suffix Array

ACPM-QGRAM(SeqDB, N, Z, q, k)
1  seq ← NULL, pos ← NULL, s ← 1
2  for (i ← 1 to Z) do
3      seq ← seq ◦ SeqDB[i]
4      for (j ← 1 to m_i) do
5          pos[s] ← i, s ← s + 1
6      end for
7  end for
8  <SA, lcp> ← BuildSA(seq)
9  for (i ← 1 to N) do
10     Candidates ← {}
11     do while (lcp[i] ≥ q)
12         Candidates ← Candidates ∪ {i}, i ← i+1
13     end do
14     for each Pair {x, y} ∈ Candidates do
15         P ← SeqDB[pos[SA[x]]], m ← |P|
16         T ← SeqDB[pos[SA[y]]], n ← |T|
17         for (j ← min(1, y−m−k+q) to y+m+k−q) do
18             subT ← T[j...j+m+k−1]
19             for (h ← 1 to m) do
20                 APM-via-LIS(subT, f^h(P), k)
21             end for
22         end for
23     end for
24 end for
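The candidate-generation idea behind the q-gram filter can be sketched in a few lines. Algorithm 5.6 groups suffixes that share a prefix of length at least q by scanning the lcp array of a suffix array; the sketch below swaps in a plain dictionary keyed by q-gram, which yields the same kind of candidate pairs on small inputs but is not the suffix-array procedure itself. The name `qgram_candidate_pairs` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def qgram_candidate_pairs(sequences, q):
    """Pairs of sequence indices (0-based) that share at least one q-gram.

    Dictionary-based stand-in for the suffix array and lcp scan; only
    these candidate pairs would be passed on to the verification step.
    """
    index = defaultdict(set)
    for seq_id, seq in enumerate(sequences):
        for i in range(len(seq) - q + 1):
            index[seq[i:i + q]].add(seq_id)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

candidates = qgram_candidate_pairs(["abcd", "xxcd", "zzzz"], 2)
```

In the toy input, only the first two sequences share a 2-gram ("cd"), so only that pair is kept as a hypothesis and the third sequence is never compared against the others.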
Figure 5.3. Dynamic programming in q-gram matching
Figure 5.4. Three cases in computing the circular edit distance in the ACPM algorithm using the bidirectional edit distance. The numbered double-headed arrows show the symbol positions involved in each case.
Algorithm 5.7: q-gram ACPM Hypothesis Generation

ACPM-BIDIRECTIONAL(SeqDB[], k, N)
1  ST ← Build-Suffix-Tree(SeqDB)
2  for i = 1 to N
3      P ← SeqDB[i]
4      m ← |P|
5      q ← ⌊m/(k+1)⌋
6      start ← 1
7      {<SeqDB ID, position>} ← Search-in-ST(ST, P[1...q])
8      for each pair <SeqDB ID, position>
9          T ← SeqDB[SeqDB ID]
10         BIDIRECTIONALED(P, 1, T, position, k, q)
11     end for
12     start ← start + 1
13     do while (start ≤ m-p)
14         {<SeqDB ID, position>} ← Search-in-SuffixLink(ST, P[start...start+q-1]) via suffix link
15         for each pair <SeqDB ID, position>
16             T ← SeqDB[SeqDB ID]
17             BIDIRECTIONALED(P, 1, T, position, k, q)
18         end for
19         start ← start + 1
20     end do
21 end for
Algorithm 5.8: ACPM Hypothesis Verification

BIDIRECTIONALED(P, posP, T, posT, k, q)
1  m ← |P|
2  P1 ← P[posP+q...m] ◦ P[1...posP-1]
3  P2 ← P1^R
4  ed1 ← DP(P1, T[posT+q...posT+q+m+k-1], k); ed2 ← DP(P2, T[posT-m+q...posT-1]^R, k)
5  ED ← ComputeED(ED1, ED2, m, q)
6  for h = 1 to m-q+1
7      if (ED[h] + ED[h+m-1] ≤ k) then do
8          return match
9  end for
Figure 5.5. The time cost of the CPM algorithms. The two panels plot Time (Minutes) and log(Time (Minutes)) against the number of proteins for Greedy ACPM, ACPM LIS, ACPM q-gram (q=2), and ACPM q-gram (q=1).
Figure 5.6. Degree distributions in the network of multidomain proteins constructed based on the circular patterns they contain. Panels: (a) degree distribution; (b) degree distribution in Top-100 degree nodes; (c) log degree distribution; (d) log degree distribution in Top-100 degree nodes.
Figure 5.7. Number of directly connected pairs in Top-K highest degree proteins. Panels: (a) and (b).
Figure 5.8. The Protein network (using both CPs and non-CPs)
Red nodes are nodes found in the non-CP network. Green nodes are nodes found only in the CP network. Blue edges denote edges found in the non-CP network. Pink edges are edges found only in the CP network.
P 0 5 9 5 9
P 1 0 9 7 8
P 2 7 2 8 5
P 2 9 3 2 4
P 0 3 3 0 6
P 3 1 6 3 0
Q 0 4 5 3 8
P 1 7 5 9 3
P 2 7 2 8 2
P 3 6 3 2 7
P 3 6 3 0 9P 0 3 3 0 5
P 0 6 6 5 4
P 0 7 9 4 9P 3 2 1 1 3
P 1 3 9 5 9
P 2 9 3 9 3
P 1 6 6 4 9
P 4 9 1 7 7
Q 0 9 7 1 5
Q 0 1 9 8 1
P 3 9 7 7 0
P 1 7 0 9 7
P 1 5 2 6 9 P 4 7 1 6 4
P 3 3 7 4 9
P 2 5 4 9 0
P 0 7 2 6 8
P 3 3 7 4 8
Q 0 6 8 8 9 P 3 9 8 0 6
Q 0 4 6 1 0P 2 7 4 0 9Q 0 4 5 4 4
P 0 3 3 0 4
Q 0 5 0 5 7
Q 0 0 9 6 2P 0 7 1 0 5
P 2 5 0 4 9P 4 2 0 0 0
P 3 6 1 3 0
P 2 3 4 6 3P 2 5 0 6 6
P 4 8 8 1 0
P 2 5 8 9 2 P 1 2 7 5 7
P 3 5 4 1 8
P 0 1 0 0 8
P 3 4 2 1 6
Q 0 3 3 9 6
P 1 3 6 1 5P 3 2 3 8 0
P 1 7 8 6 3
P 1 7 7 7 1
P 1 5 3 1 0
Q 0 2 9 2 6
P 2 4 7 1 0
P 4 6 8 6 7
P 4 6 8 7 0
P 3 5 3 5 2
P 3 8 7 4 8
P 1 0 5 6 7
P 3 3 1 7 6
P 4 2 5 6 6 Q 0 6 8 5 1
P 3 4 3 8 3
P 3 4 5 4 0
P 3 3 6 8 1
P 4 6 8 7 1
Q 0 0 6 0 9P 2 1 1 7 8
P 3 6 6 0 8P 3 5 9 3 7P 2 9 3 8 4
Q 0 3 4 1 6
P 4 2 0 0 3
Q 0 0 3 8 1
P 1 4 0 2 5
P 4 6 3 2 9
P 1 3 0 3 6
P 0 9 6 0 3
P 3 6 6 0 9
P 2 5 5 0 2
P 4 7 7 2 8
P 9 8 0 7 3
P 3 5 0 3 7P 2 6 9 2 7
P 9 8 0 7 4
P 1 8 2 9 2
P 0 0 7 3 8
P 3 5 0 3 6
Q 0 9 6 9 0
P 4 2 9 7 1
P 3 5 9 5 6
Q 0 8 2 8 9
P 2 5 4 7 2
P 2 7 0 3 5
P 2 3 6 6 5
P 1 4 0 9 0 P 1 6 2 1 6
P 1 9 4 2 4
P 3 6 9 0 9
P 0 7 9 8 7
P 4 8 4 2 4
P 4 3 6 3 4
P 4 7 6 3 2P 2 9 1 4 9
Q 0 1 2 0 6
P 1 1 3 6 9
P 2 8 4 8 0
P 2 1 2 4 0
P 1 3 0 2 5
P 4 6 9 7 6
P 2 2 6 7 0
P 0 2 8 5 8
Q 0 7 8 6 8
P 1 3 2 8 0
P 4 0 3 9 1
P 3 6 9 3 8
P 4 8 3 7 7
P 4 0 3 9 0
Q 0 4 7 0 7
P 2 5 2 4 7
P 0 4 3 2 3
P 2 6 3 7 4
Q 0 1 8 4 2
P 1 4 5 4 7
Q 0 4 5 7 4
P 2 9 9 9 0
Q 0 1 3 6 5
P 2 0 0 0 0
P 2 0 7 9 7
P 0 7 9 4 4
P 4 5 3 4 5 P 3 8 7 7 6
P 0 0 9 5 6
P 0 2 9 1 9
P 4 9 5 6 3
P 3 1 3 3 4 P 1 8 6 6 5
P 2 7 2 0 6P 4 6 0 6 5
Q 0 2 0 7 8
P 2 0 7 0 8
P 3 5 5 0 0
P 3 7 0 9 1
P 3 7 0 8 9
P 2 7 6 7 6
P 2 0 2 8 5
P 1 1 1 8 2
P 1 6 2 6 3
P 3 6 9 5 7
P 2 7 7 4 7
P 4 5 1 1 8
P 1 6 4 5 1
Q 0 1 6 5 7 P 1 3 5 1 6
P 1 2 6 9 5
P 1 1 5 2 2P 1 6 6 0 3
P 2 9 4 7 4
P 2 7 7 5 1
P 1 8 4 0 8
Q 0 6 5 1 8
P 2 9 4 7 6
P 3 8 0 3 9
P 1 2 2 1 7 P 1 2 2 1 9
P 4 6 3 1 7
P 4 3 8 5 0
Q 0 3 4 3 2
P 4 6 7 0 1
Q 0 8 4 6 9
P 3 1 6 5 2
P 2 0 7 1 7
P 3 7 0 9 0
P 1 5 0 0 9
P 2 0 2 8 1
P 0 6 9 5 9
P 4 5 6 4 6 P 2 9 0 9 3 P 2 6 6 7 8
P 4 1 1 1 8
P 3 0 5 1 8
P 0 7 2 5 9
P 1 9 9 7 6 P 0 9 2 3 3
P 1 1 6 5 7
P 1 8 5 2 0
P 1 6 0 5 3P 3 6 5 0 1
P 2 5 6 9 1
P 0 8 5 5 1 P 0 4 2 6 4
P 0 8 7 7 7
Q 0 4 6 9 5
P 0 8 7 7 9
P 3 1 6 4 1
P 3 1 6 4 6P 3 1 6 4 3
P 4 8 0 2 9 P 3 1 6 5 0
Q 0 5 0 0 1Q 0 1 9 5 9
P 3 1 6 4 5
P 2 1 0 3 6
P 2 9 4 7 5
P 2 3 9 7 7
P 3 1 6 6 2
P 3 6 6 4 9P 0 6 0 1 2P 4 9 3 3 1P 0 0 4 1 5P 2 2 4 6 5P 1 6 5 4 9P 0 1 2 2 9 P 3 6 9 2 4 Q 0 7 4 1 1 P 0 9 2 2 3P 0 9 4 5 1
P 4 6 4 5 5Q 0 3 0 3 0
Q 0 6 9 2 7
P 2 6 5 0 3
P 3 3 9 0 9
P 0 6 4 0 7
Q 0 1 6 4 7
P 0 9 3 2 3P 1 4 1 5 1
P 0 2 8 2 8
P 3 2 1 9 8 P 4 9 2 4 2
P 1 4 8 5 0
P 0 5 0 2 0
P 0 7 1 2 4
P 1 7 5 1 8
P 4 6 5 3 7P 4 1 5 8 6
P 3 1 9 5 6
Q 0 9 4 3 9
P 0 8 9 5 5
P 4 8 1 1 3 P 2 1 9 7 7 P 9 8 1 3 1 P 4 6 2 0 8 P 4 4 8 6 2
P 4 3 9 7 9
P 3 5 2 3 6P 3 1 5 2 2
P 1 8 6 4 2
Q 0 2 9 3 4P 1 8 2 4 9
P 4 9 5 3 0
P 3 0 4 3 8P 4 1 4 4 1
P 4 4 5 7 8
P 0 7 1 6 6
Q 0 9 7 6 5
P 3 8 7 5 6
P 3 1 2 5 1
P 0 1 3 3 8
P 3 0 5 3 0
P 4 6 7 0 2 P 4 0 9 5 4
P 4 3 8 5 1
Q 0 1 9 6 9
P 1 1 4 7 2
P 4 9 4 8 1
P 1 0 5 2 0
P 3 6 2 1 3
P 3 6 8 3 6
P 1 7 7 9 9
P 2 9 5 7 7P 4 2 4 4 9
P 2 6 6 7 7P 2 1 3 1 4
Q 0 5 1 4 6
P 4 3 7 0 8P 2 4 0 1 4
P 1 7 9 5 5
P 0 7 8 3 8P 0 6 1 0 7 P 3 2 1 9 1
P 2 3 7 7 6 Q 0 7 8 6 1P 0 6 4 5 7
P 3 2 6 0 3Q 0 0 0 1 3
P 4 5 4 3 8
P 4 3 4 3 3P 2 5 5 1 0
P 1 8 2 8 5P 0 3 0 8 9
Q 0 2 8 6 9
P 4 2 1 7 6
P 1 1 0 8 0P 1 9 3 1 8
P 0 8 0 1 7
P 4 5 6 0 4
P 2 6 2 0 7
P 4 0 5 9 9
P 2 2 0 7 1
Q 0 8 8 9 0
P 2 2 3 0 4 P 0 0 4 6 1P 4 2 0 4 2
P 2 6 6 7 0
P 1 2 1 5 5
P 8 0 5 1 7
P 3 1 5 6 2
P 3 9 8 4 2
P 3 4 4 1 0
P 2 8 2 3 5 P 4 2 0 3 4
P 1 8 8 6 0P 1 5 0 4 3
P 3 1 0 0 7
P 3 8 0 6 0
P 4 9 6 9 7
P 4 5 0 4 8
P 8 0 4 4 8
P 0 9 8 9 1
P 3 3 5 6 7
P 2 1 0 3 7
P 1 2 7 9 2P 1 7 5 0 2
P 2 5 5 1 2
P 3 4 0 2 8
P 3 3 7 6 8
P 4 5 0 0 3
P 4 9 5 2 2
P 1 8 7 7 6
P 4 3 3 3 6
P 2 9 9 2 1
P 3 5 8 8 8P 4 6 3 8 8
P 0 3 0 0 4
P 0 3 0 7 0
P 3 1 0 5 3
P 0 8 4 0 7
P 0 4 0 0 8
P 0 3 4 2 5
P 1 1 2 2 3
P 0 7 9 4 6
P 4 2 0 5 5
P 2 4 8 5 1
P 1 3 7 2 0
P 4 7 4 2 7Q 0 4 6 1 9
P 4 9 4 2 6P 1 8 1 5 8P 3 2 1 3 1
P 0 7 9 8 5P 3 6 7 4 0
P 1 0 2 5 3
P 2 4 2 6 4
P 0 5 9 7 6
P 1 1 6 2 4
P 4 0 4 6 7
P 3 4 7 3 6
P 4 0 1 4 2
P 0 6 4 7 6P 0 8 5 1 0
P 2 2 4 6 0
P 3 5 4 9 9
P 0 8 1 0 4
P 2 5 1 2 2
P 0 4 7 7 5
P 2 2 4 6 2
P 1 5 3 9 0
P 2 8 3 6 5
P 0 7 3 9 5
P 1 1 8 3 1
P 1 9 8 2 8
P 1 1 4 5 4
P 1 5 2 1 5 P 4 2 1 9 9
P 0 7 6 4 5
P 1 1 1 8 1
P 3 7 0 5 1P 2 8 5 7 0 P 2 8 5 7 1
P 3 1 6 4 7Q 0 1 2 0 5P 2 8 5 7 3
P 2 2 0 0 1 P 4 9 3 8 0P 2 2 1 8 9
P 3 9 5 2 4
Q 0 5 0 3 7 P 1 2 8 3 0
P 2 7 7 4 3
Q 0 9 8 9 1P 1 7 9 7 0 Q 0 8 4 3 5
P 3 0 7 1 4P 2 3 9 8 9
P 4 8 6 3 3
P 0 5 8 0 4
P 1 7 3 2 6
P 2 7 7 4 2
Q 0 8 7 8 8
P 2 6 2 5 7
P 2 5 4 6 4P 4 0 8 7 2
P 0 6 8 6 4
P 2 6 0 4 6
P 4 8 5 4 7
P 3 7 1 1 6
P 2 2 7 0 0
P 1 0 3 7 8
P 0 0 7 2 2
P 4 1 8 1 1
P 1 3 5 8 7
P 2 3 6 3 4
P 2 2 0 3 7
P 1 1 7 1 8P 0 5 0 3 0P 1 3 5 8 6
Q 0 8 4 3 6
P 3 2 4 8 1
P 4 6 1 9 8
P 2 4 8 8 0
P 0 8 4 2 4
P 1 5 5 5 9
P 3 1 9 7 1P 0 5 5 1 0
P 2 6 8 4 9P 3 4 8 5 4
P 4 1 2 9 8
P 0 0 7 9 7
P 1 0 8 7 0
P 0 0 5 3 8P 3 2 6 3 9
P 2 2 0 8 2
P 0 4 8 0 0
P 4 1 0 5 9
P 4 8 4 1 9
P 3 1 3 7 3
P 4 1 3 4 3
P 3 2 2 6 4
P 1 4 5 5 0
P 3 2 2 9 6
P 4 3 8 9 0
P 3 8 6 9 0
P 2 3 9 8 8
P 2 0 9 5 1
Q 0 4 5 7 5
P 3 1 6 2 3
P 1 2 8 9 4P 2 2 5 9 1
P 1 7 7 7 9
P 4 9 3 3 7
P 0 9 5 4 4
P 4 9 3 4 0
P 3 1 2 8 5
P 1 7 9 6 5P 0 0 4 5 5
P 2 1 8 9 0
P 3 6 5 9 2
P 2 1 5 5 2
Q 0 3 4 6 8P 4 0 8 2 5
P 3 9 7 4 8
P 2 5 4 0 4
P 1 5 6 8 4
P 1 0 9 3 3
P 1 7 1 2 5
P 1 6 1 7 6
P 2 3 3 5 9
P 1 6 0 4 7
P 4 3 0 2 7
P 3 8 5 5 2
P 3 4 8 2 0
P 0 7 2 0 0
P 1 3 4 9 7
P 3 0 8 8 4 P 1 8 0 7 5
P 0 7 3 3 7
P 4 4 0 2 0
P 3 6 3 0 7
P 0 7 1 3 2
Q 0 0 7 3 2
P 3 3 8 1 7
P 2 6 8 0 8
P 2 3 4 2 6Q 0 1 0 0 2
P 2 6 7 6 4
P 0 3 5 7 9P 2 1 3 7 6
P 3 6 3 5 1
Q 0 4 5 1 9
Q 1 0 0 5 7
P 1 2 8 7 0P 1 5 1 8 3
P 1 0 9 5 0P 1 4 7 4 9
P 1 4 6 7 7
P 4 8 6 0 1
P 1 7 2 4 7 P 4 9 0 0 3P 2 5 6 2 1
P 3 3 9 4 5
P 3 7 8 9 8P 2 0 7 2 2
Q 0 6 4 4 3
P 1 5 5 4 1
P 3 9 8 7 5
P 2 1 2 7 4
P 4 0 7 5 0
P 3 4 8 2 1
P 3 6 1 7 8
P 2 4 6 5 1P 4 8 8 2 5P 3 3 3 6 3
P 4 9 0 8 6
P 1 4 1 2 1P 3 4 0 5 7
P 0 9 0 0 1
P 2 1 4 6 1
P 4 5 0 4 5Q 0 3 1 5 7
P 4 4 3 6 6P 2 9 3 8 5 P 2 8 0 3 7
P 4 9 4 0 4
P 9 8 0 5 6
P 2 7 9 7 3P 3 3 4 2 9
P 0 7 5 7 2P 0 5 3 4 2P 3 3 9 0 5 P 4 4 8 5 4
P 2 7 4 2 2
P 1 4 6 2 5
P 0 2 6 3 6P 0 0 3 8 2
Q 0 7 8 0 1
Q 0 1 9 9 2 P 2 3 6 5 8
P 2 3 9 6 5 P 1 9 2 1 7 P 3 6 7 8 8
P 4 6 2 3 6
P 0 9 1 5 2
P 3 7 5 8 9
P 2 8 3 2 9P 0 7 6 6 8Q 0 5 8 8 5P 2 2 9 3 6
P 1 2 9 0 5P 0 6 0 2 4
P 0 7 1 4 7 P 2 6 5 7 0P 0 3 3 6 2 P 2 8 0 6 2
P 0 6 1 2 5
P 1 7 9 8 9 P 0 6 6 7 0P 0 6 2 0 2 P 2 1 2 4 9P 4 4 6 0 4
P 1 7 4 9 0Q 0 0 6 8 9Q 0 6 0 3 1P 1 6 6 6 5Q 0 6 9 0 8
P 0 0 2 5 9
P 4 6 3 1 2 P 1 6 4 6 6 P 1 5 3 2 0 P 4 7 2 6 7 P 1 3 6 3 5 P 4 0 3 1 9 P 2 6 0 0 7P 4 1 5 6 3
P 4 7 8 7 7
P 3 4 4 2 5P 3 3 6 9 6P 2 4 0 8 1
P 2 4 9 0 6 P 0 3 0 7 4
Q 0 0 5 1 8
P 1 3 7 0 5 Q 0 5 7 4 9 Q 0 0 2 8 6
P 2 7 0 9 2 P 4 4 3 3 0
P 0 6 7 9 8Q 0 0 0 5 6
P 2 4 5 0 1 P 1 5 7 5 0
P 2 7 6 6 8 P 4 3 7 7 5 P 2 9 7 3 3P 4 7 0 6 4P 0 9 8 3 5 P 1 2 1 5 9
P 2 3 1 0 0 P 1 7 9 2 1 P 2 7 0 0 1 P 3 3 5 6 2 P 4 9 1 1 6 P 4 9 1 1 7 P 4 4 9 8 4 P 3 2 0 7 0P 4 0 3 4 0P 1 7 7 4 3
P 4 6 0 2 5 P 4 6 0 2 6 P 3 9 6 5 6 P 3 3 7 6 7 P 2 4 5 2 9 P 0 4 1 7 7 P 0 9 6 0 7 P 4 1 5 1 0
P 4 8 0 3 2 Q 0 0 3 3 5 P 1 5 9 1 0 P 0 8 8 1 9 P 4 9 2 5 3 P 4 0 2 7 5P 2 5 7 8 5
P 1 6 5 6 9P 4 2 9 0 5P 4 2 9 0 9P 3 7 0 8 1
P 2 5 9 8 6P 3 6 7 2 1Q 0 5 1 1 1P 4 2 1 9 1Q 0 0 9 9 1P 0 7 3 0 5
P 3 4 5 4 6 P 3 2 8 4 2 P 2 6 4 6 9 P 4 5 0 4 2 P 1 7 5 2 0 P 1 4 0 4 1P 1 1 6 2 3 P 4 1 1 1 5
Q 0 7 5 7 4Q 0 7 5 7 1P 1 3 2 8 8Q 0 1 0 1 5P 4 2 8 7 1P 2 9 0 6 2 P 4 6 8 1 0 P 4 4 3 9 6P 4 9 0 5 7
P 2 7 5 9 4 P 3 9 1 4 8 P 2 0 1 0 3 P 4 3 8 3 8 P 1 0 6 4 1 P 1 5 3 6 8 P 1 9 0 9 7
P 1 6 0 2 7 P 4 8 5 5 6 P 3 9 8 2 2 P 4 1 3 9 0 P 0 4 0 4 6 P 2 6 3 6 7 P 4 7 2 3 8 P 4 8 5 0 6
Q 0 7 2 1 1 P 3 5 0 5 2 P 3 5 0 5 3 P 2 2 4 4 9 P 2 6 0 2 2 P 4 9 2 5 9 P 4 9 2 6 0P 0 8 7 2 2P 4 9 6 9 8
P 3 5 3 3 4P 2 2 2 8 4P 2 6 9 8 4Q 0 1 2 6 3P 1 9 7 8 2 Q 1 0 0 9 4 P 3 5 3 5 4P 2 2 2 8 5P 2 6 4 2 0P 0 4 3 9 4Q 0 1 5 8 1P 2 3 2 2 8Q 0 9 7 6 8 P 3 2 7 4 7
P 4 3 4 6 8P 2 8 6 6 8P 1 5 4 0 7P 2 9 1 7 6P 1 8 6 2 5
P 0 4 8 9 0
P 3 5 5 3 8
P 2 1 8 1 0
P 1 8 4 6 6
P 2 3 9 5 0
P 2 4 4 3 0
P 4 9 4 1 6
P 0 7 1 0 2
P 3 9 7 7 3
P 4 7 8 5 3
P 1 4 9 2 4
P 4 2 7 9 7
P 1 7 4 3 1
P 3 1 4 3 1
P 3 6 3 3 6P 3 6 3 3 7
Q 0 7 0 2 0
P 3 6 0 3 1
P 4 0 8 9 0
P 3 4 1 1 8
Q 0 1 1 3 0
P 1 9 4 1 6P 3 2 2 8 2
P 3 9 7 6 6
P 3 6 3 1 9
P 2 7 9 8 0
P 2 6 6 5 4
P 3 6 3 0 0
P 1 5 4 7 7
P 1 9 9 0 9
P 0 7 6 1 7
P 3 0 6 7 3P 3 6 3 2 0
P 4 4 3 5 6P 1 5 6 2 9
P 4 0 5 8 4
P 4 0 4 3 8P 1 1 7 6 8
P 1 6 7 1 7
P 4 0 9 5 9P 4 8 9 6 6P 3 0 3 0 5P 1 7 8 1 7 P 0 4 0 7 0
P 3 0 3 5 2
P 2 1 0 3 3
P 1 5 5 2 0
P 4 1 0 0 6
P 2 5 1 9 2
P 1 2 1 4 6P 0 8 2 4 7 P 2 1 0 8 0
P 3 3 8 6 2P 0 7 8 2 5Q 0 9 6 8 2 P 2 0 9 8 5
P 2 3 6 3 8 P 4 6 7 8 6 P 3 3 8 3 2P 0 6 4 9 0
P 0 2 6 3 5P 1 0 1 7 0P 4 7 4 5 6P 1 0 1 7 1
P 4 7 5 5 1P 4 2 9 6 3
P 2 5 6 0 5Q 0 2 1 4 0
P 3 7 1 1 8 P 1 1 9 7 6P 3 8 6 0 8P 0 1 7 3 1 P 2 2 5 4 9P 2 7 5 1 2 P 3 0 4 3 3 P 2 7 2 5 7
P 1 3 0 5 3 P 1 9 1 1 3 P 4 4 5 2 4
P 0 4 8 4 4P 2 3 8 0 0
P 0 0 8 9 4
P 2 6 0 1 0
P 3 9 7 6 8
P 4 5 1 1 2 P 0 4 2 5 6P 4 9 3 1 2P 1 2 6 8 7P 4 8 5 3 5P 1 1 1 5 7P 3 1 3 5 0P 2 5 2 3 5P 2 5 9 1 6
P 2 4 5 5 5 P 2 0 1 1 6
P 3 5 3 1 2P 4 0 9 1 5
P 4 6 6 5 5
P 2 5 4 6 8
P 3 3 8 6 1P 3 2 2 0 6
P 0 5 0 3 4P 1 6 4 5 3
P 1 8 1 3 3 P 1 4 0 7 8 P 1 8 1 3 9 P 3 3 3 2 9P 4 0 1 2 6P 0 6 7 2 5
P 4 8 7 5 5P 3 4 8 9 4P 2 9 9 3 0
P 2 1 8 9 3
P 1 2 8 4 3 P 3 0 1 9 5P 2 9 9 3 1P 4 8 7 4 7
P 1 8 7 5 4P 3 8 7 5 4P 4 7 6 5 7Q 0 8 6 8 4P 1 1 0 9 5 P 1 1 8 3 5
P 0 6 5 8 6
Q 0 2 3 2 6
P 1 9 6 4 2
P 2 9 1 9 1P 3 9 9 6 5
P 4 2 7 5 6 P 0 9 9 0 1 P 1 8 0 7 6 P 2 7 6 8 8Q 0 1 9 5 3 P 2 1 4 1 3P 1 4 5 5 3 P 3 9 5 9 9 P 2 1 7 7 5 P 1 7 9 5 9P 0 7 8 4 2P 3 3 2 9 1P 1 1 6 8 0P 4 6 5 8 6 P 4 4 4 6 2
P 2 8 0 6 3 P 1 0 4 8 6 P 1 7 2 2 4 Q 9 9 1 3 2 P 1 2 8 0 6 P 1 1 4 6 5 P 1 8 1 4 8 P 0 2 7 8 1
P 2 5 9 4 2P 4 5 0 3 5P 3 7 1 1 7P 4 1 0 8 5P 0 7 2 0 6P 2 7 9 1 9P 0 4 9 9 6P 2 0 0 9 3 P 2 9 9 6 1P 3 5 7 2 1
P 4 1 5 8 5P 2 3 7 0 0
P 1 0 2 1 2P 2 3 5 9 9P 2 3 6 0 3
P 2 5 4 4 5P 2 5 4 4 6
P 3 5 5 2 0
P 2 3 2 7 9
P 1 0 6 1 5
P 1 4 2 6 3
Q 0 2 1 8 4 Q 0 2 1 8 5 P 3 3 2 4 0 P 3 9 3 9 6 P 1 3 1 5 6 Q 0 3 0 3 1 P 1 3 6 4 9
P 2 1 0 4 6Q 0 1 4 7 8
P 2 8 0 7 0P 3 2 2 1 6
P 3 3 8 0 0P 0 9 5 6 0
P 0 4 0 4 9 P 1 6 9 1 3 P 1 0 4 9 8
P 1 6 5 5 1
P 3 5 1 5 5 P 3 3 8 6 3 P 4 0 1 1 2
P 2 0 2 3 2 P 4 9 3 7 3 P 3 7 9 7 1 P 3 3 6 5 8 P 0 6 1 6 3 P 0 6 1 6 2 P 2 8 7 0 2 P 2 8 7 0 4P 1 9 9 4 0P 3 7 9 7 0
P 2 2 6 7 5Q 0 9 1 4 5Q 0 7 7 4 4P 0 7 6 0 6P 4 2 3 2 6P 2 4 2 2 0 Q 0 3 8 4 5P 4 3 0 6 1
P 3 5 4 9 4P 0 8 3 2 0P 0 7 7 3 8P 0 6 1 5 5
P 0 8 4 5 6Q 0 9 7 5 0
P 0 0 4 5 2P 2 5 4 5 1 P 0 5 1 3 7P 3 7 4 2 6 P 2 6 7 1 1
P 3 6 6 7 4P 3 6 6 7 3P 2 1 8 0 3P 1 6 0 9 2P 3 0 6 1 3P 4 1 0 0 3P 2 9 9 8 7P 2 9 9 8 9
Q 0 6 7 9 3P 1 0 2 3 1P 2 2 7 0 9P 1 3 5 1 3 P 1 3 9 8 0 P 0 8 3 1 3P 2 7 6 7 9Q 0 1 0 5 6
P 3 4 3 3 5
P 0 0 5 4 9
P 0 7 1 1 7
Q 0 1 0 8 5
P 4 3 7 5 1
P 2 9 7 3 2 P 3 2 9 0 8
P 3 1 4 8 3
P 1 0 5 0 3 P 4 9 0 4 1 P 2 8 5 2 7 P 4 3 8 0 4
P 2 5 9 7 6 P 2 5 9 7 7P 1 3 2 8 6
P 2 3 4 0 2P 4 2 6 3 2
P 3 1 3 2 5P 2 4 4 2 5
P 4 2 6 9 7
P 3 7 6 7 6
P 0 0 5 8 8
P 1 8 2 7 8
P 3 0 6 7 2
P 1 4 0 4 2 P 2 4 6 5 2
P 3 8 4 9 3 Q 0 5 8 8 0
P 0 0 5 8 9
P 2 3 3 8 6 P 4 6 0 1 8
P 4 1 5 6 2
P 4 3 4 4 9 P 2 5 2 5 0
P 1 1 1 7 2 P 1 3 2 1 5P 1 4 0 1 7
P 0 8 7 0 8Q 0 5 0 9 4
P 4 3 8 9 3P 2 3 2 2 9
P 4 8 8 0 5P 4 3 7 9 4P 1 4 6 5 5 P 3 6 3 8 6P 4 3 2 1 9 P 0 9 6 8 1 P 1 4 6 6 2 P 4 0 2 7 9 P 2 5 8 5 7 P 0 9 3 1 7 P 3 1 8 3 8 P 3 0 3 4 7
P 2 6 6 3 9P 4 7 1 9 1P 1 6 5 1 8P 1 3 8 3 7P 0 3 7 1 0 P 4 0 1 1 1Q 0 5 2 5 9 P 0 8 3 8 6P 1 7 0 5 3
P 0 4 1 4 4 P 1 8 1 1 5P 0 4 6 2 6P 1 3 6 0 8P 4 1 7 1 3 Q 0 5 7 9 3P 4 2 6 6 2 P 1 8 5 6 9P 4 2 6 7 4 Q 0 2 9 1 7
P 0 3 5 1 9P 1 6 7 2 9P 0 3 2 2 6P 4 3 2 6 2P 4 4 0 4 2P 3 5 7 7 7P 3 3 8 5 7P 2 9 8 1 6P 3 6 2 2 7 P 2 0 9 9 1 P 1 9 3 4 8P 3 3 8 3 9
P 4 0 7 9 1Q 0 0 4 9 5
Q 0 4 7 4 7
Q 0 4 7 7 7
P 2 6 2 6 2
P 2 7 0 9 0P 4 5 2 7 4
P 2 0 3 9 6
P 2 6 8 4 6
P 0 4 5 4 0P 1 5 5 8 1
P 2 9 9 1 3P 0 8 7 3 9
P 3 4 8 5 2P 4 0 0 1 0 P 1 5 5 8 2
P 2 0 4 5 9
P 0 6 2 6 2
P 3 3 5 1 0
P 2 4 5 8 8
Q 0 1 2 0 7
P 8 0 3 1 3
P 4 6 2 3 8
P 4 7 6 7 2
P 2 0 0 2 8P 2 2 1 3 8
P 0 3 0 4 2P 1 4 1 1 0
P 4 0 5 2 7
P 3 2 6 6 0
P 1 3 5 4 7
P 0 6 5 6 5
P 4 5 0 7 5P 4 3 7 6 3
Q 0 1 8 3 6
P 4 8 4 0 9
P 0 7 0 0 4
P 1 4 0 4 3
P 4 5 6 3 8
P 2 1 3 4 7
P 1 7 6 5 8
P 1 3 8 0 6
Q 0 2 3 4 3P 2 2 0 0 2
P 2 2 3 1 6
P 3 7 0 8 8 P 3 3 6 1 3P 4 8 9 1 5P 2 4 8 8 4
P 3 4 8 5 5
P 2 9 8 0 1
P 0 5 9 8 2
P 1 5 5 6 4
P 1 4 6 1 4 P 4 5 9 7 5
P 3 6 6 4 2
P 0 3 1 0 5
Q 0 9 6 8 3
P 2 7 2 3 5
P 3 6 7 6 3
P 3 8 1 4 4
P 4 0 1 1 8P 8 0 1 5 1
P 2 9 0 7 2
P 2 3 2 6 5P 4 8 9 7 4P 4 6 9 6 0P 2 1 4 6 3
P 4 7 9 0 1
P 3 9 1 4 3P 4 3 6 5 7
P 4 8 4 8 3
Q 0 8 2 0 9
P 0 7 6 2 1
P 2 6 0 4 5
P 4 8 4 5 2
P 3 1 3 8 9P 4 8 4 5 9
Q 0 6 1 8 0
P 1 4 7 4 7
P 2 9 0 7 4P 4 3 3 7 8
P 2 8 8 2 8
P 1 6 4 7 3
P 4 6 0 2 3
P 2 4 9 3 9
P 0 5 0 7 8P 0 6 7 2 4P 4 2 8 4 2
Q 0 3 5 6 0
P 4 7 7 9 9
P 2 1 4 5 0
P 2 7 5 8 0
P 2 7 3 5 2
Q 0 2 9 4 2P 4 2 3 4 7
P 4 3 2 5 3
P 2 3 3 6 2
P 1 0 7 2 0
P 4 7 8 0 3
P 4 3 1 4 2P 1 4 1 2 6
P 2 1 0 8 4
P 2 1 0 3 9
P 3 5 3 5 0
P 4 3 5 0 5
P 4 7 3 1 9
P 3 7 2 8 5
Q 0 7 8 6 6
P 3 7 6 5 0
P 1 5 7 0 5
P 3 1 9 4 8
P 4 4 0 9 2
P 4 1 8 4 2
P 3 8 8 2 5
P 0 4 0 0 1
P 3 2 3 1 1
P 2 2 3 3 2
P 2 3 1 6 3
P 3 5 3 8 3P 3 2 2 5 0
P 4 9 6 5 0P 3 4 9 7 9
P 3 2 9 4 0
P 1 1 6 1 3
P 2 8 0 8 8
P 2 0 6 3 8
P 4 3 1 1 5
P 2 5 1 0 6
P 3 4 9 8 0
P 3 2 2 4 0
P 3 0 5 5 7
P 1 3 9 4 5
P 3 6 1 7 6
P 1 9 3 9 8
P 1 5 4 0 9
P 3 0 5 4 6
P 3 5 4 0 9
P 3 8 8 6 7
P 2 0 9 8 9
P 2 5 4 7 3
P 3 7 9 7 2
Q 0 1 7 1 7P 3 5 4 0 8
P 3 2 5 1 2
P 2 3 8 0 1
P 2 2 2 9 7
P 1 2 8 0 5
P 3 0 3 7 2
P 4 8 7 4 8
P 4 7 8 0 4
P 1 0 4 7 5
P 0 4 9 5 5
P 0 6 5 6 6
P 1 9 5 5 9
P 2 6 8 0 6
P 0 3 3 4 5
P 0 3 3 3 6
P 2 9 7 1 9
P 0 4 9 5 6
P 2 8 6 2 1
P 1 0 3 9 4
Q 0 2 0 9 9
P 4 8 2 7 9
P 3 4 4 5 5 P 0 4 0 1 3
P 3 6 7 5 3
P 3 6 7 4 9
P 3 6 7 5 4
Q 0 7 8 7 5
P 3 3 2 9 2
P 3 5 0 5 6
P 4 4 5 2 0
P 4 9 5 3 4
P 4 9 5 3 3
P 4 8 2 7 1
P 4 9 5 1 8
P 3 4 0 5 5
P 4 6 7 3 7P 3 2 4 8 2
Q 0 3 5 6 9
P 3 0 5 5 9
Q 0 5 3 9 4
P 3 2 3 0 6
P 1 0 9 0 9
P 4 1 2 3 1
P 2 2 3 2 2
P 4 1 1 4 3
P 4 7 7 5 1P 3 0 5 4 9
P 3 4 9 7 5P 2 8 3 3 6
P 2 8 6 8 0
P 3 2 2 3 6
P 2 0 3 4 6
P 0 8 1 7 3
P 3 5 3 7 2
P 3 0 0 9 8
P 3 3 5 3 3
P 4 1 1 4 4
P 4 7 7 4 8
P 3 5 3 7 7
Q 0 4 5 7 3
P 0 5 3 6 3P 3 5 3 7 1
P 4 6 0 9 0
P 3 5 3 4 3
P 1 7 1 2 4
P 3 4 8 1 3
P 2 2 3 2 1
P 4 9 5 7 8
P 0 8 1 7 2
P 0 8 9 1 2
P 4 7 8 9 8
P 2 5 1 1 5
P 1 8 9 0 1
P 1 1 2 2 9
P 2 1 9 1 7
P 3 2 2 1 1
P 1 8 8 2 5
P 2 9 2 7 6
P 2 0 3 0 9P 4 2 2 8 9
P 3 0 6 8 0
P 2 8 6 4 6
P 3 0 8 7 4
P 3 4 9 6 9
P 2 5 9 6 2
P 3 0 8 7 2
P 0 1 4 5 2
P 3 0 9 3 6
P 3 2 7 4 5
P 2 5 9 3 0
P 3 5 4 0 7
P 3 0 9 3 8
P 3 1 3 9 1
P 3 0 9 3 7
P 3 1 1 3 3
P 0 7 7 0 0
P 4 4 7 6 8
P 3 1 1 3 4
P 0 4 2 7 4
P 1 0 6 0 8
P 4 2 2 9 0
P 4 3 1 4 1
P 0 9 4 9 9
P 0 3 5 6 6
P 1 3 7 9 2
P 0 8 1 9 9
P 0 8 6 8 9
P 4 3 8 2 3
P 0 6 8 6 8
P 1 8 2 5 4
P 4 9 6 0 8
P 3 5 4 4 7
P 3 5 7 7 8P 3 7 2 3 1
P 3 9 0 6 1
P 1 9 7 6 5
P 0 7 7 0 3
P 3 9 8 9 8
P 4 6 9 2 5
P 1 2 0 8 1
Q 0 3 1 8 1
P 3 5 4 4 6
Q 0 0 9 9 3
P 4 6 4 5 6
P 3 8 9 7 2
P 4 3 8 4 7
P 3 7 1 9 8
P 2 0 0 7 3
P 2 4 8 1 4
P 3 4 9 4 9
P 3 8 0 9 2
P 2 2 4 5 8Q 0 1 0 2 0P 3 7 8 9 6P 1 5 1 5 5Q 0 4 9 1 6P 0 9 7 7 6P 1 5 4 0 3
P 4 4 8 3 6
P 0 9 9 5 0
P 1 9 7 6 3
P 1 3 3 8 3
P 3 1 1 7 1
P 1 1 8 3 4
P 2 6 9 7 5
P 2 5 3 8 4P 3 7 0 3 2
P 2 9 3 5 3
P 1 5 0 0 1
P 0 8 7 7 8
P 4 0 8 0 1
P 3 3 0 6 8
P 4 7 8 4 6
P 3 2 2 5 7
P 0 4 3 8 6
P 1 1 6 2 1
Q 0 6 6 6 0 Q 0 0 9 6 6 Q 0 3 2 4 4 P 1 8 2 9 4 P 3 7 5 7 3 P 3 4 9 1 3P 2 1 8 7 6
P 2 9 9 5 1 P 1 3 6 0 5 P 2 7 2 1 4 P 4 8 8 9 2 P 1 1 7 0 1P 3 2 0 8 6
P 0 9 8 3 2P 4 4 7 9 5 P 3 4 6 5 0
P 3 8 2 8 5 P 3 7 1 2 7
P 2 3 2 4 8
P 3 2 2 1 5
Q 0 8 6 4 2
P 3 1 3 0 1
P 4 6 5 3 8
P 0 2 6 6 2
P 1 2 2 5 6
Q 0 5 2 3 7
P 1 8 6 2 6
P 2 0 8 2 8
P 2 9 0 2 9 P 4 2 8 8 3
P 3 3 6 5 5
P 3 3 7 5 2
P 3 9 5 2 9
P 4 5 5 8 4
P 2 0 5 3 3
P 4 1 9 7 8
Q 0 9 9 2 3
Q 1 0 1 1 3Q 0 3 4 6 0
P 2 0 8 7 4
P 4 8 0 4 4P 2 9 3 3 6
P 2 0 1 4 8
P 0 4 2 8 2
P 3 8 0 2 5
P 1 4 8 5 3Q 0 6 8 2 8
P 1 7 3 6 5P 0 6 4 8 5P 0 6 5 3 1Q 0 5 7 4 0 P 2 1 0 4 5P 1 3 2 9 2
P 2 0 4 0 9
P 2 0 5 3 4 P 2 6 6 3 7 P 4 7 9 1 3 P 8 0 0 5 9 P 4 5 6 1 0 P 0 5 3 2 8 Q 0 8 0 9 9 P 2 1 0 4 4
P 3 7 5 5 1
P 1 4 9 0 7
P 1 5 5 6 7 P 2 5 7 6 5P 1 6 1 6 7
P 3 3 8 2 7 P 1 0 8 5 7Q 0 9 8 2 8 P 4 1 5 6 9
P 2 9 4 5 3P 3 6 6 1 9
P 4 5 0 7 7P 2 6 1 8 5
P 0 7 0 5 6P 0 9 7 5 8P 1 6 4 2 2P 0 4 1 4 3P 4 5 8 3 7P 2 3 5 4 9 P 1 5 6 9 8P 1 0 4 8 1P 0 7 9 8 4
P 2 5 3 8 3 P 2 8 3 2 4 P 4 1 1 5 8 P 3 3 8 7 9 P 3 4 5 3 1P 2 9 8 5 9P 4 3 0 0 2
P 1 5 3 0 9P 0 9 8 8 9P 1 4 3 8 1Q 0 0 2 6 9P 2 6 6 4 7P 3 0 7 0 6P 3 4 7 5 4 P 1 2 8 5 2P 2 0 6 4 6P 2 9 1 3 0
P 2 7 6 9 9
P 4 8 5 6 2P 4 6 0 6 1
P 1 8 6 0 9P 0 3 5 3 8
P 2 9 9 1 5
Q 0 0 2 7 4
P 4 3 3 0 9P 4 0 0 1 2P 4 3 1 1 9P 1 3 6 0 9P 2 1 5 5 6 P 1 0 1 2 4 P 1 0 6 8 6P 0 8 4 8 7P 2 5 1 0 5P 3 2 7 3 6 P 4 3 2 5 2
P 4 3 3 0 7 P 1 9 7 6 1 P 1 4 1 4 7 P 4 5 6 0 8 Q 0 9 0 3 7 Q 0 6 2 2 2P 3 5 8 5 6 Q 0 2 4 4 5P 1 6 9 6 7 P 2 1 1 5 1
P 4 6 0 6 0P 2 4 9 1 8
Q 0 3 0 6 1P 1 0 8 6 6P 8 0 2 9 9 P 1 2 5 5 9
P 4 2 0 5 9
P 1 3 3 9 3
P 1 5 6 3 9
Q 0 1 0 6 0
P 3 0 5 9 7
P 2 7 6 3 3P 1 4 6 9 8
P 0 7 8 7 1
P 4 9 1 0 8
P 2 7 5 0 7
P 2 9 4 6 5
P 4 3 8 5 2
P 0 4 0 0 9
P 2 4 2 7 1
P 2 3 8 4 9
P 2 3 0 5 2
P 1 9 2 4 4
P 4 0 3 0 7
P 2 8 9 6 8
P 2 2 5 5 7
P 1 7 7 9 3 P 1 3 3 7 2 P 2 4 4 4 3 P 0 8 1 9 5 P 1 0 8 5 2 P 4 3 3 6 0 P 4 3 3 6 6P 1 9 2 7 5 P 4 2 3 9 5
P 3 5 9 7 2
P 3 5 1 6 9
P 3 1 3 9 3
P 0 4 8 5 1
P 2 3 3 8 3
P 3 2 6 0 0
P 1 3 7 9 7
P 1 6 9 1 6P 0 6 8 0 4
P 0 4 7 7 6
P 4 5 2 9 4
P 0 4 3 4 7
Q 1 0 1 0 6 P 3 4 2 4 7
P 3 0 6 2 5P 3 1 3 1 9
P 3 7 5 2 8
P 2 2 1 4 1 P 4 4 2 8 6 P 3 2 5 0 6 P 1 5 1 5 1 Q 0 6 6 0 0 P 4 0 2 4 2P 0 1 3 7 4P 0 7 2 2 4P 3 3 6 1 0
P 2 1 0 6 9P 2 1 0 6 6P 3 3 8 5 4P 2 4 7 6 6P 2 4 7 6 1P 0 9 5 0 6 P 3 3 8 5 5
P 3 5 5 6 5P 3 8 7 5 5
P 4 4 8 4 3
P 3 0 2 3 6
P 3 1 8 1 2P 2 3 0 0 3P 4 6 8 0 1
P 1 1 1 3 1P 1 4 6 0 5
P 4 9 3 8 9
P 3 0 5 2 8 P 2 0 1 7 4
P 3 7 7 6 5
P 0 8 6 8 0
P 2 2 6 4 8 P 4 4 0 7 4
P 2 4 5 9 8 P 1 5 5 0 9 P 1 1 0 5 2
P 3 4 0 8 8
P 0 4 2 2 0P 0 1 8 6 0
P 0 7 5 6 7
P 4 9 6 4 3
P 2 0 4 0 6
P 4 7 4 1 8
P 0 5 3 5 6
P 1 7 7 9 7P 0 9 7 7 9
Q 0 8 1 0 0
P 0 7 1 6 9
P 0 6 4 7 5
P 3 8 1 7 4
P 2 9 1 7 5
P 1 5 3 1 9
P 3 0 7 6 4
P 1 3 4 9 6
P 1 9 2 5 4
P 3 1 3 9 6
P 4 8 9 3 7
Q 0 0 7 6 1
P 3 7 5 1 2
P 0 5 3 6 9 P 2 3 1 7 6
P 0 4 3 2 9
P 0 3 6 0 2
P 2 8 5 8 3
Q 0 8 1 0 3
P 2 4 1 9 3
Q 0 0 2 5 9
P 3 3 7 2 5P 4 5 2 7 7
P 0 9 1 4 4
P 2 1 0 9 7
P 1 1 3 4 9
P 0 5 9 4 5
P 2 6 1 7 7
P 3 3 8 5 9
P 3 5 8 9 0
P 3 6 4 9 2
P 2 6 2 3 7
P 0 5 3 5 9
P 4 5 1 7 5
P 0 9 7 8 3P 2 8 9 1 7
Q 0 7 7 6 2
P 3 3 8 5 1
P 1 0 2 1 1P 2 8 9 1 2
P 2 8 9 1 6
P 0 3 1 7 3
P 1 3 3 7 4
P 2 8 5 4 8
P 2 9 5 9 0
P 4 9 1 3 7
P 3 9 7 4 5P 4 7 8 1 2
P 3 1 3 1 4
P 2 0 2 6 5
Q 0 0 6 8 0
P 4 3 3 4 5
P 2 0 2 6 7
P 3 2 4 9 0
P 4 3 1 2 0
Q 0 1 8 6 0
Q 0 4 9 9 6
P 1 3 5 9 4
P 2 1 9 5 2
P 0 3 2 7 4
P 1 0 1 8 1
P 3 3 2 1 6
Q 0 9 4 3 5
P 4 2 0 8 6
P 0 7 3 1 3
P 1 5 2 4 2
Q 1 0 0 5 6
Q 0 9 4 9 9
P 1 1 7 3 0
P 1 3 5 9 2
P 0 7 5 2 4
P 3 4 8 9 1
P 2 7 0 3 7
Q 0 3 0 4 3
P 3 1 1 6 9
P 2 3 4 3 5
P 4 1 2 7 9
Q 0 6 5 4 8
P 4 8 5 1 0 P 2 7 7 0 4P 4 5 9 8 4
P 1 2 5 7 5
P 3 5 4 1 6
P 1 6 4 7 7P 3 2 4 9 2
P 0 5 6 6 1
P 1 4 1 0 5
Q 9 9 3 2 3
P 1 8 4 3 1
P 0 9 3 8 6
P 4 9 1 8 5
P 1 8 2 6 5P 3 6 0 0 5
Q 0 4 8 9 9
P 3 2 3 5 8P 4 7 8 1 1
P 3 5 4 1 7
P 3 5 5 7 9
P 0 8 7 9 9 P 3 5 4 1 5
P 3 1 3 6 7
P 3 1 3 6 0
P 0 9 6 2 9
P 2 7 6 0 9
P 3 7 8 0 6P 1 0 1 8 0
P 4 9 3 3 5
P 0 9 0 1 5
P 3 1 8 9 9
P 1 0 6 2 8
P 3 4 6 9 4P 2 1 0 0 1
P 3 9 9 8 4P 3 1 2 4 9
P 4 0 5 9 2
Q 0 1 2 2 6
P 0 3 4 3 4
P 3 6 1 9 7
Q 0 8 7 2 7P 2 0 2 6 3P 2 0 7 1 9
P 4 9 6 4 0
P 4 8 2 4 1
P 3 7 9 3 5
P 1 7 2 7 8
P 2 2 3 1 7
P 1 0 2 6 7P 3 1 3 6 4
P 2 9 8 2 5
Q 0 0 4 6 6
P 1 6 1 4 3P 3 7 9 3 8
P 4 0 4 2 6
P 4 2 5 8 7
P 1 4 8 5 8
P 3 7 2 7 5
Q 0 1 6 3 0
P 3 1 3 6 2
P 4 2 5 7 1
P 3 1 2 5 8
P 0 2 8 3 2
P 1 3 5 9 0
P 4 6 6 0 8P 1 8 2 6 4
P 4 3 6 9 9Q 0 5 4 6 6
P 1 4 6 5 2
P 4 6 3 2 0
P 4 0 7 6 4
P 2 2 0 0 9P 4 5 5 7 7
P 0 5 8 2 4
P 1 7 9 1 9
Q 0 3 9 7 4
P 2 3 8 1 2
P 0 9 0 7 9
P 4 3 6 9 8P 0 7 5 4 8
P 0 9 0 2 6
P 4 6 6 0 4
P 4 0 3 1 7
P 0 9 6 3 2P 2 3 4 5 9P 0 2 8 3 6
P 2 4 0 6 1
P 2 1 0 0 0
P 4 6 7 3 5
P 1 2 8 4 5
P 1 9 5 2 4
P 4 2 5 2 2
P 1 0 5 6 9
P 3 4 0 9 2
P 3 5 7 4 8 Q 0 2 4 4 0P 1 9 7 0 6
P 3 6 0 0 6
P 0 1 1 9 3
P 0 1 0 2 9
P 0 5 9 9 7
P 4 4 4 1 0
P 9 8 1 3 7
P 1 4 2 0 5
P 0 2 7 0 3
P 2 8 3 6 6
P 3 8 3 8 0P 4 7 8 4 7
Q 0 6 4 6 1
P 3 8 0 4 1P 4 9 6 4 9
P 3 9 9 4 0
P 2 4 5 0 7
P 0 6 6 3 4
P 4 2 6 9 4
P 4 6 9 3 4
P 0 5 1 2 9
P 4 1 8 2 3
P 3 4 6 8 9
P 2 0 7 9 3
P 3 2 8 9 2
P 2 5 8 0 8
P 3 9 5 4 6
P 1 5 4 2 4
P 3 9 6 8 7
P 2 1 6 9 3
P 8 0 2 0 6
P 0 2 4 8 2
P 8 0 2 0 5
P 2 2 7 3 5P 1 6 9 8 9
P 3 2 2 4 3
P 4 1 3 8 1P 2 0 4 4 8
P 3 2 2 4 2
Q 0 4 9 1 2
P 1 6 2 5 8
P 1 3 2 3 8
P 2 2 0 5 9
P 4 1 2 4 1
P 0 7 1 9 9
P 2 7 4 1 4
P 3 5 8 4 5
P 4 4 5 2 6
P 2 6 8 0 2
Q 0 9 7 2 7
P 2 4 7 8 2P 3 3 9 0 6
P 4 2 3 0 5
P 2 7 6 5 8
P 1 0 6 4 3 P 1 5 9 2 5
P 3 3 4 8 4
P 1 2 0 8 0
P 2 9 4 0 0
P 3 1 8 9 4
P 3 5 2 4 7
P 3 5 2 4 8
P 0 6 6 8 1
P 2 6 6 4 5
P 2 1 7 5 8Q 0 1 1 4 9
P 0 7 8 7 6
P 1 4 2 8 2P 2 0 7 0 1
P 1 4 9 9 6
P 2 1 1 8 0
P 4 4 9 5 3
P 1 5 9 8 9
P 2 5 5 2 4
P 0 2 4 5 8
P 3 4 3 4 0
P 2 8 4 8 1
P 0 5 5 5 5
P 1 8 5 4 6P 2 0 9 0 8
P 2 1 7 5 7
P 2 4 0 6 3P 4 2 8 9 0
P 3 5 2 4 6
P 3 1 6 9 5
P 3 2 3 2 8
P 2 3 2 9 8
P 0 4 5 8 4
Q 0 5 6 5 5
P 4 8 6 1 5
P 2 3 2 1 9 P 2 6 9 9 3
P 1 3 1 8 5
P 3 5 7 6 1
Q 0 9 8 9 8
P 4 7 8 0 9 P 1 7 7 8 9
P 0 5 1 2 6
Q 0 6 2 2 6
P 4 9 3 3 9
P 1 0 6 6 5
Q 0 3 3 5 1
P 1 9 5 2 5
P 3 9 9 6 8
P 2 1 8 6 0
Q 0 9 5 3 7
P 1 3 2 4 4
P 1 3 1 8 6
P 4 6 5 3 0
P 4 3 5 3 5
P 3 8 3 6 1
P 2 4 7 2 3
Q 0 1 7 0 5
P 0 5 8 1 3P 2 9 5 9 8
P 4 2 6 8 6
P 3 6 3 1 4
P 0 3 6 4 1
P 4 0 5 5 0
P 4 7 7 3 5
P 0 9 5 9 9P 2 5 6 9 3
Q 1 0 0 7 1P 2 7 9 6 6
Q 0 3 3 6 4
P 0 9 2 1 5
P 3 4 3 6 9
P 8 0 4 1 2
P 0 6 2 4 5
P 3 3 4 9 7
P 1 8 6 5 3
P 2 4 5 8 3
P 0 0 5 1 6
P 2 3 3 3 9
P 0 4 1 8 5
Q 0 0 3 4 2
Q 0 1 8 8 7
P 0 9 9 8 9
P 0 6 6 2 5P 3 7 3 0 5
P 2 3 4 4 3
Q 0 4 9 7 6P 3 8 4 3 8
Q 0 2 1 5 6
P 1 6 6 7 1
P 2 5 3 8 9
P 0 3 6 4 3
P 4 5 8 9 4
Q 0 5 9 9 9
P 4 8 7 4 9
P 2 6 7 7 9
P 0 7 6 0 2
P 3 5 5 1 3
P 3 3 5 3 0
P 3 4 2 4 4
P 0 8 4 1 3
P 2 3 4 5 8
P 3 6 9 7 8P 3 2 6 1 2
P 4 7 7 0 9
P 3 6 0 9 5
P 3 3 4 0 2
P 0 6 8 4 5
P 3 4 7 1 1
Q 0 6 8 4 6
P 0 3 9 6 7
P 1 8 2 9 3
P 3 8 4 3 2
P 1 3 6 7 7
P 2 5 3 2 1 P 4 0 0 6 1
P 3 9 0 0 0
P 1 3 5 9 6P 1 3 5 9 5
P 0 4 4 0 9
P 4 2 2 8 2P 4 7 9 8 7
P 0 6 7 8 2
Q 0 7 0 0 5P 4 4 6 0 2
P 4 0 4 2 4
P 2 2 9 8 5
P 2 1 1 4 6
P 2 9 5 9 7
P 1 8 6 1 2
P 2 9 3 1 7
P 2 4 5 7 1
Q 0 2 2 1 6
P 3 4 1 5 2
Q 0 5 5 1 3 P 3 4 9 4 7
P 3 8 9 7 0
P 2 1 7 0 9
P 2 8 6 9 3
Q 0 2 7 6 3
P 2 0 8 0 6
P 3 4 8 9 2
P 0 5 6 2 2
P 0 8 4 1 4
P 3 5 5 9 0
Q 0 6 8 0 6
P 2 6 6 1 9
Q 0 6 8 0 5
P 0 9 6 1 9
P 1 6 1 4 4
P 1 8 7 6 0P 4 6 3 7 8
P 1 1 2 7 6
P 1 3 3 6 8
Q 0 9 0 1 4P 2 9 3 6 6
P 1 8 1 6 8 P 3 4 0 2 4
P 4 5 7 2 3
P 0 9 3 3 3
P 0 6 5 4 6
P 4 0 9 9 6
Q 0 1 4 8 1
P 2 7 4 4 6
Q 0 9 7 4 6
Q 0 1 2 2 5P 3 5 8 2 7
P 2 7 4 0 0
P 0 8 6 3 0P 4 6 3 6 0
P 1 5 4 9 8
P 4 8 0 2 5
P 0 0 5 1 9
P 0 8 6 7 4
P 0 0 7 1 9
P 0 8 6 7 3
P 0 0 7 1 7
P 1 2 9 8 0P 2 7 7 9 2
P 0 7 3 0 7
P 2 4 7 2 1
P 1 3 8 1 4
P 2 0 4 8 9P 1 2 3 1 9
P 0 7 8 9 7
P 1 1 4 4 4
P 2 1 2 8 8
P 0 2 8 9 3
P 1 6 1 1 2
P 3 4 9 2 7
P 0 2 8 7 4Q 0 7 2 5 2P 2 7 1 1 3
P 3 2 7 3 9
P 0 8 3 1 8
P 0 8 6 9 1
P 3 9 4 1 4
P 4 6 5 5 6
P 0 2 6 9 4
P 1 1 7 9 8
P 0 0 2 6 2
P 2 6 8 1 8P 3 4 9 2 5
P 2 0 6 1 3Q 0 6 0 6 0
Q 0 8 9 4 2
P 0 6 2 4 4
P 0 7 9 3 1
P 1 6 3 8 5
P 3 2 3 6 1
P 0 3 2 1 9Q 0 0 7 0 1
P 4 2 9 0 6
P 2 7 5 1 4P 4 2 9 8 3
P 3 5 9 0 0
P 1 5 8 0 0
P 2 5 9 5 2
P 1 5 7 1 0
P 0 2 5 3 8
P 1 2 8 3 9
P 3 7 2 7 4
P 0 3 1 7 2
P 2 9 3 5 2
P 3 7 0 0 5 P 3 5 2 3 3
P 2 8 8 2 7
P 3 4 4 4 2
P 0 5 9 9 0
P 1 7 4 7 4
P 2 3 4 6 9
P 3 3 8 1 1
P 3 2 7 9 0
P 4 9 4 4 6
P 3 5 8 2 1P 1 8 0 3 1
P 4 9 4 4 5
P 2 6 0 4 0P 1 9 9 2 4
P 1 7 7 0 6
P 1 0 5 8 6
Q 0 5 9 0 9
P 1 8 0 5 2
P 2 3 4 7 0
P 2 3 4 6 7
P 4 8 6 1 1
P 4 2 8 6 6
P 2 9 3 6 7
P 1 0 0 3 9
P 2 4 8 2 1
P 0 4 9 3 7
P 4 9 6 8 4P 3 0 9 8 9
P 2 0 6 0 7
Q 0 6 8 0 7
Q 0 3 6 9 6
Q 0 9 7 3 4
P 2 5 1 8 4
P 3 3 0 0 5
P 2 3 3 5 2
P 3 8 0 4 7
P 3 5 3 3 1
P 2 1 7 9 5
P 0 2 6 7 9 P 0 2 6 8 0
P 1 2 7 9 9
P 0 8 0 2 5
P 0 4 5 0 1
P 4 6 1 1 0
P 3 6 8 4 4
Q 0 8 2 6 4
P 0 8 1 0 3
P 4 3 4 0 4
P 4 2 6 8 4
P 9 8 0 8 3P 4 6 1 0 8
P 1 6 9 1 1
P 2 4 1 3 3
P 0 0 5 3 0
P 2 7 8 7 0
P 3 5 9 9 1
P 2 0 9 9 9
P 1 5 7 9 0P 0 8 6 3 1
P 4 0 2 3 0
P 1 3 1 8 7
P 3 7 1 7 3
P 1 3 4 0 6
P 4 3 4 0 3
P 2 3 2 9 2
P 3 9 9 6 2
P 0 4 2 7 8
P 3 1 1 3 5
P 1 5 2 1 6
P 3 6 1 3 5
P 1 6 2 3 0Q 0 1 4 0 6
P 4 2 6 9 0P 1 5 0 5 4
P 2 3 3 2 7
P 4 5 5 3 9
P 4 2 6 8 1
P 9 8 0 6 4
Q 0 2 2 0 7
P 0 8 1 5 9
Q 1 0 0 5 9
P 4 2 2 2 8P 4 2 8 2 5
P 1 7 6 9 2
P 2 7 0 5 8P 1 6 7 3 2 P 0 6 5 4 7
P 4 8 2 0 7
P 4 8 2 0 8P 2 6 4 9 5
P 3 5 2 3 2
Q 0 9 8 0 3
P 4 3 6 4 4
P 3 5 0 8 5
P 3 5 1 9 1
Q 0 5 5 2 8
P 3 1 6 1 7
P 0 6 8 4 7
P 3 8 0 8 5P 4 0 5 6 4 Q 0 0 3 3 8
P 0 5 1 7 4
P 4 4 9 5 5
P 2 7 0 0 0
P 4 5 0 5 9
P 4 5 1 6 1
P 2 8 8 6 8P 3 8 5 1 3
P 4 9 0 8 3
P 1 6 9 5 4P 0 9 2 3 1
P 2 9 1 2 0
P 4 2 0 8 7
P 0 5 4 0 7P 2 5 3 5 3P 4 2 7 3 1
P 0 3 8 0 3
Q 0 9 1 7 5 P 0 0 0 0 8
P 4 0 9 0 2
P 4 9 0 8 2
Q 0 6 3 1 7
P 3 0 5 3 8
P 1 0 3 4 2
P 1 3 1 3 4
Q 0 8 0 4 7
P 2 5 3 0 6
Q 0 7 0 0 9P 0 7 2 8 4
P 0 8 0 4 9
P 0 8 1 4 4
P 3 8 5 3 6P 1 0 5 2 9
P 1 9 2 6 9
P 1 6 2 4 6
P 2 8 9 6 9
P 3 8 7 0 5
P 4 0 7 6 3
Q 0 2 5 7 2
Q 0 9 9 2 8
P 2 3 6 2 2P 2 1 9 4 4
P 2 7 2 5 9P 2 5 6 8 5
P 1 4 9 8 0
P 2 8 8 2 1 P 2 8 3 0 5
P 2 0 9 6 5
P 1 3 5 8 3
P 2 2 9 9 7
P 3 6 0 5 1
P 3 1 9 0 9
P 1 1 9 0 8
P 3 3 0 4 6
Q 0 6 5 4 7
P 0 9 9 4 3
P 3 9 1 3 7
P 0 6 3 8 5P 0 7 1 3 3
P 0 6 3 8 7
P 3 6 3 1 7
P 3 6 2 1 2
P 1 7 0 7 9
P 4 9 5 5 7
P 4 1 1 2 8
P 3 4 7 5 0
P 3 1 4 3 4
P 4 7 1 5 4
P 4 2 7 8 9
P 4 1 1 0 9
P 1 9 3 3 2
P 4 1 9 3 1
P 3 3 6 4 1
P 3 2 4 8 5
P 3 4 3 0 5
Q 0 9 6 8 7
P 1 6 5 1 9
P 3 3 1 1 7
P 3 0 6 7 9
P 2 4 2 2 8P 0 9 5 7 0
P 4 6 1 5 3 P 4 2 8 9 1
P 0 6 1 3 4P 2 3 7 7 3
P 3 7 0 6 2
P 3 9 4 7 3
P 3 8 5 0 8
P 3 4 2 8 6 P 3 8 9 3 8
P 4 2 4 3 4
P 4 5 7 5 7 P 0 7 1 4 2 P 2 6 5 1 1
P 4 4 8 0 1P 0 7 1 4 3P 4 5 7 7 8P 1 8 8 7 1 P 3 0 1 8 3
P 0 6 3 8 0
P 3 9 0 9 7P 1 9 3 2 8
P 1 4 0 1 0
P 0 8 9 1 3
P 1 0 7 5 8
P 2 2 7 3 1
P 0 1 0 2 6
P 1 0 8 5 6
P 4 5 3 6 4
P 4 3 6 2 9
P 3 9 9 9 7P 3 9 8 4 4
P 1 1 9 4 0P 0 8 0 1 8
P 2 0 5 0 6
P 0 4 9 6 8
P 1 2 6 8 0P 0 8 5 3 9
P 4 0 5 9 1
P 4 1 3 9 4P 1 3 6 2 7P 3 1 7 8 0P 1 5 2 0 6Q 0 3 1 8 8P 0 7 8 2 6
P 4 5 8 5 6P 4 5 6 0 9P 4 5 8 8 9P 4 8 5 1 1 P 0 6 6 8 4
P 2 3 3 7 2 P 1 2 6 2 7 P 1 5 3 3 3 P 2 9 0 3 7 P 4 3 1 5 7 P 2 5 7 8 1 P 1 6 3 8 1P 4 4 9 7 1P 3 6 9 5 8
P 1 3 5 0 7P 1 1 4 1 0P 0 0 7 8 0
P 3 7 9 8 6
P 0 7 2 9 6 Q 0 6 9 8 7
P 2 6 7 1 9
P 4 5 1 0 0
P 1 2 0 2 3
P 0 3 2 8 4
P 3 3 9 8 5 P 2 7 9 2 8 P 2 8 7 7 2 Q 0 0 5 1 1
P 0 7 2 4 4
P 4 3 9 0 9
P 3 8 5 3 0
P 0 4 3 9 1
P 2 1 2 0 3Q 0 5 8 6 6
P 3 8 5 3 2 P 3 2 6 7 5
Q 0 0 6 1 3P 2 7 5 4 2
P 1 4 5 6 0 P 4 0 8 7 9P 1 3 0 1 5
P 4 6 7 4 0P 3 3 9 8 4
P 4 5 3 8 0
P 2 7 1 2 0 P 4 9 1 5 5 P 1 3 0 4 5 Q 0 6 5 6 2 P 3 8 9 7 1 P 2 7 6 9 3 P 4 6 0 7 3P 4 8 8 4 8P 4 0 8 5 1
P 0 7 8 9 6Q 0 8 4 2 6Q 0 1 8 3 3P 0 1 0 2 3P 1 4 2 0 6 P 1 4 0 4 6
P 1 4 2 5 1
P 1 5 3 3 2P 3 3 8 6 4P 1 6 8 5 5 Q 0 9 0 5 6P 2 1 0 4 9
P 4 9 3 6 9P 1 5 0 8 2P 4 9 6 4 2 P 1 6 7 0 0
P 3 3 8 6 5
P 0 0 6 0 6P 3 7 4 4 7
P 4 5 0 1 8
P 3 3 2 8 2
P 2 2 1 4 6 P 3 7 4 4 6
P 2 8 8 2 5
P 3 3 0 0 7 P 2 0 6 1 8
P 4 3 6 2 6
P 4 9 4 5 4
P 3 0 2 7 6
P 0 8 6 6 4
P 4 1 7 7 2
P 3 0 3 4 1
P 3 0 5 4 5P 1 7 3 4 9
P 3 4 1 7 9
P 1 5 8 2 3
P 3 7 0 6 1
P 0 8 5 6 8 P 4 1 1 5 1P 1 8 1 8 6 P 0 6 9 6 0
Q 0 8 2 7 6 P 4 1 1 5 2P 0 0 4 8 0
P 4 7 2 0 0 P 4 7 4 2 4P 1 1 0 6 6
P 3 8 0 8 6P 4 2 0 7 2
P 2 1 8 7 9
P 4 6 6 8 1
P 3 1 0 0 2
P 4 5 0 8 5
P 0 9 7 9 3
P 3 0 8 5 1
P 3 9 5 6 7
P 0 4 0 9 0
P 4 7 9 3 2
P 1 9 8 3 8
Q 0 0 6 5 3
Q 0 4 8 6 1
P 3 2 6 7 2
P 3 2 1 5 4
P 2 6 3 8 1P 3 8 6 9 7
P 0 4 8 0 8
P 2 3 7 3 9
P 0 0 6 8 9
P 0 7 7 6 8
P 2 6 3 8 2
P 3 8 6 2 8
P 4 4 9 0 0
P 2 9 0 4 1P 0 7 0 7 5
P 4 3 7 5 3P 3 9 4 0 9
P 0 9 0 5 7
P 3 9 4 7 4
P 2 7 1 2 1
P 4 8 8 2 8P 2 0 7 2 4
P 1 7 5 4 2
P 3 7 3 8 8
P 1 9 6 6 9
P 0 3 0 7 2
P 4 4 9 4 6
P 4 3 6 9 3
P 2 2 0 9 1 P 4 4 3 8 7
P 4 0 4 2 9
P 0 5 7 4 8
P 1 5 7 7 2
P 0 8 0 5 5
Q 0 8 3 6 9 P 4 3 6 9 4
P 4 8 9 8 3
P 1 5 9 7 6
P 1 8 6 4 0
P 3 7 0 2 0
P 0 8 0 8 1
P 2 6 2 3 1
Figure 5.9. The protein network using only non-circular patterns.
Red nodes are nodes found in the non-CP network. Green nodes are nodes found only in the CP network. Blue edges denote edges found in the CP network. Pink edges are edges found only in the CP network.
[Network figure: nodes are labeled with protein accession numbers (for example, P45638, P48409, Q06234).]
Figure 5.10. The Protein network using only circular patterns
Red nodes are nodes found in the non-CP network. Green nodes are nodes found only in the CP network. Blue edges denote edges found in the CP network. Pink edges are edges found only in the CP network.
[Network figure: nodes are labeled with Pfam identifiers, mostly Pfam-B entries (for example, Pfam-B_11512, Pfam-B_4788), plus the named families thyroglobulin_1, recA, gln-synt, heme_1, oxidored_molyb, phoslip, fer4_NifH, actin, Cys-protease, and thiolase.]
Pfam-B_5978 FGF Pfam-B_1808
Pfam-B_6349
Pfam-B_8813
Pfam-B_7243Pfam-B_5830
Pfam-B_7672
Pfam-B_11694 Pfam-B_11302Pfam-B_11601Pfam-B_11518 Pfam-B_11232
Pfam-B_7327 Pfam-B_7257 Pfam-B_7232 Pfam-B_6921 Pfam-B_579Pfam-B_1580Pfam-B_8605 Pfam-B_5055Pfam-B_3792
Pfam-B_7356Pfam-B_1582 Pfam-B_9574 Pfam-B_9083Pfam-B_9082 Pfam-B_5774 Pfam-B_8786Pfam-B_8787Pfam-B_6577Pfam-B_1407Pfam-B_2114
Pfam-B_2513 Pfam-B_11288 Pfam-B_9524
Pfam-B_786Pfam-B_5384 Pfam-B_2612
Pfam-B_8275Pfam-B_6708
Pfam-B_3094Pfam-B_10967Pfam-B_3385Pfam-B_3716Pfam-B_3849Pfam-B_7398Pfam-B_3864 Pfam-B_933
Pfam-B_9863S 4Pfam-B_617Pfam-B_3640Pfam-B_11446Pfam-B_6790 Pfam-B_4345Pfam-B_11447 Pfam-B_1942 Pfam-B_1918Pfam-B_9465
Pfam-B_5008 Pfam-B_4880Pfam-B_6836Pfam-B_4824 Pfam-B_3872Pfam-B_3705 Pfam-B_3871
Pfam-B_2036Pfam-B_2936Pfam-B_487Pfam-B_1346Pfam-B_9804Pfam-B_7496 Pfam-B_7357 Pfam-B_1108
Pfam-B_7737Pfam-B_5227Pfam-B_8180Pfam-B_8184Pfam-B_8208 Pfam-B_8319Pfam-B_8601 Pfam-B_7626Pfam-B_5188Pfam-B_217
g p d hPfam-B_10363Pfam-B_1045Pfam-B_10849Pfam-B_10954Pfam-B_10962Pfam-B_11127Pfam-B_11194
Pfam-B_8346 Pfam-B_1431
Pfam-B_577 Pfam-B_1272
Pfam-B_1287
Pfam-B_3211
Pfam-B_1129
Pfam-B_1130
Pfam-B_5458
Pfam-B_1441
Pfam-B_2069Pfam-B_3560
Pfam-B_2163
Pfam-B_11573Pfam-B_4831 Pfam-B_857
Pfam-B_11571Pfam-B_10716Pfam-B_775 Pfam-B_4266
Pfam-B_2121
Pfam-B_2888
Pfam-B_3782
Pfam-B_103Pfam-B_10872Pfam-B_11388
Pfam-B_6754Pfam-B_529
Pfam-B_1639
Pfam-B_1643Pfam-B_7359Pfam-B_5121
Pfam-B_2793Pfam-B_1118
Pfam-B_3045
Pfam-B_7495
Pfam-B_3450
Pfam-B_127
Pfam-B_1105
Pfam-B_181
Pfam-B_3101
Pfam-B_3047
Pfam-B_4884Pfam-B_3594
Pfam-B_1371Pfam-B_1318
Pfam-B_6581Pfam-B_3354
Pfam-B_959
Pfam-B_265
Pfam-B_10717
Pfam-B_3728
Pfam-B_1773
Pfam-B_5023
Pfam-B_5022
Pfam-B_5558
Pfam-B_3815
Pfam-B_10198 Pfam-B_186 Pfam-B_1752 Pfam-B_10873 Pfam-B_2487
Pfam-B_8467
Pfam-B_8672
Pfam-B_1596
Pfam-B_7153Pfam-B_3822
Pfam-B_522
Pfam-B_278 Pfam-B_652
Pfam-B_4146
Pfam-B_676
Pfam-B_2192
Pfam-B_8053Pfam-B_9657
Pfam-B_5896
Pfam-B_1095
Pfam-B_6872
Pfam-B_8637
Pfam-B_11366Pfam-B_2088Pfam-B_2741
Pfam-B_1563
Pfam-B_519
Pfam-B_4271Pfam-B_10725
Pfam-B_648
Pfam-B_329Pfam-B_2538
Pfam-B_2307Pfam-B_1178
Pfam-B_6579
Pfam-B_802
s i g m a 5 4
Pfam-B_631
Pfam-B_1115
Pfam-B_10
Pfam-B_21
Pfam-B_860
Pfam-B_7551Pfam-B_4705
Pfam-B_9601
Pfam-B_882Pfam-B_4706
Pfam-B_7768
Pfam-B_2642
Pfam-B_10624Pfam-B_10091
Pfam-B_1963
Pfam-B_4537Pfam-B_2359
Pfam-B_1964
Pfam-B_10092
Pfam-B_10086
Pfam-B_2638
Pfam-B_10087
Pfam-B_8799
Pfam-B_1257Pfam-B_3475
Pfam-B_10031
Pfam-B_10026Pfam-B_10032
Pfam-B_6097 Pfam-B_917Pfam-B_6327 Pfam-B_6768Pfam-B_3808
Pfam-B_7975
Pfam-B_2507
Pfam-B_1103Pfam-B_5615Pfam-B_2813 Pfam-B_4434Pfam-B_3623
Pfam-B_8880Pfam-B_4840
Pfam-B_3820
Pfam-B_3454
Pfam-B_1754
Pfam-B_8757
Pfam-B_6539 Pfam-B_621 Pfam-B_10381
Pfam-B_7637 Pfam-B_7406Pfam-B_4685 Pfam-B_8911
Pfam-B_1952
Pfam-B_4173Pfam-B_8049 Pfam-B_5280 Pfam-B_6088Pfam-B_4992 Pfam-B_3606Pfam-B_10328c o n n e x i nPfam-B_4983 Pfam-B_11615Pfam-B_7452
Pfam-B_1012
Pfam-B_9321
Pfam-B_5064Pfam-B_2218 Pfam-B_9762 Pfam-B_3348 Pfam-B_2536
Pfam-B_1317
Pfam-B_3355Pfam-B_7340
Pfam-B_4704
Pfam-B_2023
Pfam-B_1221
Pfam-B_2055Pfam-B_4707
Pfam-B_115 P r i b o s y l t r a n Pfam-B_4341Pfam-B_10795Pfam-B_558Pfam-B_1442Pfam-B_9318Pfam-B_5944 Pfam-B_8012Pfam-B_7540Pfam-B_8708
Pfam-B_10638Pfam-B_1087
Pfam-B_3579Pfam-B_1653
Pfam-B_9338Pfam-B_9329 Pfam-B_8013
Pfam-B_2003
Pfam-B_665 Pfam-B_8795Pfam-B_6609
Pfam-B_2987 Pfam-B_1766Pfam-B_853Pfam-B_2873Pfam-B_8356 Pfam-B_6982Pfam-B_7683 Pfam-B_5130Pfam-B_7833Pfam-B_883 Pfam-B_6868Pfam-B_11630Pfam-B_934Pfam-B_7547 Pfam-B_3399 Pfam-B_6731
Pfam-B_3034 Pfam-B_2625Pfam-B_2198Pfam-B_2199Pfam-B_8325Pfam-B_7292
Pfam-B_6519 Pfam-B_5997Pfam-B_5996 Pfam-B_11524Pfam-B_8499Pfam-B_9290 Pfam-B_5300 s o d f ePfam-B_3885Pfam-B_429Pfam-B_10635Pfam-B_7765Pfam-B_4633Pfam-B_1201Pfam-B_4775Pfam-B_4776Pfam-B_6735
Pfam-B_8237Pfam-B_8403Pfam-B_908Pfam-B_9860 Pfam-B_7003Pfam-B_7104Pfam-B_7190Pfam-B_7316 Pfam-B_7216 Pfam-B_6955Pfam-B_7254Pfam-B_7438Pfam-B_7835 Pfam-B_7658 Pfam-B_7614 Pfam-B_7579 Pfam-B_7497 Pfam-B_7494 Pfam-B_7470 Pfam-B_7448Pfam-B_1524Pfam-B_10399Pfam-B_6437Pfam-B_1117Pfam-B_521Pfam-B_1522Pfam-B_319Pfam-B_1866
Pfam-B_3089Pfam-B_78Pfam-B_5258
l i p a s e
Pfam-B_5059
Pfam-B_841
Pfam-B_7527Pfam-B_10366 Pfam-B_5333Pfam-B_3791
Pfam-B_3589
Pfam-B_753
Pfam-B_7531Pfam-B_76
Pfam-B_713 Pfam-B_5993
Pfam-B_3889Pfam-B_8235Pfam-B_738Pfam-B_87
Pfam-B_10663
Pfam-B_475 Pfam-B_628 Pfam-B_545Pfam-B_1910 Pfam-B_1415Pfam-B_2255Pfam-B_5520
Pfam-B_4152
Pfam-B_593Pfam-B_611 Pfam-B_86
Pfam-B_4247Pfam-B_5900Pfam-B_112Pfam-B_5592
Pfam-B_2753
Pfam-B_2833
Pfam-B_7063 Pfam-B_6803Pfam-B_3748Pfam-B_8489
Pfam-B_11894 Pfam-B_1930
HSP70
Pfam-B_785
Pfam-B_930
Pfam-B_2286
Pfam-B_852 Pfam-B_3929
Pfam-B_5123
Pfam-B_10588 Pfam-B_3361 Pfam-B_230 Pfam-B_10030 Pfam-B_3077Pfam-B_1597Pfam-B_2647Pfam-B_3061
Pfam-B_3070
Pfam-B_567
Pfam-B_7140
Pfam-B_10887
Pfam-B_289
Pfam-B_6172
Pfam-B_7178Pfam-B_1394
K H - d o m a i n
Pfam-B_1587
Pfam-B_3760Pfam-B_5092Pfam-B_9282
Pfam-B_1410
Pfam-B_586
tsp_1
Pfam-B_996
Pfam-B_11397
Pfam-B_6173
Pfam-B_1950
Pfam-B_1697
Pfam-B_1943
Pfam-B_1147
Pfam-B_9962
Pfam-B_953
Pfam-B_9878
Pfam-B_4820
Pfam-B_9346Pfam-B_6031
Pfam-B_417
Pfam-B_3856 Pfam-B_389
Pfam-B_6032
Pfam-B_9963
Pfam-B_4821Pfam-B_2426
Pfam-B_1180
Pfam-B_4312
Pfam-B_4310
Pfam-B_4327
Pfam-B_2850
Pfam-B_10318Pfam-B_9237
Pfam-B_4311
Pfam-B_2512
Pfam-B_2229
Pfam-B_2724
Pfam-B_3343
Pfam-B_2476Pfam-B_977
Pfam-B_10319
Pfam-B_4328
Pfam-B_3843
Pfam-B_1406
Pfam-B_2112
Pfam-B_6363
Pfam-B_6046Pfam-B_1332
Pfam-B_9612
Pfam-B_4344
Pfam-B_3735
Pfam-B_7383Pfam-B_641
Pfam-B_3846
Pfam-B_491
Pfam-B_9168Pfam-B_2725
DNA_pol
P fam-B_9162
Pfam-B_4313
Pfam-B_9158Pfam-B_9159
Pfam-B_4915 Pfam-B_3265
Pfam-B_7001
Pfam-B_4050
Pfam-B_7744
a d h _ s h o r t
P fam-B_650Pfam-B_3305
Pfam-B_8797
Pfam-B_5396Pfam-B_10682
Pfam-B_10598
Pfam-B_1565
Pfam-B_5390
Pfam-B_3301
Pfam-B_5389Pfam-B_9879
Pfam-B_1280
Pfam-B_8881
Pfam-B_6502Pfam-B_5720
Pfam-B_2001
Pfam-B_6851
Pfam-B_5526
Pfam-B_6163
Pfam-B_9871Pfam-B_6164
Pfam-B_2799
Pfam-B_6852
Pfam-B_3283
Pfam-B_5662
Pfam-B_3091
Pfam-B_5663
Pfam-B_816
Pfam-B_3108
Pfam-B_5664 Pfam-B_2563
Pfam-B_565Pfam-B_349
Pfam-B_3443
Pfam-B_7552
Pfam-B_1416
Pfam-B_3483Pfam-B_10090
Pfam-B_6786Pfam-B_1699
Pfam-B_10079
Pfam-B_2356
Pfam-B_7747
Pfam-B_3105
Pfam-B_3010
Pfam-B_11034 Pfam-B_6564
Pfam-B_2954
Pfam-B_4501
Pfam-B_2807
Pfam-B_224
Pfam-B_1962
Pfam-B_2657
Pfam-B_10679
Pfam-B_6939
Pfam-B_1132
Pfam-B_1769
Pfam-B_5686
Pfam-B_8585
Pfam-B_10666
Pfam-B_6940Pfam-B_8587
Pfam-B_873
Pfam-B_2662
Pfam-B_3585
Pfam-B_2495 Pfam-B_2369 Pfam-B_2338 Pfam-B_2287 Pfam-B_2171 Pfam-B_1762 Pfam-B_1749
Pfam-B_10129Pfam-B_10186Pfam-B_6306Pfam-B_10268Pfam-B_462
Pfam-B_11832
Pfam-B_6961Pfam-B_6962
Pfam-B_2064
Pfam-B_11783Pfam-B_2979
Pfam-B_6065
Pfam-B_3430
Pfam-B_6047
Pfam-B_2306
Pfam-B_5685
Pfam-B_2354
Pfam-B_5231
Pfam-B_9635
Pfam-B_9993
Pfam-B_2815
Pfam-B_9992
Pfam-B_2816
Pfam-B_9991
Pfam-B_8699
Pfam-B_902
Pfam-B_6044
Pfam-B_4984
Pfam-B_6231
Pfam-B_10451Pfam-B_10450
Pfam-B_4637
Pfam-B_983
Pfam-B_8650
Pfam-B_10452
Pfam-B_1626
Pfam-B_3860
Pfam-B_835
Pfam-B_2109
Pfam-B_7285
Pfam-B_1579
Pfam-B_1278
Pfam-B_106Pfam-B_250
Pfam-B_301
Pfam-B_2591
Pfam-B_362
Pfam-B_191Pfam-B_213
Pfam-B_6471Pfam-B_6701
Pfam-B_10190Pfam-B_3107
Pfam-B_5781
Pfam-B_7625
Pfam-B_7627Pfam-B_379
Pfam-B_824Pfam-B_7762Pfam-B_509
Pfam-B_10463
Pfam-B_3926Pfam-B_10465
Pfam-B_10464
Pfam-B_2760
Pfam-B_1175 Pfam-B_5387
Pfam-B_3164
Pfam-B_1678
Pfam-B_2293
Pfam-B_10470
Pfam-B_984
Pfam-B_10466Pfam-B_10469
Pfam-B_464
Pfam-B_55
Pfam-B_9342
Pfam-B_214
Pfam-B_165 Pfam-B_9778Pfam-B_318
Pfam-B_7755
Pfam-B_7134
Pfam-B_9030
Pfam-B_1356
Pfam-B_7593Pfam-B_1724
Pfam-B_2837
Pfam-B_6999Pfam-B_7607Pfam-B_5157
Pfam-B_2124
Pfam-B_11143Pfam-B_11841
Pfam-B_3149Pfam-B_3154 Pfam-B_6827 Pfam-B_7608
Pfam-B_510
Pfam-B_2870
Pfam-B_7135 Pfam-B_4569Pfam-B_4570
Pfam-B_2371
Pfam-B_1847Pfam-B_3286
Pfam-B_421
Pfam-B_5318
Pfam-B_10731
Pfam-B_5620Pfam-B_2895
Pfam-B_1101
Pfam-B_6140Pfam-B_1217
Pfam-B_6450
Pfam-B_5991
Pfam-B_3268Pfam-B_4185
Pfam-B_6529
Pfam-B_190
Pfam-B_622Pfam-B_7615
Pfam-B_7580
Pfam-B_10460Pfam-B_703
Pfam-B_906Pfam-B_7624
Pfam-B_7594
Pfam-B_3344 Pfam-B_6991 Pfam-B_6317 Pfam-B_2303Pfam-B_1225
Pfam-B_7002Pfam-B_3345
Pfam-B_10674Pfam-B_3525
Pfam-B_2382
Pfam-B_4384
Pfam-B_11367Pfam-B_11360
Pfam-B_542Pfam-B_2397 Pfam-B_7774
Pfam-B_3321
Pfam-B_2098
Pfam-B_2005
s u b t i l a s e
Pfam-B_413 Pfam-B_4286
Pfam-B_2526Pfam-B_5671
Pfam-B_4285Pfam-B_9700
Pfam-B_4803
Pfam-B_3915
Pfam-B_7007
Pfam-B_1463Pfam-B_1114
Pfam-B_1003Pfam-B_5548
Pfam-B_3569
Pfam-B_868
Pfam-B_3567 Pfam-B_8968Pfam-B_9211
Pfam-B_3366
Pfam-B_1380
Pfam-B_2947
Pfam-B_11369
Pfam-B_535Pfam-B_325
Pfam-B_1322Pfam-B_4993Pfam-B_8563Pfam-B_9409 Pfam-B_3521 Pfam-B_323Pfam-B_3284Pfam-B_3378Pfam-B_6089
Pfam-B_2415
Pfam-B_2417 Pfam-B_247Pfam-B_5118Pfam-B_351
Pfam-B_10693
Pfam-B_10529
Pfam-B_8755
Pfam-B_3565
Pfam-B_3600Pfam-B_3455
Pfam-B_2908
Pfam-B_580Pfam-B_2422
Pfam-B_9320
Pfam-B_1319Pfam-B_2040
Pfam-B_10926Pfam-B_5424 Pfam-B_639 COX2 Pfam-B_10932 Pfam-B_4183 Pfam-B_871Pfam-B_8932 Pfam-B_11111Pfam-B_5859Pfam-B_5611Pfam-B_4579
Pfam-B_5553
Pfam-B_4193
Pfam-B_7207
Pfam-B_672 Pfam-B_5552 Pfam-B_5003
Pfam-B_1919
Pfam-B_562
Pfam-B_6742
Pfam-B_1790
Pfam-B_435
Pfam-B_377
Pfam-B_1485 Pfam-B_9357
Pfam-B_4010
Pfam-B_3558
Pfam-B_1518
Pfam-B_1214
p y r _ r e d o x
Pfam-B_10724
Pfam-B_584
Pfam-B_1864
Pfam-B_2991
Pfam-B_9415
Pfam-B_2565Pfam-B_679
Pfam-B_3184Pfam-B_5647
Pfam-B_6606
Pfam-B_4518
Pfam-B_3183
Pfam-B_11422Pfam-B_4068
Pfam-B_457
Pfam-B_8127
Pfam-B_5379
Pfam-B_8109
Pfam-B_9319
Pfam-B_2599
Pfam-B_4470
Pfam-B_4925
Pfam-B_8056Pfam-B_1249
Pfam-B_8742
Pfam-B_8688
Pfam-B_1296Pfam-B_3737
Pfam-B_1487
Pfam-B_5627
Pfam-B_8043
Pfam-B_5628
Pfam-B_2748
7 t m _ 2
Pfam-B_5019
Pfam-B_615Pfam-B_9699
Pfam-B_2317
Pfam-B_3437Pfam-B_3054Pfam-B_583
Pfam-B_5104
Pfam-B_1759
Pfam-B_1553
Pfam-B_3667
Pfam-B_11501Pfam-B_3668 Pfam-B_2243
Pfam-B_1552
Pfam-B_5804
Pfam-B_1877
Pfam-B_740Pfam-B_7416
Pfam-B_1484Pfam-B_4364
Pfam-B_7417
Pfam-B_153
S 1 2
Pfam-B_1248
Pfam-B_664 Pfam-B_5094Pfam-B_2129
Pfam-B_1174Pfam-B_4401
Pfam-B_11163Pfam-B_6015
Pfam-B_7662
Pfam-B_494Pfam-B_4963
Pfam-B_48
Pfam-B_2762Pfam-B_1327
Pfam-B_2295
Pfam-B_4414
Pfam-B_11162
Pfam-B_11141
Pfam-B_11140Pfam-B_451
Pfam-B_1061
Pfam-B_1062
Pfam-B_2294
Pfam-B_5004Pfam-B_6834
Pfam-B_11566Pfam-B_3721Pfam-B_11142
Pfam-B_4407
Pfam-B_182Pfam-B_663
Pfam-B_9547
Pfam-B_11864
p h o t o R C Pfam-B_6946
Pfam-B_70
Pfam-B_35
response_regPfam-B_4887
Pfam-B_6174Pfam-B_2071
Pfam-B_3162
Pfam-B_2927
Pfam-B_2443
Pfam-B_5377
Pfam-B_5925Pfam-B_3603Pfam-B_5158
Pfam-B_9341
Pfam-B_6995
Pfam-B_6996
Pfam-B_4710
Pfam-B_1172
Pfam-B_6994
Pfam-B_7605
Pfam-B_7600
Pfam-B_1421
Pfam-B_7602
Pfam-B_7606Pfam-B_3917
Pfam-B_862Pfam-B_7603
Pfam-B_1420Pfam-B_7601
Pfam-B_4817
Pfam-B_613
Pfam-B_1562
Pfam-B_2928
Pfam-B_3412
Pfam-B_3724Pfam-B_612Pfam-B_7108
Pfam-B_2926Pfam-B_3411
Pfam-B_9565
Pfam-B_856Pfam-B_8
Pfam-B_7465
Pfam-B_5803
Pfam-B_9701 Pfam-B_291
Pfam-B_7237
Pfam-B_2454
Pfam-B_5670 Pfam-B_8179
Pfam-B_839 Pfam-B_7464 Pfam-B_7098 Pfam-B_4054 Pfam-B_3185Pfam-B_10677
Pfam-B_8623Pfam-B_8662
Pfam-B_10688
Pfam-B_8720 Pfam-B_3807 Pfam-B_10517
Pfam-B_2141 Pfam-B_739
Pfam-B_2418 Pfam-B_3006
Pfam-B_3182Pfam-B_2685
Pfam-B_1937Pfam-B_8712Pfam-B_8067
Pfam-B_3168
Pfam-B_8137Pfam-B_8167
Pfam-B_1780
Pfam-B_4339Pfam-B_327
Pfam-B_11276
Pfam-B_7062
Pfam-B_1991
Pfam-B_11929 Pfam-B_232Pfam-B_2986
Pfam-B_2453
Pfam-B_10637
Pfam-B_10640
Pfam-B_6537
Pfam-B_838
Pfam-B_56
Pfam-B_981
tRNA-syn t_1
Pfam-B_1936
Zn_c lus
a l d e d hPfam-B_1946
Pfam-B_3931
Pfam-B_7834
Pfam-B_4641
Pfam-B_3016 Pfam-B_9146Pfam-B_9143
Pfam-B_4194v w d
Pfam-B_8680
Pfam-B_895Pfam-B_173
Pfam-B_200
Pfam-B_4804
Pfam-B_204
Pfam-B_10633
Pfam-B_2017
Pfam-B_374
Pfam-B_6810
Pfam-B_9144
Pfam-B_8670
Pfam-B_5835
Pfam-B_3330
Pfam-B_4143Pfam-B_5974
Pfam-B_469
Pfam-B_10775
Pfam-B_647
Pfam-B_1462Pfam-B_4548
Pfam-B_8718
Pfam-B_3736
Pfam-B_900
Pfam-B_4703
Pfam-B_10281
Pfam-B_2072
Pfam-B_2560
Pfam-B_8716
Pfam-B_8719
Pfam-B_4634
Pfam-B_2478
Pfam-B_2559
Pfam-B_2466
Pfam-B_3780
Pfam-B_4961
Pfam-B_3538Pfam-B_4960
Pfam-B_8667
Pfam-B_682
Pfam-B_3500
Pfam-B_5131
Pfam-B_10214Pfam-B_1978Pfam-B_10216
Pfam-B_5133Pfam-B_2370
Pfam-B_4801
Pfam-B_1551
Pfam-B_7553
Pfam-B_4593
Pfam-B_10288
Pfam-B_4592
Pfam-B_6424
Pfam-B_1658
Pfam-B_6513
Pfam-B_4619
Pfam-B_6415
Pfam-B_3529
Pfam-B_1659
ce l l u l ase
Pfam-B_8891
Pfam-B_677
Pfam-B_1051
Pfam-B_9192
Pfam-B_5583
Pfam-B_1301
Pfam-B_10625
Pfam-B_3943
Pfam-B_6617
Pfam-B_914
Pfam-B_1990
Pfam-B_283
Pfam-B_348
Pfam-B_4618
Pfam-B_10368
Pfam-B_2166
Pfam-B_5482
Pfam-B_10546
Pfam-B_8358
Pfam-B_2068
lec t in_ legA
Pfam-B_8423
Pfam-B_2050
Pfam-B_2616
Pfam-B_8430
Pfam-B_7248
Pfam-B_1395
Pfam-B_1575Pfam-B_2499
Pfam-B_1783
Pfam-B_1111Pfam-B_360
Pfam-B_3022
Pfam-B_7246
Pfam-B_832
Pfam-B_10511
Pfam-B_4822
Pfam-B_10509
Pfam-B_10510
Pfam-B_6733Pfam-B_2953
Pfam-B_427
Pfam-B_6875
Pfam-B_707
Pfam-B_4314
Pfam-B_364Pfam-B_363
Pfam-B_9637
Pfam-B_3424Pfam-B_758
Pfam-B_2723
Pfam-B_4424
Pfam-B_1680
Pfam-B_9638
Pfam-B_3341
Pfam-B_6384
Pfam-B_4213
Pfam-B_255
Pfam-B_1474
Pfam-B_1165
Pfam-B_6451Pfam-B_1660
cy toch rome_c
Pfam-B_8546
Pfam-B_8548
Pfam-B_5584
Pfam-B_5687
Pfam-B_684
Pfam-B_1308
Pfam-B_5814
Pfam-B_10813
Pfam-B_4721
Pfam-B_2848
Pfam-B_1477
Pfam-B_1663
Pfam-B_8734
Pfam-B_4212
Pfam-B_199
Pfam-B_3672
Pfam-B_10512
Pfam-B_4024
Pfam-B_4873
Pfam-B_1399
Pfam-B_10289
Pfam-B_1316Pfam-B_6535
Pfam-B_1289
Pfam-B_10358
Pfam-B_6423
Pfam-B_10290
Pfam-B_6622
Pfam-B_1685
Pfam-B_1728
Pfam-B_2383Pfam-B_5679Pfam-B_5815
Pfam-B_4591
Pfam-B_4224Pfam-B_5812
Pfam-B_11192
Pfam-B_11301
Pfam-B_5734
Pfam-B_1668
Pfam-B_8026
Pfam-B_9255
Pfam-B_8027
Pfam-B_2826
Pfam-B_7119
Pfam-B_9180
Pfam-B_1893
Pfam-B_1965Pfam-B_4696
Pfam-B_1961
Pfam-B_7664
Pfam-B_11775
Pfam-B_9436
Pfam-B_6485
Pfam-B_4494Pfam-B_10259
Pfam-B_9898
Pfam-B_1509
Pfam-B_6180
Pfam-B_4492Pfam-B_2339
Pfam-B_9905
Pfam-B_350
Pfam-B_5750
Pfam-B_9044
Pfam-B_337
Pfam-B_9041
Pfam-B_2801
Pfam-B_4142
Pfam-B_9045
Pfam-B_3461Pfam-B_4489Pfam-B_220
Pfam-B_801Pfam-B_4898
Pfam-B_10904
Pfam-B_646
Pfam-B_2729
Pfam-B_9241
Pfam-B_11280
Pfam-B_5533
Pfam-B_1739Pfam-B_605
Pfam-B_2421
Pfam-B_2907
Pfam-B_10758Pfam-B_2420
Pfam-B_6588
Pfam-B_167Pfam-B_467
Pfam-B_8435Pfam-B_3824
Pfam-B_1373
Pfam-B_2814
Pfam-B_9974
Pfam-B_9
Pfam-B_22
Pfam-B_233
Pfam-B_174
Pfam-B_4712
Pfam-B_10736
Pfam-B_10748Pfam-B_10751
Pfam-B_10760
Pfam-B_2419
Pfam-B_10752
Pfam-B_16Pfam-B_2906
Pfam-B_4415
Pfam-B_1445
Pfam-B_8322
GTP_EFTU
Pfam-B_7695
Pfam-B_982
Pfam-B_4006
Pfam-B_7932
Pfam-B_30
Pfam-B_4140
Pfam-B_7816
Pfam-B_10015
Pfam-B_585
Pfam-B_9928
Pfam-B_1390
Pfam-B_2085 Pfam-B_2694
Pfam-B_4939
Pfam-B_10170
Pfam-B_210
Pfam-B_11329
Pfam-B_229
Pfam-B_3972
Pfam-B_8096Pfam-B_1440
c a d h e r i n
Pfam-B_874
Pfam-B_8204
Pfam-B_3978
Pfam-B_2028
Pfam-B_5708
Pfam-B_2274
Pfam-B_8862
Pfam-B_7155
Pfam-B_675
Pfam-B_9006Pfam-B_3014
Pfam-B_7546Pfam-B_7659
Pfam-B_10473
Pfam-B_10435
Pfam-B_6033f e r 4Pfam-B_9562
Pfam-B_7286
Pfam-B_7274
Pfam-B_5350
Pfam-B_2900
Pfam-B_7981
Pfam-B_7845
Pfam-B_7849
t r e f o i l
P fam-B_3081
Pfam-B_6957
Pfam-B_1107
Pfam-B_4606
Pfam-B_2550Pfam-B_9007
Pfam-B_6398s e r p i n
Pfam-B_40
Pfam-B_9356
Pfam-B_8302
v w c
Pfam-B_1275
Pfam-B_1274
f ib r inogen_C
Pfam-B_4474Pfam-B_5441
t r y p s i nKuni tz_BPTI
Pfam-B_6988
h o m e o b o x
Pfam-B_10294
Pfam-B_5341Pfam-B_5413
Pfam-B_10147Pfam-B_11021
Pfam-B_1043
Pfam-B_3714Pfam-B_1240
Pfam-B_7640
Pfam-B_8076
Pfam-B_5196
Pfam-B_3995Pfam-B_8466
Pfam-B_4269
Pfam-B_7453
Pfam-B_5087Pfam-B_9683
Pfam-B_8380
Pfam-B_2643
Pfam-B_3223
Pfam-B_10351
Pfam-B_11100
Pfam-B_6417
Pfam-B_8371
Pfam-B_1593Pfam-B_6859
Pfam-B_5512Pfam-B_1822Pfam-B_10719
Pfam-B_1144
Pfam-B_7091
Pfam-B_1628
p o u
Pfam-B_3322
Pfam-B_89
Pfam-B_5759
Pfam-B_5758
Pfam-B_1299
Pfam-B_8545
Pfam-B_3324
Pfam-B_8547
Pfam-B_8544
Pfam-B_2684
Pfam-B_5793Pfam-B_8960Pfam-B_1644
Pfam-B_3303
Pfam-B_8763
Pfam-B_11649
k e t o a c y l - s y n tPfam-B_50
Pfam-B_117
Pfam-B_235
Pfam-B_1908
Pfam-B_5747
Pfam-B_927
Pfam-B_5703
Pfam-B_1953Pfam-B_4511
Pfam-B_9998
Pfam-B_2346
Pfam-B_4508Pfam-B_970
Pfam-B_4509
Pfam-B_4510
Pfam-B_6237
Pfam-B_6337
Pfam-B_6336
Pfam-B_6335
Pfam-B_7195
Pfam-B_6333
Pfam-B_10226
Pfam-B_5000
Pfam-B_517
Pfam-B_133
Pfam-B_177
Pfam-B_126
Pfam-B_2988
Pfam-B_1233
Pfam-B_234
Pfam-B_6862
Pfam-B_405
Pfam-B_6863
Pfam-B_36
Pfam-B_17
Pfam-B_52
Pfam-B_53Pfam-B_287
Pfam-B_24
Pfam-B_6492
Pfam-B_6334
Pfam-B_1232
Pfam-B_6819
Pfam-B_1761
Pfam-B_6822
Pfam-B_560
Pfam-B_10508
Pfam-B_10370
Pfam-B_1354
Pfam-B_1079
Pfam-B_3528
Pfam-B_6426
Pfam-B_10371
Pfam-B_10373
Pfam-B_2956
Pfam-B_6736
Pfam-B_355
Pfam-B_3670
Pfam-B_11511Pfam-B_6846
Pfam-B_5669
Pfam-B_5768
Pfam-B_750
Pfam-B_8919
Pfam-B_1650
Pfam-B_4255
Pfam-B_8927
Pfam-B_5361
Pfam-B_5435
Pfam-B_6419
Pfam-B_359
Pfam-B_6915
Pfam-B_9606
Pfam-B_6987
Pfam-B_11092
Pfam-B_1876
Pfam-B_11075
tRNA-syn t_2
Pfam-B_4199
Pfam-B_4908
Pfam-B_3083
Pfam-B_5650Pfam-B_4909
Pfam-B_11898
Pfam-B_11897
Pfam-B_2923Pfam-B_1297
Pfam-B_4645
Pfam-B_196
Pfam-B_7505
Pfam-B_20
Pfam-B_37
Pfam-B_830
Pfam-B_671p i l i n
P fam-B_11130
Pfam-B_3883
Pfam-B_4497
Pfam-B_10678Pfam-B_402
Pfam-B_2342
Pfam-B_1694
Pfam-B_1695
Pfam-B_9931
Pfam-B_9933TGF-be ta
Pfam-B_2377Pfam-B_7504
Pfam-B_3715
Pfam-B_11256
Pfam-B_367Pfam-B_2158 Pfam-B_150Pfam-B_58
Pfam-B_2622Pfam-B_2181
Pfam-B_9932
Pfam-B_9934Pfam-B_1339Pfam-B_6195
Pfam-B_8684Pfam-B_9935
Pfam-B_370
Pfam-B_3463
Pfam-B_136
Pfam-B_2834
Pfam-B_893 Pfam-B_3495
Pfam-B_1197
Pfam-B_8869
Pfam-B_1457
Pfam-B_6650
Pfam-B_11278
Pfam-B_270Pfam-B_178
Pfam-B_1782
Pfam-B_4646
Pfam-B_3547
Pfam-B_4324Pfam-B_4521
Pfam-B_2350
Pfam-B_73Pfam-B_5864
Pfam-B_2836
Pfam-B_10189
Pfam-B_11068
Pfam-B_727
Pfam-B_3575
Pfam-B_6380
Pfam-B_803
Pfam-B_2488
Pfam-B_3474
Pfam-B_6378
Pfam-B_2351
Pfam-B_10239Pfam-B_718
Pfam-B_2841
Pfam-B_3507
Pfam-B_4483
Pfam-B_969
Pfam-B_4675
Pfam-B_6339
Pfam-B_5654
Pfam-B_7900
Pfam-B_11123
Pfam-B_8313
Pfam-B_2844Pfam-B_1758
Pfam-B_6330
Pfam-B_1755
Pfam-B_505
Pfam-B_423
Pfam-B_10246Pfam-B_189
Pfam-B_6332
Pfam-B_10245
Pfam-B_10242
Pfam-B_1521
Pfam-B_11487
Pfam-B_11490
Pfam-B_826
Pfam-B_10237
Pfam-B_7197Pfam-B_6342
Pfam-B_1706
Pfam-B_3505
Pfam-B_827
Pfam-B_765
Pfam-B_6341
Pfam-B_6338
Pfam-B_4997
Pfam-B_10236
Pfam-B_10244
Pfam-B_10241
Pfam-B_7350
Pfam-B_7349
Pfam-B_1349
Pfam-B_1182Pfam-B_2310
Pfam-B_6287
Pfam-B_3074
Pfam-B_1302
Pfam-B_3059
Pfam-B_7999
Pfam-B_8447
Pfam-B_11896
Pfam-B_11155
Pfam-B_11145
Pfam-B_2416
Pfam-B_10001
f i l a m e n t
Pfam-B_324
Pfam-B_2555
Pfam-B_2262
Pfam-B_162
Pfam-B_148
Pfam-B_122
Pfam-B_1116
Pfam-B_176
Pfam-B_1203 Pfam-B_10297
Pfam-B_768
Pfam-B_6371
Pfam-B_7665
Pfam-B_7667
Pfam-B_7666
Pfam-B_7668
Pfam-B_701
Pfam-B_6369
Pfam-B_915
Pfam-B_3574
Pfam-B_1731 Pfam-B_3947
Pfam-B_9692
Pfam-B_1106Pfam-B_2173
Pfam-B_3767
Pfam-B_6578
Pfam-B_831
Pfam-B_10720
Pfam-B_1282
Pfam-B_3327
DAG_PE-bind
Pfam-B_2172Pfam-B_954
Pfam-B_10397
Pfam-B_9767
Pfam-B_8150
s u g a r _ t r
P fam-B_5411
Pfam-B_7235
Pfam-B_1110
Pfam-B_1574
Pfam-B_10299
Pfam-B_2773
Pfam-B_9661
Pfam-B_2846
Pfam-B_10947
Pfam-B_4594
Pfam-B_10721Pfam-B_11146
Pfam-B_11038
Pfam-B_5211
Pfam-B_1067
h e m o p e x i n
Pfam-B_4128
Pfam-B_358
Pfam-B_5084
Pfam-B_4129
Pfam-B_6217
Pfam-B_6216
Pfam-B_2463Pfam-B_3037
Pfam-B_4945
Pfam-B_941
Pfam-B_729
Pfam-B_730
Pfam-B_6221
Pfam-B_7355
Pfam-B_5267
Pfam-B_5265
Pfam-B_7354
Pfam-B_1127Pfam-B_6992Pfam-B_2477
Pfam-B_11928
Pfam-B_3880Pfam-B_108
Pfam-B_2288
Pfam-B_2291Pfam-B_1922
Pfam-B_2842Pfam-B_7488
Pfam-B_10589Pfam-B_7487
Pfam-B_10585
Pfam-B_60
Pfam-B_10587
Pfam-B_31
Pfam-B_2880Pfam-B_5117
Pfam-B_1089
Pfam-B_10727
Pfam-B_159
Pfam-B_1220Pfam-B_1351
Pfam-B_3593
Pfam-B_10708Pfam-B_918
Pfam-B_3069 Pfam-B_8724
Pfam-B_75
Pfam-B_674Pfam-B_382
Pfam-B_582Pfam-B_530
Pfam-B_955zf-CCHC
Pfam-B_1152
Pfam-B_7489Pfam-B_939Pfam-B_620
Pfam-B_3572
Pfam-B_2147
Pfam-B_2009
Pfam-B_4663
Pfam-B_3120
Pfam-B_5266
Pfam-B_1360
Pfam-B_7838
Pfam-B_2007
Pfam-B_13Pfam-B_422
Pfam-B_109
Pfam-B_7196
Pfam-B_766
Pfam-B_411
Pfam-B_3508
Pfam-B_10243
Pfam-B_6340
Pfam-B_170
Pfam-B_5368
Pfam-B_11828
Pfam-B_10576
Pfam-B_6006
Pfam-B_7491
Pfam-B_10403
Pfam-B_10575
Pfam-B_10573
Pfam-B_10572
Pfam-B_372 Pfam-B_2792
Pfam-B_1502Pfam-B_630 Pfam-B_5653
Pfam-B_1729Pfam-B_2676
Pfam-B_2881
Pfam-B_2893
Pfam-B_1533
Pfam-B_1186Pfam-B_2400 Pfam-B_2884
Pfam-B_3428 Pfam-B_3837
Pfam-B_10711
Pfam-B_10580
Pfam-B_11744
Pfam-B_4881
Pfam-B_993
Pfam-B_10726
Pfam-B_1234
Pfam-B_10581Pfam-B_10578
Pfam-B_5173Pfam-B_10562
Pfam-B_5153Pfam-B_7233
Pfam-B_4666
Pfam-B_284
Pfam-B_6012
Pfam-B_141Pfam-B_1223
Pfam-B_160
Pfam-B_2758
Pfam-B_6580
Pfam-B_919
Pfam-B_1241
Pfam-B_2989
Pfam-B_774
Pfam-B_2292
Pfam-B_2290
Pfam-B_3564
Pfam-B_1677
Pfam-B_3879
Pfam-B_6003Pfam-B_2759 Pfam-B_6000
Pfam-B_2299Pfam-B_2945
r v t
P fam-B_881
r h v
Pfam-B_3263
Pfam-B_3299
Pfam-B_10556
Pfam-B_3212
Pfam-B_7480Pfam-B_659
Pfam-B_5111
Pfam-B_10607
Pfam-B_7486
Pfam-B_5069
Pfam-B_7372
Pfam-B_1323
Pfam-B_2528
r n a s e HPfam-B_859
Pfam-B_1803
Pfam-B_5073 Pfam-B_3876Pfam-B_2529
Pfam-B_497
Pfam-B_4261H L H
Pfam-B_6516
Pfam-B_10601Pfam-B_7482
Pfam-B_134
Pfam-B_3703
Pfam-B_10617
Pfam-B_125
Pfam-B_10563
Pfam-B_10558
Pfam-B_49
Pfam-B_2327
Pfam-B_42
Pfam-B_38Pfam-B_4665
Pfam-B_41
Pfam-B_4243
Pfam-B_110
Pfam-B_3999
Pfam-B_1640Pfam-B_2531
Pfam-B_6511
Pfam-B_1260
Pfam-B_3068
Pfam-B_6002Pfam-B_10604
Pfam-B_1804
Pfam-B_10597
Pfam-B_6515Pfam-B_10543
Pfam-B_3050
Pfam-B_97
Pfam-B_11591
Pfam-B_7451
Pfam-B_11753d s r m
RuBisCO_smal l
P fam-B_4976
Pfam-B_4061
Pfam-B_8120
Pfam-B_2204
Pfam-B_303
Pfam-B_5007
c o p p e r - b i n d
Pfam-B_8749
Pfam-B_870
r a s
Pfam-B_11716
Pfam-B_11715
Pfam-B_9936
Pfam-B_10471
Pfam-B_5293
Pfam-B_2434
Pfam-B_11906
Pfam-B_1357
Pfam-B_597
Pfam-B_2784
Pfam-B_4931
Pfam-B_7301
Pfam-B_3901
Pfam-B_2362
Pfam-B_2995
Pfam-B_2581Pfam-B_7880
Pfam-B_7873
Pfam-B_7017
Pfam-B_474
Pfam-B_5973
Pfam-B_6721
lamin in_B
Pfam-B_8034
Pfam-B_6532
Pfam-B_6534
Pfam-B_1041Pfam-B_3063
Pfam-B_7846
Pfam-B_7174
Pfam-B_336Pfam-B_728
Pfam-B_307
Pfam-B_2211Pfam-B_11762
Pfam-B_11763
a n k
Pfam-B_8273
Pfam-B_1084
Pfam-B_7033
Pfam-B_633
Pfam-B_3121
Pfam-B_5780
Pfam-B_9398
Pfam-B_4888
Pfam-B_5958
a p p l e
Pfam-B_8482Pfam-B_2639
Pfam-B_9028
Pfam-B_3852
Pfam-B_6041Pfam-B_1928Pfam-B_3422
Pfam-B_2464
Pfam-B_9544
Pfam-B_8209
Pfam-B_452
Pfam-B_7201
Pfam-B_4409
Pfam-B_896
Pfam-B_478
Pfam-B_11209
Pfam-B_2015
Pfam-B_540
Pfam-B_8332Pfam-B_1835
Pfam-B_10608
COX1
Pfam-B_7477
Pfam-B_23
Pfam-B_2527
Pfam-B_5490
Pfam-B_3875
Pfam-B_440
Pfam-B_4163
Pfam-B_9912
Pfam-B_8494
Pfam-B_3136
Pfam-B_6304
Pfam-B_8528
Pfam-B_6303
Pfam-B_2149
Pfam-B_1022Pfam-B_3987
Pfam-B_4008Pfam-B_8333
Pfam-B_1820
Pfam-B_5491Pfam-B_10599
Pfam-B_3985
Pfam-B_3122
Pfam-B_7565
Pfam-B_731
Pfam-B_8334
Pfam-B_481
Pfam-B_2572
Pfam-B_3988
Pfam-B_5643
Pfam-B_5797
Pfam-B_6323
Pfam-B_2838
Pfam-B_6480
Pfam-B_3905
Pfam-B_3903Pfam-B_1595
Pfam-B_10218Pfam-B_1418
Pfam-B_4279
Pfam-B_1375
Pfam-B_6435
Pfam-B_10787
Pfam-B_6878
Pfam-B_7769
Pfam-B_10791
Pfam-B_7771
Pfam-B_749Pfam-B_9459
Pfam-B_1258
Pfam-B_533
Pfam-B_9359Pfam-B_8722
Pfam-B_5376
Pfam-B_9827
Pfam-B_10789
Pfam-B_6598
Pfam-B_4885
Pfam-B_10793
Pfam-B_10785Pfam-B_10790
Pfam-B_11805
Pfam-B_11806
Pfam-B_6599
Pfam-B_10786Pfam-B_10788
Pfam-B_10784
ATP-synt_C
Pfam-B_777
Pfam-B_11350
Pfam-B_2914
Pfam-B_5412
Pfam-B_1871
Pfam-B_4382
Pfam-B_9455
Pfam-B_1913
Pfam-B_9303
Pfam-B_7426
Pfam-B_6728
ld l_ recept_b
Pfam-B_6723
Pfam-B_4046
Pfam-B_9285
Pfam-B_9305
Pfam-B_9049
Pfam-B_6478
Pfam-B_7569
Pfam-B_2240
ho rmone_ rec
Pfam-B_9057Pfam-B_2469
Pfam-B_563
Pfam-B_3904
Pfam-B_7770Pfam-B_7772
Pfam-B_8727
Pfam-B_9365
Pfam-B_8717
Pfam-B_3314
Pfam-B_2530
Pfam-B_4396
Pfam-B_4395
Pfam-B_880Pfam-B_909
Pfam-B_2524
Pfam-B_4397
Pfam-B_8116
Pfam-B_4398
Pfam-B_2405
Pfam-B_400
Pfam-B_7483
Pfam-B_10574
Pfam-B_2011Pfam-B_5112
Pfam-B_496
Pfam-B_1531
Pfam-B_2297
Pfam-B_9543
Pfam-B_9662
Pfam-B_2289
Pfam-B_99 Pfam-B_6610Pfam-B_11860
Pfam-B_5163
m i t o _ c a r r
P fam-B_3612
Pfam-B_302
Pfam-B_3062
Pfam-B_9545
Pfam-B_1801
Pfam-B_1730
Pfam-B_4402
Pfam-B_8210
Pfam-B_8483
Pfam-B_6182
Pfam-B_3862Pfam-B_3854
Pfam-B_3853Pfam-B_4722
Pfam-B_901
Pfam-B_4506Pfam-B_9706
Pfam-B_6086
Pfam-B_81
Pfam-B_10600
Pfam-B_757
Pfam-B_85
Pfam-B_5113
Pfam-B_7481
Pfam-B_5114Pfam-B_495
Pfam-B_555
Pfam-B_2403
Pfam-B_5115
Pfam-B_6018
Pfam-B_10594
Pfam-B_19r v p
Pfam-B_339
Pfam-B_6017
Pfam-B_7479
Pfam-B_6801
Pfam-B_1361
Pfam-B_11467
Pfam-B_6802
Pfam-B_9531
Pfam-B_297
Pfam-B_5199
Pfam-B_5510
Pfam-B_8373Pfam-B_3173
Pfam-B_7056
Pfam-B_7681
Pfam-B_7680Pfam-B_5198
Pfam-B_368
Pfam-B_594
Pfam-B_7756Pfam-B_7115
Pfam-B_2296
Pfam-B_331
Pfam-B_9215
Pfam-B_762 Pfam-B_6660
Pfam-B_4126
Pfam-B_10723
Pfam-B_8443
Pfam-B_8460
Pfam-B_8370
Pfam-B_10722
Pfam-B_5540
Pfam-B_4659
Pfam-B_2230
Pfam-B_8383
Pfam-B_8005
Pfam-B_2298
Pfam-B_4403
Pfam-B_9548
Pfam-B_9533
Pfam-B_1824
Pfam-B_6004
Pfam-B_4082
Pfam-B_1802
Pfam-B_5116Pfam-B_1489
Pfam-B_10718
Pfam-B_8409
Pfam-B_3758
Pfam-B_8372
Pfam-B_5045
ABC_tran
r r m
Pfam-B_4724
zf -C4
Pfam-B_5323
Pfam-B_975
Pfam-B_7751
Pfam-B_6916
Pfam-B_3517
Pfam-B_447
Pfam-B_5182
Pfam-B_5428
Pfam-B_5810
Pfam-B_889
Pfam-B_5808
Pfam-B_2139
Pfam-B_10993
Pfam-B_8954
Pfam-B_814
Pfam-B_9276
Pfam-B_9682
Pfam-B_100
Pfam-B_3249
Pfam-B_8523
Pfam-B_10527
Pfam-B_1649
Pfam-B_4206
Pfam-B_5964
Pfam-B_425
Pfam-B_3492
Pfam-B_4631
Pfam-B_3222
Pfam-B_5317
Pfam-B_8955Pfam-B_4130
Pfam-B_1705Pfam-B_1613
Pfam-B_5541Pfam-B_8524
Pfam-B_8379
Pfam-B_7997
Pfam-B_335
Pfam-B_1800
Pfam-B_3994
Pfam-B_5340
Pfam-B_1414
Pfam-B_4127Pfam-B_4658
Pfam-B_5514
Pfam-B_2630
Pfam-B_948
Pfam-B_5339
Pfam-B_9684
Pfam-B_4894
Pfam-B_8163
Pfam-B_6845
Pfam-B_2188
Pfam-B_667Pfam-B_2304
Pfam-B_3097
Pfam-B_5197
Pfam-B_5109
Pfam-B_2131
Pfam-B_7686Pfam-B_1143
Pfam-B_3237
Pfam-B_3540
Pfam-B_910
Pfam-B_4394
Pfam-B_607
Pfam-B_2245
Pfam-B_8468
Pfam-B_6013
Pfam-B_2578
Pfam-B_3444
Pfam-B_5162
Pfam-B_5304
Pfam-B_1698
Pfam-B_819
Pfam-B_11574
Pfam-B_5078
Y_phosphatase
P fam-B_5075
Pfam-B_1056
Pfam-B_4110
Pfam-B_5960
Pfam-B_11741
Pfam-B_9439
Pfam-B_2111Pfam-B_3048
Pfam-B_7373Pfam-B_773
Pfam-B_9848
Pfam-B_3842Pfam-B_5072
Pfam-B_705Pfam-B_5070
Pfam-B_2978
Pfam-B_9846Pfam-B_201
Pfam-B_3848
Pfam-B_8108
Pfam-B_424Pfam-B_767
Pfam-B_2516
Pfam-B_1206
Pfam-B_794
Pfam-B_506
Pfam-B_2220Pfam-B_2063Pfam-B_2219
Pfam-B_8105
Pfam-B_1830Pfam-B_1831
Pfam-B_3055Pfam-B_10833
Pfam-B_1158
Pfam-B_1890
Pfam-B_9870
Pfam-B_3620
Pfam-B_9575
Pfam-B_1737
Pfam-B_7374
Pfam-B_3829Pfam-B_7329
Pfam-B_7378
Pfam-B_1133Pfam-B_573
Pfam-B_550
Pfam-B_5760
Pfam-B_8928
t h i o r e dPfam-B_9149
Pfam-B_3306
Pfam-B_2912
Pfam-B_8890Pfam-B_2675
Pfam-B_10099
Pfam-B_7653
Pfam-B_3935
Pfam-B_7654
Pfam-B_5079
lec t in_ legB
Pfam-B_1614
Pfam-B_5375
Pfam-B_1829Pfam-B_492Pfam-B_9274
Pfam-B_9264
Pfam-B_1469Pfam-B_3427
Pfam-B_1481
Pfam-B_1828
Pfam-B_4028
Pfam-B_1960
Pfam-B_4134
Pfam-B_7411
Pfam-B_1448
Pfam-B_10303
Pfam-B_6965Pfam-B_3741
Pfam-B_8594Pfam-B_7117
Pfam-B_1368
Pfam-B_5794
Pfam-B_7897
Pfam-B_8516
Pfam-B_166
Pfam-B_8515
Pfam-B_9047
Pfam-B_9042Pfam-B_9046
Pfam-B_2715z f -C2H2
Pfam-B_6963Pfam-B_6964Pfam-B_5959
Pfam-B_6484Pfam-B_6040
Pfam-B_7118
Pfam-B_11833
Pfam-B_9182
Pfam-B_4277
Pfam-B_7669
Pfam-B_9043
Pfam-B_1985
Pfam-B_6565
Pfam-B_4726
Pfam-B_9437
Pfam-B_5757
Pfam-B_5129
Pfam-B_9558
Pfam-B_9983
Pfam-B_1846
Pfam-B_5634
Pfam-B_5421
Pfam-B_9667
Pfam-B_3707
Pfam-B_2061
Pfam-B_2981
Pfam-B_3102
Pfam-B_8228
Pfam-B_2575
Pfam-B_3187Pfam-B_4244
Pfam-B_7010 Pfam-B_10357
Pfam-B_1617
Pfam-B_28
Pfam-B_864
Pfam-B_1136
Pfam-B_2855
Pfam-B_863
Pfam-B_1618
Pfam-B_1840
Pfam-B_7826
Pfam-B_8141
Pfam-B_8196
Pfam-B_1656
Pfam-B_3570
Pfam-B_6903
Pfam-B_3172
Pfam-B_2601
Pfam-B_7708
Pfam-B_3708
Pfam-B_9649
RIP
Pfam-B_514
Pfam-B_604
Pfam-B_7136
Pfam-B_4698
Pfam-B_6576
Pfam-B_842
Pfam-B_4686
Pfam-B_10698
Pfam-B_3259
Pfam-B_1328
Pfam-B_6572
Pfam-B_1088
Pfam-B_11292
Pfam-B_9731
Pfam-B_6029Pfam-B_811
Pfam-B_10699
Pfam-B_10692
Pfam-B_2105
Pfam-B_1554Pfam-B_2858
Pfam-B_6778
Pfam-B_1785 Pfam-B_4902
Pfam-B_2785
Pfam-B_5532
Pfam-B_564
Pfam-B_1359
Pfam-B_1806Pfam-B_10148 Pfam-B_4022Pfam-B_5945
Pfam-B_3453
Pfam-B_3597
Pfam-B_3605Pfam-B_1378
Pfam-B_4340
Pfam-B_691
Pfam-B_2095Pfam-B_808
Pfam-B_7784
Pfam-B_3246
GATase
Pfam-B_4962
Pfam-B_1179
Pfam-B_3064
Pfam-B_4400
Pfam-B_9537
Pfam-B_9523
Pfam-B_9358
Pfam-B_5934
Pfam-B_7619Pfam-B_1792
Pfam-B_3053
h o r m o n e 2
Pfam-B_7998
Pfam-B_198
Pfam-B_7413
Pfam-B_11448Pfam-B_710
Pfam-B_11468
Pfam-B_244
Pfam-B_2706
Pfam-B_2705
[Figure 5.11: network graph of protein domain families; node labels omitted.]
Figure 5.11. The Family network (using both CPs and non-CPs)
Red nodes are nodes found in the non-CP network. Green nodes are nodes found only in the CP network. Blue edges denote edges found in the CP network. Pink edges are edges found only in the CP network.
[Figure: network graph of protein domain families; node labels omitted.]
Pfam-B_5099Pfam-B_5134Pfam-B_5213Pfam-B_5302Pfam-B_5608Pfam-B_5738Pfam-B_5743Pfam-B_5774 Pfam-B_5093Pfam-B_6618 Pfam-B_6011Pfam-B_6236 Pfam-B_579Pfam-B_5950 Pfam-B_5840Pfam-B_6268Pfam-B_6644Pfam-B_6700Pfam-B_6758Pfam-B_6795 Pfam-B_6626Pfam-B_6747Pfam-B_6757 Pfam-B_6725Pfam-B_6835Pfam-B_6864Pfam-B_6869Pfam-B_6870Pfam-B_6955Pfam-B_7003Pfam-B_7104Pfam-B_7187Pfam-B_7190Pfam-B_7448 Pfam-B_7438 Pfam-B_7316 Pfam-B_7277 Pfam-B_7254 Pfam-B_7216Pfam-B_7470Pfam-B_9860 Pfam-B_962 Pfam-B_7731 Pfam-B_7589Pfam-B_908Pfam-B_9147 Pfam-B_8585 Pfam-B_7579Pfam-B_7835Pfam-B_8403Pfam-B_9292r e c APfam-B_10316Pfam-B_10399Pfam-B_2864 Pfam-B_8325 Pfam-B_2175Pfam-B_7176 Pfam-B_1027 Pfam-B_8133Pfam-B_2625 Pfam-B_1342Pfam-B_227
Pfam-B_7633Pfam-B_5457Pfam-B_2615Pfam-B_7657Pfam-B_6209Pfam-B_6202Pfam-B_6873Pfam-B_6874 Pfam-B_3832Pfam-B_3859Pfam-B_7439Pfam-B_4052Pfam-B_3167Pfam-B_4119 Pfam-B_4080Pfam-B_4081Pfam-B_4120Pfam-B_5188 Pfam-B_7597 Pfam-B_7145Pfam-B_7144 Pfam-B_11443Pfam-B_6789Pfam-B_11498Pfam-B_11579Pfam-B_11795 Pfam-B_11497Pfam-B_11796 Pfam-B_11578Pfam-B_3799 Pfam-B_7125 Pfam-B_11891 Pfam-B_2930Pfam-B_7225 Pfam-B_3757Pfam-B_3831 Pfam-B_2985Pfam-B_3539Pfam-B_3552 Pfam-B_5438Pfam-B_202 Pfam-B_10424Pfam-B_1992 Pfam-B_658 Pfam-B_1969Pfam-B_1702 Pfam-B_4530Pfam-B_1513Pfam-B_1786Pfam-B_2049Pfam-B_2251Pfam-B_9150Pfam-B_2447Pfam-B_2935Pfam-B_2534Pfam-B_7519Pfam-B_2929 Pfam-B_1753 Pfam-B_1319Pfam-B_5896Pfam-B_2114Pfam-B_1407
Pfam-B_7405
Pfam-B_10926
Pfam-B_4831 Pfam-B_10148
Pfam-B_3625
Pfam-B_470
Pfam-B_4332
Pfam-B_5022
Pfam-B_11571
Pfam-B_3620
Pfam-B_2936 Pfam-B_2036 Pfam-B_1108Pfam-B_3844 Pfam-B_475
Pfam-B_1752 Pfam-B_7583
Pfam-B_5064
Pfam-B_1012
Pfam-B_2163Pfam-B_8601Pfam-B_3259Pfam-B_9082Pfam-B_9083Pfam-B_9574
Pfam-B_5023
Pfam-B_9575
Pfam-B_2487
Pfam-B_7596
Pfam-B_5882
Pfam-B_4338
Pfam-B_9258
Pfam-B_3889
Pfam-B_9257
Pfam-B_960
Pfam-B_1582Pfam-B_6577 Pfam-B_1808Pfam-B_5156 Pfam-B_3437 Pfam-B_7626Pfam-B_7724Pfam-B_5218Pfam-B_7784Pfam-B_8319Pfam-B_8013Pfam-B_9700 Pfam-B_4599Pfam-B_10317Pfam-B_4893Pfam-B_4973Pfam-B_2866Pfam-B_5346
Pfam-B_785
Pfam-B_930
Pfam-B_3929
l i p a s ePfam-B_10366 Pfam-B_3791 Pfam-B_3796
Pfam-B_3589 Pfam-B_3793 Pfam-B_406
Pfam-B_1396Pfam-B_3361Pfam-B_1441
Pfam-B_4247
Pfam-B_6803Pfam-B_3748
Pfam-B_1854
Pfam-B_5900
Pfam-B_1005Pfam-B_2533
Pfam-B_5122Pfam-B_1087Pfam-B_857Pfam-B_6939Pfam-B_3105Pfam-B_5537
Pfam-B_8883
Pfam-B_7352
Pfam-B_2709Pfam-B_9985Pfam-B_3100Pfam-B_2474Pfam-B_3275Pfam-B_238Pfam-B_34Pfam-B_4839Pfam-B_11095Pfam-B_5686 Pfam-B_5685 Pfam-B_4849 Pfam-B_5274
Pfam-B_6940Pfam-B_1769
Pfam-B_7353Pfam-B_5538Pfam-B_5539
Pfam-B_7747Pfam-B_5059 Pfam-B_726
Pfam-B_2704Pfam-B_8878Pfam-B_1606Pfam-B_6228Pfam-B_2670Pfam-B_11889Pfam-B_1962Pfam-B_2356
Pfam-B_6995
Pfam-B_5925
Pfam-B_2597
Pfam-B_3174
Pfam-B_1172
Pfam-B_10379
Pfam-B_956 Pfam-B_5684Pfam-B_2222 Pfam-B_3447Pfam-B_4838Pfam-B_4848
Pfam-B_10892 Pfam-B_11457 Pfam-B_393 Pfam-B_11865 Pfam-B_11399Pfam-B_1486 Pfam-B_2958
Pfam-B_6249 Pfam-B_6436Pfam-B_4660
Pfam-B_2602 Pfam-B_6999
Pfam-B_6994
Pfam-B_1586Pfam-B_9252 Pfam-B_9329 Pfam-B_917 Pfam-B_8235 Pfam-B_7531 Pfam-B_5458
Pfam-B_209Pfam-B_867 Pfam-B_635
Pfam-B_11574
Pfam-B_816Pfam-B_8911Pfam-B_9699Pfam-B_10032
Pfam-B_3454Pfam-B_2406Pfam-B_5006
Pfam-B_10609 Pfam-B_7637 Pfam-B_7521 Pfam-B_7413 Pfam-B_7406
Pfam-B_2545
Pfam-B_7177
Pfam-B_4030Pfam-B_5830
Pfam-B_7672
Pfam-B_1952Pfam-B_10381 Pfam-B_3662Pfam-B_745
Pfam-B_5640Pfam-B_6837
Pfam-B_1778 Pfam-B_4826
Pfam-B_4601Pfam-B_7142
Pfam-B_9318 Pfam-B_797 Pfam-B_3231 Pfam-B_4341 Pfam-B_6095
Pfam-B_3101 Pfam-B_3061
Pfam-B_3787
Pfam-B_3475Pfam-B_3808Pfam-B_721Pfam-B_421Pfam-B_1094
Pfam-B_8757
Pfam-B_3194Pfam-B_3606
Pfam-B_8239
Pfam-B_4089Pfam-B_3607Pfam-B_4836
Pfam-B_11614Pfam-B_11404
Pfam-B_11615Pfam-B_2056
Pfam-B_9518
Pfam-B_4845
Pfam-B_7001Pfam-B_4366Pfam-B_5611Pfam-B_3046
Pfam-B_6581
Pfam-B_3378
Pfam-B_5942
Pfam-B_3521
Pfam-B_4579
c o n n e x i nPfam-B_10328
Pfam-B_323
Pfam-B_4707
Pfam-B_6536 Pfam-B_3453 Pfam-B_8813 Pfam-B_1597 Pfam-B_3077 Pfam-B_10375 Pfam-B_1341Pfam-B_5993 Pfam-B_3661Pfam-B_5639Pfam-B_4825Pfam-B_1777Pfam-B_1273
Pfam-B_7317Pfam-B_11866Pfam-B_5
Pfam-B_783Pfam-B_11458
Pfam-B_9387Pfam-B_9386
Pfam-B_4730Pfam-B_10382
Pfam-B_2938Pfam-B_6977Pfam-B_6848 Pfam-B_2048Pfam-B_2544
Pfam-B_2462Pfam-B_10893Pfam-B_11726
Pfam-B_8253Pfam-B_435
Pfam-B_7744
Pfam-B_8799Pfam-B_9833Pfam-B_5619 S 1 2
a d h _ s h o r t
P fam-B_10391
Pfam-B_10788
Pfam-B_10790
Pfam-B_3457
Pfam-B_9831
Pfam-B_10785Pfam-B_7511
Pfam-B_2229
Pfam-B_10793
Pfam-B_10318
Pfam-B_10319
Pfam-B_9317
Pfam-B_2294
Pfam-B_4400
Pfam-B_451
Pfam-B_9237
Pfam-B_9537
Pfam-B_1174
Pfam-B_1175
Pfam-B_1061
Pfam-B_4401
Pfam-B_6015
Pfam-B_2512
Pfam-B_1678Pfam-B_1062
Pfam-B_2760
Pfam-B_5618 Pfam-B_4050
Pfam-B_10693Pfam-B_3006
Pfam-B_8252
Pfam-B_650Pfam-B_3305
Pfam-B_1337
Pfam-B_9830
Pfam-B_5396Pfam-B_2209
Pfam-B_6138
Pfam-B_6140 Pfam-B_6598 Pfam-B_10786
Pfam-B_6599
Pfam-B_1523
Pfam-B_4327
Pfam-B_2850Pfam-B_11805
Pfam-B_4885
Pfam-B_694
Pfam-B_5004
Pfam-B_182
Pfam-B_740
Pfam-B_739Pfam-B_6827
Pfam-B_2295
Pfam-B_4817
Pfam-B_664
Pfam-B_856
Pfam-B_11566
Pfam-B_4962
Pfam-B_4963
Pfam-B_1179Pfam-B_1327
Pfam-B_7662Pfam-B_48
Pfam-B_4414
Pfam-B_494
Pfam-B_2762
Pfam-B_2124
Pfam-B_5157Pfam-B_7603
Pfam-B_3603
Pfam-B_9547
Pfam-B_1420 Pfam-B_7606
Pfam-B_9523
Pfam-B_7600
Pfam-B_11864
Pfam-B_5158
Pfam-B_7607
Pfam-B_10253
Pfam-B_2373
Pfam-B_1421
Pfam-B_7605
Pfam-B_7608
Pfam-B_862
Pfam-B_7601
Pfam-B_214
Pfam-B_3182
Pfam-B_191
Pfam-B_10469
Pfam-B_10460
Pfam-B_464
Pfam-B_703
Pfam-B_55
Pfam-B_6347Pfam-B_10463
Pfam-B_10464
Pfam-B_984
Pfam-B_2685Pfam-B_3186
Pfam-B_457
Pfam-B_9415Pfam-B_2565
Pfam-B_7580
Pfam-B_7755
Pfam-B_6606Pfam-B_3185 Pfam-B_5647 Pfam-B_10465
Pfam-B_3184 Pfam-B_213Pfam-B_10466
Pfam-B_2870
Pfam-B_165
Pfam-B_1213
Pfam-B_510
Pfam-B_509 Pfam-B_1212
Pfam-B_6471
Pfam-B_139
Pfam-B_1356
Pfam-B_3807
Pfam-B_2141
Pfam-B_153
Pfam-B_3107
Pfam-B_7762
Pfam-B_622
Pfam-B_379
Pfam-B_318
Pfam-B_8056
Pfam-B_8109
Pfam-B_7723
Pfam-B_180
Pfam-B_7998
Pfam-B_8043
Pfam-B_7721
Pfam-B_7007
Pfam-B_5379Pfam-B_8127
Pfam-B_4925
Pfam-B_2599
Pfam-B_8137
Pfam-B_3168
Pfam-B_8167
Pfam-B_9319
Pfam-B_10820
Pfam-B_1745
Pfam-B_3613
Pfam-B_2429Pfam-B_2431
Pfam-B_10825
Pfam-B_2430
Pfam-B_1744
Pfam-B_779
Pfam-B_485
Pfam-B_5532 Pfam-B_7782
Pfam-B_7768
Pfam-B_2642
Pfam-B_2382
Pfam-B_802s i g m a 5 4
Pfam-B_7722Pfam-B_3951
Pfam-B_3950Pfam-B_5217
Pfam-B_3949
Pfam-B_206
Pfam-B_179
Pfam-B_663
Pfam-B_3064
Pfam-B_5216Pfam-B_3948 Pfam-B_5215
Pfam-B_343Pfam-B_5094 Pfam-B_557
Pfam-B_2894
Pfam-B_2895
Pfam-B_2896
Pfam-B_6576Pfam-B_2276Pfam-B_3925Pfam-B_2320
Pfam-B_4698
Pfam-B_10699
Pfam-B_1088Pfam-B_6572
Pfam-B_604
Pfam-B_2266
Pfam-B_1147
Pfam-B_1791
Pfam-B_1404
Pfam-B_242
Pfam-B_205Pfam-B_3440 Pfam-B_7628 Pfam-B_2273Pfam-B_1950
Pfam-B_10692Pfam-B_5170Pfam-B_2321
Pfam-B_3028Pfam-B_2514Pfam-B_9963
Pfam-B_10698
Pfam-B_2768
Pfam-B_1254
Pfam-B_151
Pfam-B_342Pfam-B_9962
Pfam-B_953
Pfam-B_1581
Pfam-B_9964
Pfam-B_1697
Pfam-B_7344
Pfam-B_938Pfam-B_9304
Pfam-B_2268
Pfam-B_9232
Pfam-B_259
Pfam-B_417
Pfam-B_10161
Pfam-B_6601
Pfam-B_9230Pfam-B_4569
Pfam-B_7615
Pfam-B_7625
Pfam-B_7593
Pfam-B_1320
Pfam-B_1423
Pfam-B_846Pfam-B_10816 Pfam-B_3525 Pfam-B_5781 Pfam-B_10190Pfam-B_9778 Pfam-B_8507 Pfam-B_538Pfam-B_8067 Pfam-B_513 Pfam-B_9227
Pfam-B_602 Pfam-B_10821Pfam-B_922
Pfam-B_2428Pfam-B_10823
Pfam-B_882 Pfam-B_493
Pfam-B_10162Pfam-B_3926
Pfam-B_4570
Pfam-B_6701
Pfam-B_9030
Pfam-B_3110
Pfam-B_4156
Pfam-B_3915
Pfam-B_7627 Pfam-B_824
Pfam-B_958
Pfam-B_10160
Pfam-B_906
Pfam-B_7594
Pfam-B_7136Pfam-B_514Pfam-B_842
Pfam-B_4686
Pfam-B_7426Pfam-B_9359
Pfam-B_6728Pfam-B_9459
Pfam-B_765
Pfam-B_7771
Pfam-B_5376
Pfam-B_10245
Pfam-B_9365
Pfam-B_10477
Pfam-B_3549
Pfam-B_1997
Pfam-B_10476
Pfam-B_10479Pfam-B_6475
Pfam-B_271Pfam-B_397
Pfam-B_7900
Pfam-B_3295
Pfam-B_11642
Pfam-B_11638
Pfam-B_2619
Pfam-B_7107Pfam-B_312
Pfam-B_5185
Pfam-B_11164
Pfam-B_5167 Pfam-B_5184
Pfam-B_6924
Pfam-B_5234
Pfam-B_8696
Pfam-B_9129
Pfam-B_6183
Pfam-B_7526
Pfam-B_3945
Pfam-B_6184
Pfam-B_7750
Pfam-B_5797
Pfam-B_563
Pfam-B_2469
Pfam-B_3903
Pfam-B_7569
Pfam-B_1418
Pfam-B_9049
Pfam-B_1595
ho rmone_ rec
Pfam-B_3905
Pfam-B_9057
Pfam-B_1913Pfam-B_9303
Pfam-B_9285Pfam-B_9305
Pfam-B_2240
Pfam-B_4046
Pfam-B_5643
Pfam-B_4279
Pfam-B_9455
Pfam-B_7769
Pfam-B_7772
Pfam-B_6878
Pfam-B_8727
Pfam-B_6723
Pfam-B_749
ld l_ recept_b
Pfam-B_1258
Pfam-B_6480
Pfam-B_8722
Pfam-B_2923
Pfam-B_11897
Pfam-B_6323
Pfam-B_4909
Pfam-B_6478
Pfam-B_10218
Pfam-B_5650
Pfam-B_2838
Pfam-B_6988
Pfam-B_10368
Pfam-B_3904
Pfam-B_6987
Pfam-B_4199
Pfam-B_1043
Pfam-B_5864
Pfam-B_73
Pfam-B_11898Pfam-B_4908
Pfam-B_11068
Pfam-B_11075
Pfam-B_11092Pfam-B_1876
Pfam-B_1297tRNA-syn t_2
Pfam-B_9897
S H 3
Pfam-B_1720
Pfam-B_4202
Pfam-B_2598
Pfam-B_8407
Pfam-B_7010Pfam-B_878
Pfam-B_3730
Pfam-B_2060Pfam-B_8518
Pfam-B_4726
Pfam-B_1189
Pfam-B_1855
Pfam-B_5406
Pfam-B_1956Pfam-B_2016
Pfam-B_2061
Pfam-B_534
Pfam-B_8228Pfam-B_1016
Pfam-B_7456Pfam-B_5101
Pfam-B_4563Pfam-B_752
Pfam-B_2161
Pfam-B_8170
Pfam-B_881
Pfam-B_3071
Pfam-B_1713
Pfam-B_8982
Pfam-B_3429
a n k
Pfam-B_3873
Pfam-B_7182
Pfam-B_616
Pfam-B_4807
Pfam-B_7727
Pfam-B_1618
Pfam-B_547
Pfam-B_8003
Pfam-B_10170
Pfam-B_1238Pfam-B_2912
Pfam-B_11592
Pfam-B_3887Pfam-B_2913
Pfam-B_6594
Pfam-B_2766
Pfam-B_5136
Pfam-B_3170
RIP
Pfam-B_2122
Pfam-B_8175
Pfam-B_7210
r a s
Pfam-B_100
Pfam-B_982
Pfam-B_7932
Pfam-B_4006Pfam-B_7816
Pfam-B_324
Pfam-B_919
Pfam-B_372
zf -C4
Pfam-B_2111
Pfam-B_6580
ABC_tran
Pfam-B_2220
Pfam-B_1323
Pfam-B_1056
Pfam-B_768
Pfam-B_11741
Pfam-B_1133
Pfam-B_550
Pfam-B_8150
Pfam-B_5412
Pfam-B_10236
Pfam-B_506
Pfam-B_1206
Pfam-B_1282
Pfam-B_7666Pfam-B_2351 Pfam-B_10189
Pfam-B_4521
Pfam-B_3083
Pfam-B_3474Pfam-B_2350
Pfam-B_2836
Pfam-B_766
Pfam-B_6330
Pfam-B_533
Pfam-B_10242
Pfam-B_10708
Pfam-B_918
Pfam-B_1223
Pfam-B_1502
Pfam-B_141
Pfam-B_10711
Pfam-B_122Pfam-B_3517
Pfam-B_337
Pfam-B_447
Pfam-B_10718
Pfam-B_6660
Pfam-B_4631
Pfam-B_4206
Pfam-B_425
Pfam-B_5214
Pfam-B_3999
Pfam-B_8468
Pfam-B_5075
Pfam-B_5960
Pfam-B_7376
Pfam-B_4174
Y_phosphatase
P fam-B_5078
Pfam-B_5072Pfam-B_10566
Pfam-B_9848
Pfam-B_7378 Pfam-B_2219
Pfam-B_3048
Pfam-B_8094
Pfam-B_705
Pfam-B_3842
Pfam-B_3371
Pfam-B_7055
Pfam-B_6053
Pfam-B_5765
Pfam-B_7399
Pfam-B_147
Pfam-B_1927
Pfam-B_9692
Pfam-B_963
Pfam-B_6563
Pfam-B_623
Pfam-B_5264
Pfam-B_1649
Pfam-B_5142Pfam-B_7573
Pfam-B_7797
Pfam-B_11240
Pfam-B_5144
Pfam-B_5589
Pfam-B_5971
Pfam-B_9220
Pfam-B_8471
Pfam-B_5590
Pfam-B_528
Pfam-B_7023
Pfam-B_4829
Pfam-B_11658
Pfam-B_11657Pfam-B_2961
Pfam-B_3686
Pfam-B_11637
Pfam-B_5208
f e r 2
Pfam-B_5207
Pfam-B_9861
Pfam-B_10756
Pfam-B_1665
Pfam-B_6923
Pfam-B_5176
Pfam-B_2058
Pfam-B_2335
Pfam-B_3183
Pfam-B_1232
Pfam-B_6846
Pfam-B_1390
Pfam-B_10470
Pfam-B_2998
Pfam-B_3575
Pfam-B_5515
Pfam-B_2855
Pfam-B_5908
Pfam-B_3663
Pfam-B_864
Pfam-B_2506
Pfam-B_7826
Pfam-B_7451
Pfam-B_10607
Pfam-B_7480
Pfam-B_7479
Pfam-B_757
Pfam-B_5114Pfam-B_495
Pfam-B_7481
Pfam-B_2528
Pfam-B_5115
Pfam-B_478
Pfam-B_11329
Pfam-B_6062
Pfam-B_5272
Pfam-B_8181 Pfam-B_585
Pfam-B_2084
Pfam-B_9928
Pfam-B_5113
Pfam-B_10600
Pfam-B_10918Pfam-B_4409
Pfam-B_496
Pfam-B_23
Pfam-B_11209
Pfam-B_5246
Pfam-B_1813
C 2
Pfam-B_1839
Pfam-B_11906
Pfam-B_8182
Pfam-B_11905
Pfam-B_1267Pfam-B_11892
Pfam-B_2529
Pfam-B_85Pfam-B_555
r n a s e H
Pfam-B_81
COX1Pfam-B_452
Pfam-B_7201
Pfam-B_134
Pfam-B_1617
Pfam-B_8975Pfam-B_7559
Pfam-B_9606
Pfam-B_3707
Pfam-B_479
Pfam-B_5741
Pfam-B_3899
Pfam-B_8195
Pfam-B_210
Pfam-B_1255
Pfam-B_7926Pfam-B_5799
Pfam-B_2604
Pfam-B_8142
Pfam-B_8141
Pfam-B_3817
Pfam-B_5604
Pfam-B_9905
Pfam-B_6018
Pfam-B_6017
Pfam-B_6801
Pfam-B_1361
Pfam-B_1260
Pfam-B_2403
Pfam-B_3068
Pfam-B_7483
Pfam-B_1804
Pfam-B_6012
Pfam-B_1802
Pfam-B_10597
Pfam-B_2289
Pfam-B_2013
Pfam-B_2405
Pfam-B_4398Pfam-B_2297
Pfam-B_9494
Pfam-B_10604 Pfam-B_10594
Pfam-B_10574
Pfam-B_7486
Pfam-B_497
Pfam-B_3878
Pfam-B_6516
Pfam-B_2015
r v pPfam-B_2012
Pfam-B_2011
Pfam-B_859
Pfam-B_2531
Pfam-B_2299Pfam-B_2296
Pfam-B_6003
r v t
P fam-B_2298Pfam-B_6000
Pfam-B_1090
Pfam-B_10510
Pfam-B_9544Pfam-B_4402
Pfam-B_707
Pfam-B_6041
Pfam-B_9545Pfam-B_896
Pfam-B_9546
Pfam-B_8209
Pfam-B_2758
Pfam-B_6006
Pfam-B_3875
Pfam-B_2527
Pfam-B_8210
Pfam-B_2290Pfam-B_1640
Pfam-B_774
Pfam-B_6511
Pfam-B_10556
Pfam-B_9543Pfam-B_9548
Pfam-B_3879
Pfam-B_2292
Pfam-B_605
Pfam-B_9241
Pfam-B_5323
Pfam-B_5757
Pfam-B_8192
Pfam-B_4252
Pfam-B_4394
Pfam-B_1739
Pfam-B_4395
Pfam-B_9533
Pfam-B_400
Pfam-B_4403
Pfam-B_6802
Pfam-B_4082
Pfam-B_4396
Pfam-B_659
Pfam-B_9531Pfam-B_11467
Pfam-B_2945
Pfam-B_2540
lec t in_c
Pfam-B_11037
Pfam-B_4513
Pfam-B_8207
Pfam-B_8824
Pfam-B_11743
Pfam-B_1730
Pfam-B_1531
Pfam-B_10608
Pfam-B_7477
Pfam-B_9310
Pfam-B_10807
Pfam-B_4110
Pfam-B_2992
Pfam-B_2823
Pfam-B_4919
Pfam-B_6547
Pfam-B_1832
Pfam-B_4356
Pfam-B_3373Pfam-B_4357
Pfam-B_3482
Pfam-B_5999
t o x i n
Pfam-B_3294
Pfam-B_6674
Pfam-B_2764
Pfam-B_3874
Pfam-B_10433
Pfam-B_4536
Pfam-B_4535
Pfam-B_5490
Pfam-B_1835
Pfam-B_4008
Pfam-B_8332
Pfam-B_4261
Pfam-B_6304
Pfam-B_8494
Pfam-B_8333
Pfam-B_6221
Pfam-B_3137Pfam-B_11737
Pfam-B_2149
Pfam-B_4163
Pfam-B_2148
Pfam-B_8126
Pfam-B_481
Pfam-B_730Pfam-B_2572
Pfam-B_3988Pfam-B_2340
Pfam-B_6303
Pfam-B_729
Pfam-B_8334
Pfam-B_8528
Pfam-B_5491
H L H
Pfam-B_3122Pfam-B_1022
Pfam-B_1820
Pfam-B_3985
Pfam-B_10599
Pfam-B_440
Pfam-B_292
Pfam-B_3136
Pfam-B_9219
Pfam-B_2698
a m i n o t r a n
Pfam-B_9832
Pfam-B_2476
Pfam-B_4328
Pfam-B_2293
Pfam-B_1248
Pfam-B_6991
Pfam-B_1150
Pfam-B_4185Pfam-B_7135Pfam-B_377
Pfam-B_11806
Pfam-B_3483 Pfam-B_1017 Pfam-B_5118
Pfam-B_10090
Pfam-B_3573 Pfam-B_522Pfam-B_3511
Pfam-B_67
Pfam-B_3512
Pfam-B_6172
Pfam-B_195
Pfam-B_1102
Pfam-B_256
Pfam-B_2303
Pfam-B_2258
Pfam-B_5552Pfam-B_1098
Pfam-B_9957
Pfam-B_2448
Pfam-B_3839Pfam-B_7366
ox ido red_n i t r o P fam-B_1583
Pfam-B_7178
Pfam-B_127
Pfam-B_5192
Pfam-B_5189
Pfam-B_3815
Pfam-B_5074Pfam-B_3782Pfam-B_5121
Pfam-B_2053
Pfam-B_2718
Pfam-B_3389
Pfam-B_2129
Pfam-B_1724Pfam-B_4068
Pfam-B_4069
Pfam-B_6834 p h o t o R C
Pfam-B_7602
Pfam-B_4710Pfam-B_3917Pfam-B_4518 Pfam-B_131
Pfam-B_3566Pfam-B_6032
Pfam-B_6031
Pfam-B_4820Pfam-B_7552
Pfam-B_8851
Pfam-B_10506
Pfam-B_4137
Pfam-B_6993
Pfam-B_6491
Pfam-B_11499
Pfam-B_2371
Pfam-B_11495
Pfam-B_286 Pfam-B_276
Pfam-B_5171
Pfam-B_2737
Pfam-B_9675
Pfam-B_9674
Pfam-B_163
Pfam-B_6346
Pfam-B_2596
Pfam-B_5400
Pfam-B_4354
Pfam-B_6353
Pfam-B_6403
Pfam-B_2375
Pfam-B_6354
Pfam-B_1982
Pfam-B_5847
Pfam-B_976
Pfam-B_6352
Pfam-B_911
Pfam-B_3709
Pfam-B_1518
Pfam-B_2404
Pfam-B_2014
Pfam-B_11774
Pfam-B_9162
Pfam-B_491
Pfam-B_3345
Pfam-B_414
Pfam-B_1167
Pfam-B_9158
Pfam-B_9168
Pfam-B_2724
Pfam-B_4310
Pfam-B_3344
DNA_pol
P fam-B_3768
Pfam-B_14
Pfam-B_27
Pfam-B_3279
Pfam-B_11
Pfam-B_18
Pfam-B_8707
Pfam-B_3956Pfam-B_419
Pfam-B_1972
ox ido red_ fad
Pfam-B_7796
Pfam-B_4560
Pfam-B_333Pfam-B_5225
Pfam-B_4311
Pfam-B_4312
Pfam-B_9347
Pfam-B_10168
Pfam-B_6298
Pfam-B_10169
Pfam-B_4988 Pfam-B_4948 Pfam-B_4947 Pfam-B_4842 Pfam-B_4788Pfam-B_4979Pfam-B_4987
Pfam-B_11034Pfam-B_11351Pfam-B_11362Pfam-B_11363Pfam-B_11440Pfam-B_6787
Pfam-B_6961
Pfam-B_1263
Pfam-B_2979Pfam-B_6962
Pfam-B_525
Pfam-B_4597Pfam-B_4763Pfam-B_4787
Pfam-B_10679
Pfam-B_2668Pfam-B_589
Pfam-B_8672
Pfam-B_3388
Pfam-B_4193Pfam-B_78
Pfam-B_642
p y r _ r e d o x
Pfam-B_4010
Pfam-B_1214
Pfam-B_10710
Pfam-B_1272
Pfam-B_1130
Pfam-B_2092
Pfam-B_2449
Pfam-B_10724
Pfam-B_1129
Pfam-B_8885 Pfam-B_691
Pfam-B_1243
Pfam-B_5202Pfam-B_11676Pfam-B_7699
Pfam-B_5103Pfam-B_11524Pfam-B_4812Pfam-B_5992Pfam-B_4790Pfam-B_6735Pfam-B_4776Pfam-B_4775
Pfam-B_1362
Pfam-B_8418
Pfam-B_273
Pfam-B_2521
Pfam-B_1121
Pfam-B_10502
Pfam-B_6490 Pfam-B_7851
s i g m a 7 0
Pfam-B_274
Pfam-B_4146
Pfam-B_8473
Pfam-B_10231
Pfam-B_269
Pfam-B_4995
Pfam-B_5923Pfam-B_6488
Pfam-B_4437Pfam-B_8937Pfam-B_4633Pfam-B_1201
Pfam-B_5958Pfam-B_4888 Pfam-B_5780
Pfam-B_9028Pfam-B_1360 Pfam-B_8483
Pfam-B_2639Pfam-B_2464
Pfam-B_6182
Pfam-B_10403
Pfam-B_2007Pfam-B_2006
a p p l e
Pfam-B_1928
Pfam-B_10572
Pfam-B_5368
Pfam-B_620
Pfam-B_939
Pfam-B_108
Pfam-B_2669
Pfam-B_3880Pfam-B_5266
Pfam-B_3422
Pfam-B_10575
Pfam-B_2877
Pfam-B_10576
Pfam-B_7491
Pfam-B_10573Pfam-B_7836
Pfam-B_2288
Pfam-B_2147
Pfam-B_1922
Pfam-B_2878Pfam-B_7838
Pfam-B_3062
Pfam-B_7489
Pfam-B_7488
Pfam-B_3069
Pfam-B_5265
Pfam-B_4663
Pfam-B_6523
Pfam-B_7355
Pfam-B_941
Pfam-B_3572
Pfam-B_10589Pfam-B_382Pfam-B_75
Pfam-B_60Pfam-B_8724
Pfam-B_7487
Pfam-B_10585 Pfam-B_5117
Pfam-B_31
Pfam-B_5653
Pfam-B_4675Pfam-B_2892
Pfam-B_2676
Pfam-B_2881
Pfam-B_5654
Pfam-B_2893
Pfam-B_1533
Pfam-B_1729
Pfam-B_10587
Pfam-B_5161
Pfam-B_731
Pfam-B_4070
Pfam-B_1391
Pfam-B_3987
Pfam-B_2191
Pfam-B_2085
Pfam-B_5112Pfam-B_339
Pfam-B_1803
Pfam-B_3876
Pfam-B_19
Pfam-B_10601
Pfam-B_7482
Pfam-B_2274
Pfam-B_1801
Pfam-B_9398
Pfam-B_540
Pfam-B_9820
Pfam-B_9822
Pfam-B_9821
Pfam-B_9818
Pfam-B_9817
Pfam-B_10157Pfam-B_10163
Pfam-B_2365
Pfam-B_1200
Pfam-B_10166
Pfam-B_3343
Pfam-B_5926Pfam-B_10167
Pfam-B_4561
Pfam-B_418
Pfam-B_503
Pfam-B_10159
Pfam-B_1199
Pfam-B_7632
Pfam-B_1155
Pfam-B_249
Pfam-B_965
Pfam-B_6129
Pfam-B_2252
Pfam-B_7736
Pfam-B_6849
Pfam-B_4227Pfam-B_164
Pfam-B_3090
Pfam-B_9811
Pfam-B_2442
Pfam-B_5399
Pfam-B_9327
Pfam-B_6128
lec t in_ legB
Pfam-B_51
lec t in_ legA
Pfam-B_10164
Pfam-B_9159
Pfam-B_4472
Pfam-B_645
Pfam-B_2725
Pfam-B_4774
Pfam-B_591
Pfam-B_966
Pfam-B_1558
Pfam-B_94
Pfam-B_9668
Pfam-B_2064
Pfam-B_11783
Pfam-B_769
Pfam-B_9819
Pfam-B_2166
Pfam-B_8499
Pfam-B_1450
Pfam-B_4575
Pfam-B_1101
Pfam-B_11832
Pfam-B_10228
Pfam-B_10225
Pfam-B_7160
Pfam-B_4475
Pfam-B_1286
Pfam-B_6515
Pfam-B_456
Pfam-B_284
Pfam-B_2880
Pfam-B_11846
Pfam-B_444
Pfam-B_5267
Pfam-B_10214
Pfam-B_1792
Pfam-B_7354
Pfam-B_687
Pfam-B_9562
Pfam-B_8987
Pfam-B_686
Pfam-B_1124
i n s
Pfam-B_11928
Pfam-B_3444
Pfam-B_2477
Pfam-B_11828
Pfam-B_6992
Pfam-B_5482
Pfam-B_8364
Pfam-B_296
Pfam-B_8360
Pfam-B_1285
Pfam-B_1973 Pfam-B_2616
Pfam-B_7977
Pfam-B_4138
Pfam-B_59
Pfam-B_3467
Pfam-B_6724
Pfam-B_6225
Pfam-B_9981
Pfam-B_9380
Pfam-B_1996
Pfam-B_3172
Pfam-B_1828
Pfam-B_3427
Pfam-B_8974
Pfam-B_3268
Pfam-B_445
Pfam-B_107
Pfam-B_2374
Pfam-B_6355
Pfam-B_3510
Pfam-B_5846
b Z I P
Pfam-B_4313
Pfam-B_8888
FGF Pfam-B_1930
Pfam-B_45
Pfam-B_2455Pfam-B_1259
Pfam-B_3067
Pfam-B_3753Pfam-B_3766
Pfam-B_476
Pfam-B_9339
Pfam-B_3894
a lpha -amy lase
P fam-B_1431
Pfam-B_4581
Pfam-B_10666Pfam-B_3585Pfam-B_6564Pfam-B_6645
Pfam-B_6959Pfam-B_99
Pfam-B_3862
Pfam-B_1698
Pfam-B_819
Pfam-B_1127
Pfam-B_3120
Pfam-B_170
Pfam-B_42
Pfam-B_9508
Pfam-B_49
Pfam-B_2884
GTP_EFTU
Pfam-B_10563
Pfam-B_9215
Pfam-B_4415
Pfam-B_331
Pfam-B_1186
Pfam-B_630Pfam-B_674
zf-CCHC
Pfam-B_530
Pfam-B_582
Pfam-B_411
Pfam-B_2291
Pfam-B_11860
Pfam-B_6610
Pfam-B_1152
Pfam-B_955
Pfam-B_2524
Pfam-B_3193
f n 1
Pfam-B_9903
Pfam-B_4490
Pfam-B_8862
Pfam-B_9901
Pfam-B_1216
Pfam-B_10543
Pfam-B_3829
Pfam-B_9563t r y p s i n
Pfam-B_4088
Pfam-B_453
Pfam-B_818
Pfam-B_1725
f e r 4Pfam-B_10818
Pfam-B_10826
Pfam-B_10819Pfam-B_7286
Pfam-B_688 Pfam-B_2611
f n 3
Pfam-B_10822
v w a
Pfam-B_4499
Pfam-B_4852
v w c Pfam-B_2763
Pfam-B_7263
Pfam-B_8241
Pfam-B_8249
Pfam-B_8245Pfam-B_5028
w a p
Pfam-B_3969
f n 2
Pfam-B_8302
Pfam-B_8242Pfam-B_8238Pfam-B_10827
Pfam-B_5350
Pfam-B_1512Pfam-B_1029
Pfam-B_2178
Pfam-B_9942
Pfam-B_7274
Kuni tz_BPTIPfam-B_2756
Pfam-B_2682
Pfam-B_10824Pfam-B_10817
Pfam-B_1177Pfam-B_1326
Pfam-B_1060
Pfam-B_6033Pfam-B_3195Pfam-B_4813
Pfam-B_6021
Pfam-B_1693
Pfam-B_7091
Pfam-B_667
Pfam-B_4126
Pfam-B_10527
Pfam-B_4659
Pfam-B_3557
Pfam-B_3059
Pfam-B_5197
Pfam-B_2131
Pfam-B_8443
Pfam-B_4139
Pfam-B_8157
Pfam-B_4130
Pfam-B_6217
h o m e o b o x
Pfam-B_7681
Pfam-B_10723
Pfam-B_1593
Pfam-B_5317
Pfam-B_6859Pfam-B_2630
Pfam-B_8383
Pfam-B_9682
Pfam-B_7999
Pfam-B_8397
Pfam-B_4658
Pfam-B_8076
Pfam-B_11590
Pfam-B_3222
Pfam-B_2139lamin in_G
Pfam-B_1026Pfam-B_455
Pfam-B_8205
Pfam-B_3302
Pfam-B_4237
Pfam-B_5512Pfam-B_5710
Pfam-B_844Pfam-B_4700Pfam-B_7654
Pfam-B_8204
Pfam-B_2408Pfam-B_8206
Pfam-B_6489
Pfam-B_2635
t r e f o i l
P fam-B_4044
Pfam-B_8034
Pfam-B_4128
Pfam-B_5514
Pfam-B_5750
Cys_knot
P fam-B_4945
Pfam-B_948
Pfam-B_3946
Pfam-B_814
Pfam-B_1508
Pfam-B_5709
Pfam-B_5708
Pfam-B_8912
Pfam-B_97
HSP20
Pfam-B_10703
Pfam-B_2900
Pfam-B_843
Pfam-B_309
Pfam-B_1157
Pfam-B_1068
Pfam-B_1084
Pfam-B_4045
Pfam-B_7561
Pfam-B_1684
Pfam-B_10722Pfam-B_8379
Pfam-B_3249
Pfam-B_1800Pfam-B_5085Pfam-B_3942Pfam-B_8005
Pfam-B_9264
Pfam-B_1469
Pfam-B_7560 Pfam-B_1668
Pfam-B_1614
Pfam-B_9255
Pfam-B_9274
Pfam-B_8027
Pfam-B_1683
Pfam-B_11301
Pfam-B_9267
Pfam-B_6023
Pfam-B_7809
Pfam-B_2445
Pfam-B_11331
Pfam-B_2002
Pfam-B_7970
Pfam-B_9558
Pfam-B_8989
Pfam-B_5386
Pfam-B_2568
Pfam-B_1063
Pfam-B_2234
Pfam-B_338
Pfam-B_1107
Pfam-B_7545
Pfam-B_10925
Pfam-B_2483
Pfam-B_3436Pfam-B_6076
Pfam-B_3773Pfam-B_3012
Pfam-B_5129
Pfam-B_7546
Pfam-B_9936
s e r p i n
Pfam-B_9006
Pfam-B_633
Pfam-B_2314
Pfam-B_1467
Pfam-B_11717
Pfam-B_11715
Pfam-B_2494
Pfam-B_2550Pfam-B_770
Pfam-B_1620
Pfam-B_492Pfam-B_1481Pfam-B_8026Pfam-B_5734Pfam-B_5375
Pfam-B_5730Pfam-B_4028
Pfam-B_3981
Pfam-B_1829
Pfam-B_7646
Pfam-B_1960
Pfam-B_4552
Pfam-B_4550
Pfam-B_305
Pfam-B_7659
Pfam-B_7334
Pfam-B_10473
Pfam-B_4648
Pfam-B_10471
Pfam-B_675
Pfam-B_3121
Pfam-B_3014
Pfam-B_6122Pfam-B_1654
Pfam-B_7033
Pfam-B_1819Pfam-B_1818
Pfam-B_1815
Pfam-B_1817
Pfam-B_3360Pfam-B_11332
Pfam-B_7708
Pfam-B_8986
Pfam-B_1816
Pfam-B_3081
Pfam-B_11716
Pfam-B_7053
Pfam-B_8749
Pfam-B_473
Pfam-B_1621Pfam-B_11714
Pfam-B_3015
Pfam-B_1868
Pfam-B_2583
Pfam-B_554
Pfam-B_7845
Pfam-B_5163
Pfam-B_105
Pfam-B_2284
Pfam-B_7844
Pfam-B_733
Pfam-B_7849
Pfam-B_728
Pfam-B_7843
Pfam-B_9008
Pfam-B_10435Pfam-B_407
Pfam-B_4606
Pfam-B_10440
Pfam-B_7982
Pfam-B_961
Pfam-B_2285
Pfam-B_7846
Pfam-B_7174
Pfam-B_7034
Pfam-B_307
Pfam-B_308
Pfam-B_352
Pfam-B_3013
Pfam-B_334
Pfam-B_7848
Pfam-B_9007Pfam-B_336
myos in_head
Pfam-B_3853
Pfam-B_8323
Pfam-B_1445
Pfam-B_3564
Pfam-B_2989
Pfam-B_1241
Pfam-B_3854Pfam-B_4722
Pfam-B_38Pfam-B_65
Pfam-B_3703Pfam-B_10580
Pfam-B_10558Pfam-B_5153
Pfam-B_4881
Pfam-B_1234
Pfam-B_2400
Pfam-B_8322
Pfam-B_10562
Pfam-B_4664
Pfam-B_41Pfam-B_8019
Pfam-B_10617
Pfam-B_10578
Pfam-B_4665
Pfam-B_2327
r h v
Pfam-B_3263Pfam-B_1824Pfam-B_5116
Pfam-B_297Pfam-B_6004
Pfam-B_3299Pfam-B_3212Pfam-B_1677
Pfam-B_4397
Pfam-B_4666
Pfam-B_10581Pfam-B_5173Pfam-B_125
Pfam-B_11744
Pfam-B_110
Pfam-B_7233
Pfam-B_3852Pfam-B_5304
Pfam-B_9698
Pfam-B_5162
Pfam-B_1895
Pfam-B_2578
m i t o _ c a r rP fam-B_302
Pfam-B_901
Pfam-B_2420
Pfam-B_10760
Pfam-B_466
Pfam-B_3612
i l 8
Pfam-B_437
Pfam-B_4533
Pfam-B_4714
Pfam-B_2530
Pfam-B_5407
Pfam-B_4148
Pfam-B_4917
Pfam-B_8578
Pfam-B_4920
Pfam-B_4851
Pfam-B_3940Pfam-B_11927
Pfam-B_2618Pfam-B_375Pfam-B_936
Pfam-B_7322
Pfam-B_10668
Pfam-B_11643
Pfam-B_5007
Pfam-B_376
Pfam-B_7077
Pfam-B_8138Pfam-B_8194
Pfam-B_8193
Pfam-B_5398
Pfam-B_3169
Pfam-B_711
Pfam-B_4075
Pfam-B_3960Pfam-B_4085
Pfam-B_1436
Pfam-B_2949
Pfam-B_4051
Pfam-B_3192Pfam-B_5397
Pfam-B_1774 Pfam-B_4057Pfam-B_7951
Pfam-B_5857Pfam-B_8603 Pfam-B_3708
Pfam-B_7745
Pfam-B_225
Pfam-B_6257
Pfam-B_135
Pfam-B_735
Pfam-B_3897
Pfam-B_10758
Pfam-B_2421
Pfam-B_2729
Pfam-B_303
Pfam-B_1373
Pfam-B_1967
Pfam-B_7677
Pfam-B_2907
Pfam-B_10752Pfam-B_2419
Pfam-B_16
Pfam-B_5986Pfam-B_2906
Pfam-B_2026
Pfam-B_6586
Pfam-B_1047
Pfam-B_5533
Pfam-B_4Pfam-B_6588
Pfam-B_646
Pfam-B_10751
Pfam-B_4713Pfam-B_9
Pfam-B_9506
Pfam-B_233
Pfam-B_267
Pfam-B_22
Pfam-B_7155
Pfam-B_6957
Pfam-B_102
Pfam-B_7049
Pfam-B_9191
Pfam-B_2840
Pfam-B_4059
Pfam-B_6308
Pfam-B_1821
Pfam-B_9309
Pfam-B_10748
Pfam-B_3824
Pfam-B_467
Pfam-B_167
Pfam-B_10884
Pfam-B_4060Pfam-B_8196
Pfam-B_10021
Pfam-B_4712
Pfam-B_3080
Pfam-B_7635
Pfam-B_6909
Pfam-B_11199
Pfam-B_11198
Pfam-B_11009
Pfam-B_3102Pfam-B_11026
n o t c h
Pfam-B_2386Pfam-B_3124
Pfam-B_8143
Pfam-B_1525Pfam-B_93
Pfam-B_11000 Pfam-B_7442
Pfam-B_2716
Pfam-B_275
Pfam-B_9609
Pfam-B_436
Pfam-B_7558
Pfam-B_2575Pfam-B_523
l i p o c a l i n
Pfam-B_1466
Pfam-B_5421
Pfam-B_2747
Pfam-B_2519
Pfam-B_8120Pfam-B_8825
Pfam-B_4244
Pfam-B_4058
Pfam-B_1283
Pfam-B_3082
Pfam-B_8139
Pfam-B_7952
Pfam-B_8153
Pfam-B_548
Pfam-B_1807
Pfam-B_8197
Pfam-B_1136
Pfam-B_8230
Pfam-B_381
Pfam-B_2241
Pfam-B_8257
Pfam-B_1979Pfam-B_4607Pfam-B_4420
Pfam-B_747
Pfam-B_11241
Pfam-B_3706
d s r m
Pfam-B_11753
i g
Pfam-B_7392
Pfam-B_3916
Pfam-B_546
Pfam-B_1331
Pfam-B_229
Pfam-B_2981Pfam-B_520Pfam-B_11794
Pfam-B_3713
Pfam-B_6943
Pfam-B_11017Pfam-B_2983
Pfam-B_2982
Pfam-B_9684Pfam-B_1143
Pfam-B_2777
Pfam-B_10702
Pfam-B_10946
Pfam-B_10707
Pfam-B_8372
Pfam-B_5510
Pfam-B_3017
Pfam-B_7981
Pfam-B_2570
Pfam-B_3978Pfam-B_11324
Pfam-B_7329
Pfam-B_5070
Pfam-B_11771
Pfam-B_2515
Pfam-B_5591
Pfam-B_1883
Pfam-B_5073
Pfam-B_7375
Pfam-B_7798
Pfam-B_8808
Pfam-B_9904
Pfam-B_9846
Pfam-B_880
Pfam-B_3758
Pfam-B_909
RuBisCO_smal l
P fam-B_3574Pfam-B_915
Pfam-B_2150
Pfam-B_4074
Pfam-B_6098
Pfam-B_606
Pfam-B_2028
Pfam-B_390 Pfam-B_7717Pfam-B_7718
Pfam-B_7653
Pfam-B_3816
Pfam-B_3935
Pfam-B_4701
EGF
Pfam-B_2963
Pfam-B_7719
Pfam-B_8944
Pfam-B_2980
Pfam-B_5316
Pfam-B_2643Pfam-B_9407
Pfam-B_9437
s u s h i
P fam-B_11735
Pfam-B_4924
Pfam-B_11830
c y c l i n
Pfam-B_1025
Pfam-B_4076
Pfam-B_9667
Pfam-B_4976
Pfam-B_4939
Pfam-B_1589
Pfam-B_7009
Pfam-B_5808
Pfam-B_8717
Pfam-B_7311
Pfam-B_5041
Pfam-B_4104Pfam-B_9356
Pfam-B_9983
Pfam-B_9968
Pfam-B_9980
Pfam-B_3570
Pfam-B_6544Pfam-B_4547
Pfam-B_3933
Pfam-B_3197
Pfam-B_2126STphospha tase
P fam-B_1846
Pfam-B_7642
Pfam-B_11098
Pfam-B_3196
Pfam-B_1123
Pfam-B_8801
Pfam-B_9561
Pfam-B_10374
Pfam-B_6428
Pfam-B_2797
Pfam-B_2008
A A A
Pfam-B_1726
Pfam-B_1622
Pfam-B_4474Pfam-B_9527
Pfam-B_11859
Pfam-B_1308
Pfam-B_4424
Pfam-B_1680Pfam-B_3424
Pfam-B_363
Pfam-B_2723
Pfam-B_4314
Pfam-B_199
Pfam-B_255
cy toch rome_c
Pfam-B_6384Pfam-B_4212
Pfam-B_1660
Pfam-B_1474
Pfam-B_4213
Pfam-B_364
Pfam-B_3341
Pfam-B_758
Pfam-B_9638
Pfam-B_9637
Pfam-B_1165
Pfam-B_1663
Pfam-B_2848
Pfam-B_6451
Pfam-B_8734
Pfam-B_1477
Pfam-B_10303
Pfam-B_4277
Pfam-B_166
Pfam-B_1985
Pfam-B_7411
Pfam-B_1448
Pfam-B_4134
Pfam-B_5187
Pfam-B_8676
Pfam-B_7119
Pfam-B_9041
Pfam-B_6040
Pfam-B_1368
Pfam-B_9044Pfam-B_5959
Pfam-B_5794
Pfam-B_8516
Pfam-B_6485Pfam-B_7118Pfam-B_9180
Pfam-B_11833
Pfam-B_11775Pfam-B_8594
Pfam-B_6965
Pfam-B_7897
Pfam-B_964
Pfam-B_6963
Pfam-B_9436
Pfam-B_8515
E1-E2_ATPase
Pfam-B_10105Pfam-B_1592
Pfam-B_9649
Pfam-B_5335
Pfam-B_4482
Pfam-B_460Pfam-B_459
Pfam-B_10130
Pfam-B_697
Pfam-B_92
Pfam-B_6291
Pfam-B_698
Pfam-B_10142
Pfam-B_10144
Pfam-B_4555
Pfam-B_2361
Pfam-B_575
Pfam-B_6419 Pfam-B_801
Pfam-B_332
Pfam-B_6146
Pfam-B_150
Pfam-B_3714
Pfam-B_1990
Pfam-B_3971
Pfam-B_3972
Pfam-B_11192
Pfam-B_4618
Pfam-B_3943
Pfam-B_10814
Pfam-B_11130
Pfam-B_6423
Pfam-B_6535
Pfam-B_1316
Pfam-B_6622
Pfam-B_4721
Pfam-B_2034
Pfam-B_10145Pfam-B_11151
Pfam-B_6513
Pfam-B_8910
Pfam-B_2324
Pfam-B_9838
Pfam-B_32
Pfam-B_2790
Pfam-B_9138
Pfam-B_9139
Pfam-B_968
Pfam-B_4632
Pfam-B_4894
Pfam-B_11100Pfam-B_5087
Pfam-B_10719Pfam-B_8466
Pfam-B_7997Pfam-B_1144
Pfam-B_5199Pfam-B_5109Pfam-B_8524
lamin in_B
laminin_EGF
Pfam-B_5964
p o u
Pfam-B_3109 Pfam-B_8380
Pfam-B_3461Pfam-B_220
Pfam-B_1628
Pfam-B_3223
Pfam-B_4140
Pfam-B_5045Pfam-B_3995
Pfam-B_2188
Pfam-B_2425
Pfam-B_2694
Pfam-B_804
Pfam-B_4240Pfam-B_748
Pfam-B_5718
Pfam-B_3002
f ib r inogen_C
Pfam-B_5714
Pfam-B_5793
Pfam-B_8544
Pfam-B_1644
Pfam-B_8960Pfam-B_3303
Pfam-B_5583
Pfam-B_2684
Pfam-B_8763
Pfam-B_1289Pfam-B_10813
Pfam-B_8547
Pfam-B_159
Pfam-B_1220
Pfam-B_6578
Pfam-B_2792
Pfam-B_993Pfam-B_58
Pfam-B_3837
Pfam-B_7350
Pfam-B_10727
Pfam-B_10371
Pfam-B_1354
Pfam-B_1079
Pfam-B_3528Pfam-B_10373
Pfam-B_6426
Pfam-B_10370
Pfam-B_283
adh_z incPfam-B_9973
Pfam-B_9984
Pfam-B_10956Pfam-B_9972
Pfam-B_11160
Pfam-B_10419
Pfam-B_11159
Pfam-B_9971
Pfam-B_2601
Pfam-B_10955
Pfam-B_2812
Pfam-B_40
Pfam-B_2600
Pfam-B_4093
Pfam-B_5435
Pfam-B_3200Pfam-B_4036
Pfam-B_4083
Pfam-B_1440
Pfam-B_8224
Pfam-B_4084
c o p p e r - b i n d
Pfam-B_2581
Pfam-B_7873Pfam-B_9590
Pfam-B_5334
Pfam-B_2157
Pfam-B_10965
Pfam-B_6144
Pfam-B_1532
Pfam-B_2362
Pfam-B_7300Pfam-B_7301
Pfam-B_5313
Pfam-B_2434Pfam-B_2
Pfam-B_7923
Pfam-B_11763
Pfam-B_7017Pfam-B_3063Pfam-B_11500
Pfam-B_11762Pfam-B_3075Pfam-B_1473Pfam-B_11918
Pfam-B_1041Pfam-B_7880
Pfam-B_7994
Pfam-B_2887
Pfam-B_2977
Pfam-B_1826
Pfam-B_8057
Pfam-B_8095
Pfam-B_3153
Pfam-B_5360
Pfam-B_1433
Pfam-B_483
Pfam-B_626
Pfam-B_2211
Pfam-B_4142
Pfam-B_4492Pfam-B_6180
Pfam-B_1509
h o r m o n e
Pfam-B_2339
Pfam-B_5383
Pfam-B_2801
Pfam-B_11112
Pfam-B_8445
Pfam-B_5385
Pfam-B_6534
Pfam-B_6532
Pfam-B_3902Pfam-B_5973
Pfam-B_1291
Pfam-B_2995
Pfam-B_8096
Figure 5.12. The Family network using only non-circular patterns.
Red nodes are nodes found in the non-CP network. Green nodes are nodes found only in the CP-network. Blue edges denote edges found in the CP network. Pink edges are edges found only in the CP network.
Pfam-B_4925Pfam-B_8127
Pfam-B_7768Pfam-B_882Pfam-B_5390
Pfam-B_5389Pfam-B_1280
s i g m a 5 4
Pfam-B_4054Pfam-B_2382
Pfam-B_3525
Pfam-B_2001
Pfam-B_5532Pfam-B_6502
Pfam-B_7975
Pfam-B_7146
Pfam-B_6323
Pfam-B_1595
Pfam-B_2838
Pfam-B_3905
Pfam-B_10218
Pfam-B_6480
Pfam-B_5558
Pfam-B_8489
Pfam-B_6478
Pfam-B_1150 Pfam-B_1418 Pfam-B_3246
Pfam-B_1596
Pfam-B_6428Pfam-B_6118Pfam-B_6045Pfam-B_5686Pfam-B_5062Pfam-B_504 Pfam-B_3405Pfam-B_3378Pfam-B_3552
Pfam-B_780
Pfam-B_3614
Pfam-B_2432Pfam-B_8065
Pfam-B_7153
Pfam-B_9091Pfam-B_482Pfam-B_4916
Pfam-B_3623Pfam-B_409Pfam-B_7001 Pfam-B_3129
Pfam-B_470
Pfam-B_9731
Pfam-B_349 Pfam-B_8587
Pfam-B_8461
Pfam-B_3244 Pfam-B_1132
Pfam-B_7553Pfam-B_793
Pfam-B_5131
Pfam-B_8585
HSP20
Pfam-B_1626Pfam-B_7653
Pfam-B_1737
Pfam-B_775 Pfam-B_4961
Pfam-B_10716
k a z a l
P fam-B_9989
Pfam-B_6362Pfam-B_9675
Pfam-B_2480
Pfam-B_4470Pfam-B_7160
Pfam-B_965
Pfam-B_5944
Pfam-B_301
Pfam-B_1278
Pfam-B_2003
Pfam-B_3903Pfam-B_11717
Pfam-B_8253
Pfam-B_362
Pfam-B_4552
Pfam-B_2803
Pfam-B_250
Pfam-B_5618
Pfam-B_8749
Pfam-B_7569
Pfam-B_6032
Pfam-B_3268
Pfam-B_2591
Pfam-B_3426
Pfam-B_9979Pfam-B_4550
Pfam-B_1450 Pfam-B_3856
Pfam-B_4511
Pfam-B_5548
Pfam-B_1090
Pfam-B_1224
Pfam-B_1180
Pfam-B_4146
Pfam-B_6173
Pfam-B_1485
Pfam-B_1509
Pfam-B_6180
Pfam-B_10902
Pfam-B_1566
Pfam-B_3031
Pfam-B_7006
Pfam-B_713
Pfam-B_586 Pfam-B_86
Pfam-B_2888
Pfam-B_4733Pfam-B_76
Pfam-B_87Pfam-B_2990Pfam-B_9282
Pfam-B_4344
tsp_1
Pfam-B_558 Pfam-B_2050 Pfam-B_2018
Pfam-B_2627
Pfam-B_3443
Pfam-B_10602
Pfam-B_2406
Pfam-B_10955 Pfam-B_4113Pfam-B_1217 Pfam-B_10609
Pfam-B_876Pfam-B_7410Pfam-B_4677Pfam-B_10706Pfam-B_609adh_z incPfam-B_6632Pfam-B_96
Pfam-B_787
Pfam-B_445Pfam-B_4710
Pfam-B_107
Pfam-B_1183
Pfam-B_1591
Pfam-B_3343
Pfam-B_1281
Pfam-B_3943
Pfam-B_1990
Pfam-B_4913
Pfam-B_10303
Pfam-B_1985
Pfam-B_2826
Pfam-B_1448
Pfam-B_9044Pfam-B_9436Pfam-B_4696
Pfam-B_6040Pfam-B_5794
Pfam-B_6485
Pfam-B_1368
Pfam-B_6484
Pfam-B_9650
Pfam-B_9041
Pfam-B_9045
Pfam-B_9043Pfam-B_7119
Pfam-B_166
Pfam-B_7117
Pfam-B_9182
Pfam-B_9047
Pfam-B_1965
Pfam-B_7669Pfam-B_3741
Pfam-B_8516
Pfam-B_7462
Pfam-B_5571
Pfam-B_5959
Pfam-B_4277
Pfam-B_6964
Pfam-B_6963
Pfam-B_8515
Pfam-B_1893
Pfam-B_5048
Pfam-B_9046
Pfam-B_2715
z f -C2H2
Pfam-B_7897
Pfam-B_7118
Pfam-B_7664
Pfam-B_9042
Pfam-B_3877Pfam-B_1961
Pfam-B_10904
Pfam-B_11280
Pfam-B_889
Pfam-B_3516
Pfam-B_6369
Pfam-B_6371
Pfam-B_9662
Pfam-B_10947
Pfam-B_1574Pfam-B_11145
Pfam-B_11155Pfam-B_10784
Pfam-B_1110
Pfam-B_982
Pfam-B_11896
Pfam-B_1832
Pfam-B_1116
ABC_tran
Pfam-B_162
Pfam-B_3428
Pfam-B_11146
Pfam-B_1203
Pfam-B_148
Pfam-B_176Pfam-B_2230Pfam-B_5089
Pfam-B_762
Pfam-B_2400
Pfam-B_6716
Pfam-B_803
Pfam-B_5173
Pfam-B_7116Pfam-B_2327
Pfam-B_730
Pfam-B_11737
Pfam-B_3988
Pfam-B_4261
Pfam-B_3137
Pfam-B_3985
Pfam-B_1820
Pfam-B_3136
Pfam-B_2148
Pfam-B_731
Pfam-B_9912
Pfam-B_3327
Pfam-B_9974
Pfam-B_2814
Pfam-B_1349
Pfam-B_2304
Pfam-B_2188
Pfam-B_5339
Pfam-B_2131
Pfam-B_6858
Pfam-B_4894
Pfam-B_5340
Pfam-B_8376
Pfam-B_7686
Pfam-B_5198
Pfam-B_1593
Pfam-B_2630
Pfam-B_10719
Pfam-B_5199
Pfam-B_9684
Pfam-B_7680
Pfam-B_8523Pfam-B_5341
Pfam-B_5413
Pfam-B_3492
Pfam-B_4945
Pfam-B_3222
Pfam-B_8443
Pfam-B_5109
Pfam-B_8371
Pfam-B_5196
Pfam-B_4127Pfam-B_4129
Pfam-B_8373
Pfam-B_8372
Pfam-B_6916
Pfam-B_7681
Pfam-B_11590
Pfam-B_1800
Pfam-B_948
Pfam-B_5512
Pfam-B_8383
Pfam-B_10722
Pfam-B_358
Pfam-B_8460
Pfam-B_4658
Pfam-B_6859
Pfam-B_4701
Pfam-B_9605
Pfam-B_9276Pfam-B_3173
Pfam-B_5084Pfam-B_5510
Pfam-B_2291
Pfam-B_814
Pfam-B_1143
Pfam-B_1144Pfam-B_3942
Pfam-B_3947
r r m
Pfam-B_8163
Pfam-B_2147
Pfam-B_5540
Pfam-B_915
Pfam-B_3574
Pfam-B_8717
Pfam-B_3237
Pfam-B_7836
Pfam-B_10589Pfam-B_60zf-CCHC
Pfam-B_674
C 2
Pfam-B_582
Pfam-B_1267Pfam-B_11906
Pfam-B_8528Pfam-B_5490
Pfam-B_5491Pfam-B_1731
Pfam-B_65Pfam-B_6523
Pfam-B_41Pfam-B_110Pfam-B_7487
Pfam-B_38
Pfam-B_5117
Pfam-B_8019
Pfam-B_3703
Pfam-B_4664
Pfam-B_10558
Pfam-B_125Pfam-B_1234
Pfam-B_5153
Pfam-B_4881
Pfam-B_11744
Pfam-B_10578
Pfam-B_10617
Pfam-B_10581
Pfam-B_4666
Pfam-B_3564
Pfam-B_10580
Pfam-B_4665
z n - p r o t e a s e
Pfam-B_7233
Pfam-B_10707
Pfam-B_700
Pfam-B_4610
Pfam-B_7209
Pfam-B_1331
Pfam-B_747
Pfam-B_1466
s u s h i
P fam-B_4295
Pfam-B_2881
H L H
Pfam-B_3122
Pfam-B_3314 Pfam-B_2572
Pfam-B_3987
Pfam-B_31
Pfam-B_75
Pfam-B_411Pfam-B_382
Pfam-B_2893
Pfam-B_5654
Pfam-B_1533
Pfam-B_1729
Pfam-B_10587
Pfam-B_4924
l i p o c a l i n
P fam-B_11830
Pfam-B_4976
Pfam-B_1068
Pfam-B_1589
Pfam-B_2420
Pfam-B_16
Pfam-B_10760
Pfam-B_466
Pfam-B_10736
Pfam-B_2906
Pfam-B_2643
Pfam-B_2421
Pfam-B_605
Pfam-B_4712
Pfam-B_5533
Pfam-B_2907
Pfam-B_2729
Pfam-B_5316
Pfam-B_8323
GTP_EFTU
Pfam-B_1739
Pfam-B_9241
Pfam-B_1373
Pfam-B_7695
Pfam-B_9
Pfam-B_267
Pfam-B_167Pfam-B_467
Pfam-B_10758
Pfam-B_4Pfam-B_6586Pfam-B_10752
Pfam-B_4713
Pfam-B_6588
Pfam-B_233
Pfam-B_2419
Pfam-B_3824
Pfam-B_646
Pfam-B_10751
Pfam-B_22
a m i n o t r a n
Pfam-B_8095
Pfam-B_1440
Pfam-B_1433
Pfam-B_8223
Pfam-B_3200
TGF-be ta
Pfam-B_8222
lamin in_B
Pfam-B_4036
Pfam-B_5589
Pfam-B_2464
Pfam-B_3062Pfam-B_6041
Pfam-B_1928
Pfam-B_5780
Pfam-B_4888
Pfam-B_6182
Pfam-B_10625
Pfam-B_6622
Pfam-B_10290Pfam-B_1659
Pfam-B_914
Pfam-B_5815
Pfam-B_6451
Pfam-B_6423
Pfam-B_199
Pfam-B_1728
Pfam-B_1685
Pfam-B_4618
Pfam-B_6415
Pfam-B_5687
Pfam-B_8891
Pfam-B_4593
Pfam-B_1658
Pfam-B_549
Pfam-B_10215
Pfam-B_10216Pfam-B_2370
Pfam-B_1520
Pfam-B_3500Pfam-B_10214
Pfam-B_706
Pfam-B_758
Pfam-B_363
Pfam-B_1680
Pfam-B_9638
Pfam-B_8919
Pfam-B_5583
Pfam-B_4255
Pfam-B_5768
Pfam-B_5669
Pfam-B_3529
Pfam-B_4592
Pfam-B_6513
Pfam-B_4591
Pfam-B_4224
Pfam-B_10289
Pfam-B_5679
Pfam-B_10358
Pfam-B_5812
Pfam-B_4619
Pfam-B_2383
Pfam-B_5814Pfam-B_10288
Pfam-B_6424
Pfam-B_11192
Pfam-B_3341
Pfam-B_9637
Pfam-B_3424
Pfam-B_4314
Pfam-B_2723Pfam-B_364
Pfam-B_4424
Pfam-B_5703
Pfam-B_1301
Pfam-B_684ce l l u l ase
Pfam-B_8548
Pfam-B_5584
Pfam-B_8546
Pfam-B_1308
Pfam-B_1660
Pfam-B_1474
Pfam-B_6384
Pfam-B_4213
Pfam-B_2187
Pfam-B_8326
Pfam-B_295
Pfam-B_5509
Pfam-B_8305Pfam-B_5513
Pfam-B_1848
Pfam-B_8368
Pfam-B_5471Pfam-B_5516
Pfam-B_5487
Pfam-B_1863
Pfam-B_1333
Pfam-B_5482
Pfam-B_294
Pfam-B_8312
Pfam-B_3434
Pfam-B_3303
Pfam-B_5793
Pfam-B_8544
Pfam-B_296
Pfam-B_4107
Pfam-B_1644
Pfam-B_2684
Pfam-B_8547
Pfam-B_322Pfam-B_741
Pfam-B_8364Pfam-B_5517
Pfam-B_666 Pfam-B_4102
Pfam-B_1853
Pfam-B_629
Pfam-B_1860
Pfam-B_800
Pfam-B_4154
Pfam-B_3240
Pfam-B_576
Pfam-B_33
Pfam-B_8430
Pfam-B_1446
Pfam-B_4141
Pfam-B_1627
Pfam-B_1629
Pfam-B_8289
Pfam-B_8358
Pfam-B_9624
Pfam-B_9623
Pfam-B_4011
Pfam-B_8439
Pfam-B_7977
Pfam-B_2616
Pfam-B_8423
Pfam-B_4159
Pfam-B_10799
Pfam-B_10802
Pfam-B_347
Pfam-B_668
Pfam-B_2648
Pfam-B_8493
Pfam-B_1285
Pfam-B_799
Pfam-B_4212
cy toch rome_c
Pfam-B_255
Pfam-B_1165
Pfam-B_2848
Pfam-B_1477
Pfam-B_8734
Pfam-B_1663
Pfam-B_1846
Pfam-B_5350
Pfam-B_1512
Pfam-B_9942
Pfam-B_3970v w c
Pfam-B_9563
Pfam-B_3422Pfam-B_4402
Pfam-B_9544
Pfam-B_7477
Pfam-B_7200
Pfam-B_3875
Pfam-B_757
Pfam-B_4409
Pfam-B_540
Pfam-B_10601
Pfam-B_7482
Pfam-B_10015
Pfam-B_11329 Pfam-B_1064
Pfam-B_1390
Pfam-B_282
Pfam-B_6053Pfam-B_6563
Pfam-B_1927
Pfam-B_9927
Pfam-B_6038
Pfam-B_9648
Pfam-B_2998
Pfam-B_2084
Pfam-B_1391
Pfam-B_2085
Pfam-B_11046
Pfam-B_8676
Pfam-B_10145
Pfam-B_8419Pfam-B_8508
Pfam-B_4138
e f h a n d
Pfam-B_2305
Pfam-B_11279
Pfam-B_4418
Pfam-B_10918
Pfam-B_3876
Pfam-B_8210
Pfam-B_8482
Pfam-B_5958
Pfam-B_9546
Pfam-B_1531
Pfam-B_1801
Pfam-B_896Pfam-B_2527
COX1
Pfam-B_23
Pfam-B_1730
Pfam-B_4474
Pfam-B_442
Pfam-B_2408
Pfam-B_2694
Pfam-B_7481
Pfam-B_478
Pfam-B_6516
Pfam-B_11209
Pfam-B_452
Pfam-B_2012Pfam-B_2011
Pfam-B_2528
Pfam-B_859Pfam-B_339
Pfam-B_5112w a p
i n s
Pfam-B_1286
Pfam-B_455
Pfam-B_8204
Pfam-B_390
Pfam-B_8206
Pfam-B_804
Pfam-B_2763
Pfam-B_1063
Pfam-B_8205
Pfam-B_4490
Pfam-B_8862
f n 1
Pfam-B_5709
Pfam-B_9527
Pfam-B_5708
f ib r inogen_C
Pfam-B_1274
Pfam-B_6033
Pfam-B_688
Pfam-B_7263
Pfam-B_7286
Pfam-B_687
Pfam-B_453
Pfam-B_818
Pfam-B_11846
Pfam-B_5718
Pfam-B_2797
Pfam-B_4240
Pfam-B_1216
f n 3
Pfam-B_4852
Pfam-B_10826
Pfam-B_10827
Pfam-B_7274
t r y p s i n
Pfam-B_1177
Pfam-B_10822
Pfam-B_1060
Pfam-B_686
Pfam-B_4813
Pfam-B_10817
Pfam-B_2682Pfam-B_10818
Pfam-B_1693
Pfam-B_4475
Pfam-B_9562f e r 4
Pfam-B_8239
Pfam-B_380
Pfam-B_4087
Pfam-B_1275Pfam-B_2178
Pfam-B_2611Pfam-B_3002
Pfam-B_5028
Pfam-B_5441 Pfam-B_8245
Pfam-B_5444
Kuni tz_BPTI
v w a
Pfam-B_4499
Pfam-B_10824Pfam-B_2756
Pfam-B_1326
Pfam-B_10819
Figure 5.13. The Family network using only circular patterns
Red nodes are nodes found in the non-CP network. Green nodes are nodes found only in the CP network. Blue edges denote edges found in the CP network. Pink edges are edges found only in the CP network.
Chapter 6
Circular Pattern Discovery
6.1 Introduction
The circular pattern discovery (CPD) problem is to identify “interesting” circular patterns in a text T. Here, “interesting” is typically defined in terms of constraints on the search, for instance, based on occurrence frequency, pattern length, or proximity between patterns. When T is a database of sequences, additional constraints may be imposed, for example the coverage or quorum constraint [86]. In biological applications, for instance, interesting circular patterns are likely to have biological relevance, e.g., they could point to proteins with related functions, even with low sequence similarity.
The CPD problem is related to the better-known circular pattern matching (CPM) problem discussed in Chapter 5.
To our knowledge, no existing work has explicitly studied the problem of pattern discovery involving circular patterns. Motivated by the increasing significance of circular permutations in various applications, from computational biology to pattern analysis, we propose methods to address the circular pattern discovery problem.
Main Results. We define and solve the ECPD and ACPD problems to find the “interesting” circular patterns, as defined using specified constraints.
In this chapter, we present an algorithm to solve the ECPD problem. We also present two algorithms to solve the ACPD problem. One is based on Maes’ CPM algorithm [81]; the other is based on our ACPM2 algorithm. On average, the ACPD algorithm based on ACPM2 is better than the ACPD algorithm based on Maes’ CPM algorithm.
The following two theorems represent our main contribution on the CPD problem.
Theorem 6.1: Given a sequence database SeqDB with r sequences and the parameters $m_1, m_2, k, f, g$, the ECPD algorithm uses suffix trees and suffix links to solve the exact circular pattern discovery problem in $O(m_2^2 N)$ time, where N is the total number of symbols in SeqDB.
Theorem 6.2: Given a sequence database SeqDB with r sequences and the parameters $m_1, m_2, k, f, g$, the ACPD algorithm, which is based on the ACPM2 algorithm, uses suffix arrays to solve the ACPD problem in $O(k m_2^3 N^2)$ time in the worst case and $O(k m_2^3 N)$ time on average, where N is the total number of symbols in SeqDB.
Organization. In the next section, we define the circular pattern discovery problems. The algorithm for the ECPD problem is presented and analyzed in Section 6.3. The ACPD problem is introduced and solved in Section 6.4. In Section 6.5, we show experiments on analyzing circular permutations in multidomain proteins using our algorithms. In Section 6.6, we summarize our work on circular pattern discovery problems.
6.2 The Circular Pattern Discovery Problem
Although a lot of progress has been made in pattern discovery, sequential data mining, and pattern matching, there has not been much attention to the issue of pattern discovery with cyclic patterns. While the pattern matching problem assumes that a pattern of interest will be provided before matching can start, pattern discovery does not require any initial pattern. Starting with no specific query pattern, one may seek to find “interesting” circular substrings within the sequence, or database of sequences, for example, circular substrings that occur at least a minimum number of times. We call this the circular pattern discovery (CPD) problem. We consider three variations of the CPD problem below.
Exact Circular Pattern Discovery Problem (ECPD). Given a text T and a number f, return all the high-frequency cyclic substrings s (i.e., with frequency ≥ f) and their respective circular shifts in T.
Approximate Circular Pattern Discovery Problem (ACPD). Given a text T and the numbers f and k, return all the high-frequency circular substrings s (i.e., with frequency ≥ f) and their respective circular shifts with k-approximate matches in T.
Circular Pattern Discovery Problem (parameterized form). Given a sequence database Q with r sequences and the parameters $m_1, m_2, k, f, g$, return all circular substrings s and their respective circular shifts that have a k-approximate match in Q, have a high occurrence frequency (frequency ≥ f), occur in at least g sequences (where g ≤ r), and whose length m satisfies the constraint $m_1 \le m \le m_2$.
The parameter g models the coverage or quorum constraints often imposed in motif discovery for biological sequences [86]. For instance, we may want a subsequence to appear in a certain proportion of the members of a protein family before we accept the subsequence as a valid motif for that family. The ECPD problem corresponds to the case with k = 0. Clearly, the parameterization could be modified to impose more or fewer constraints in the discovery, as desired.
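To make the definitions concrete, the following minimal Python sketch enumerates the circular shifts of a string and counts how often any of them occurs exactly in a text. The function names `circular_shifts` and `circular_frequency` are illustrative and do not come from this work, and counting one occurrence per text position is an assumed reading of "frequency".

```python
def circular_shifts(s):
    """Return the set of distinct circular shifts (rotations) of s."""
    return {s[i:] + s[:i] for i in range(len(s))}


def circular_frequency(text, s):
    """Count text positions where some circular shift of s occurs exactly."""
    m = len(s)
    shifts = circular_shifts(s)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] in shifts)


if __name__ == "__main__":
    # "ABBCB", "CBABB" and "BABBC" are rotations of one circular pattern.
    text = "ABBCBXXCBABBYYBABBC"
    print(sorted(circular_shifts("ABBCB")))
    print(circular_frequency(text, "ABBCB"))  # prints 3
```

With f = 3, the pattern ABBCB in this toy text would qualify as a high-frequency circular substring under the ECPD definition (k = 0).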
The Challenge. To see the difficulty involved in the CPD problem, we can consider the complexity of the naïve algorithm for the problem. First, consider pattern discovery for patterns of a specific length, say m (that is, $m_1 = m_2 = m$), on a text T of length n. For the ECPD problem, we will have $(n - m + 1)$ substrings, each with m cyclic shifts, with each m-length pattern requiring $O(nm)$ time to search in T. The overall time will be in $O(m^2 n^2)$. We can improve this to $O(m^2 n)$ using standard linear time pattern matching algorithms, such as the KMP or Boyer-Moore algorithms. When we consider a range of pattern lengths (for example, $m_1 \le m \le m_2$), the overall complexity becomes $O(m_2^3 n)$. When no length is specified, we will need to consider all possible pattern lengths, hence, the overall time complexity will be in $O(n^4)$. Using a similar analysis, for the ACPD problem, we will need time in $O(k m^2 n^2)$ for one single pattern length m, and in $O(k m_2^3 n^2)$ for a range of pattern lengths ($m_1 \le m \le m_2$). We have assumed the use of Ukkonen’s $O(kn)$ algorithm for k-approximate matching of a given m-length substring of the text. With no specified constraint on the length, we will require time in $O(kn^5)$ for the parametrized ACPD problem.
The CPD problem is related to the CPM problem, but is much more complicated. Applying CPM algorithms directly would assume that we know what we are trying to “discover”, or would require an exhaustive consideration of all cyclic substrings. In the following, we first propose a fast ECPD algorithm that exploits suffix links, which are typically part of a standard suffix tree. We then address the ACPD problem, using suffix arrays.
6.3 The ECPD Algorithm
Our ECPD algorithm is based on our algorithm for the ECPM problem as presented in Section 5.2.
First, the ECPD algorithm (Algorithm 6.1) builds a suffix tree ST from the sequence T. Then the algorithm checks each m-length pattern from T for possible circular pattern matching, using the suffix tree. Checking one m-length pattern for possible CPM costs $O(m)$ time using the ECPM algorithm. There are $O((m_2 - m_1)N)$ patterns whose length m is in the range $[m_1, m_2]$. So the time complexity of the ECPD algorithm will be in $O((m_2 - m_1 + 1)(m_2 + m_1)N) = O(m_2^2 N)$.
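The bound can be written out as a short worked sum. This is a sketch under the assumptions stated in the text: each ECPM check of an m-length pattern costs $O(m)$, and there are at most N substrings of each length.

```latex
\[
\sum_{m=m_1}^{m_2} O(m)\cdot N
  \;=\; O\!\left(N \cdot \frac{(m_2 - m_1 + 1)(m_1 + m_2)}{2}\right)
  \;=\; O\!\left(m_2^{2} N\right),
  \qquad \text{since } 1 \le m_1 \le m_2 .
\]
```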
Algorithm 6.1: ECPD Algorithm
ECPD(T, N, m1, m2, f)
    ST ← BuildSuffixTree(T)
    for m = m1 to m2 do
        for each m-length substring P of T do
            ηocc ← ECPM(ST, P)
            if ηocc ≥ f then do
                Output the circular pattern P
            end if
        end for
    end for
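The control flow of Algorithm 6.1 can be sketched in Python as follows. This is an illustration only: instead of a suffix tree with suffix links and the ECPM routine, it swaps in a simpler technique, grouping every window by its lexicographically smallest rotation, so it does not achieve the $O(m_2^2 N)$ bound. The names `least_rotation` and `ecpd_canonical` are hypothetical.

```python
from collections import defaultdict


def least_rotation(s):
    """Lexicographically smallest rotation of s (simple O(m^2) version)."""
    return min(s[i:] + s[:i] for i in range(len(s)))


def ecpd_canonical(text, m1, m2, f):
    """Map each canonical circular pattern of length m1..m2 to the start
    positions of its circular shifts, keeping those with >= f occurrences."""
    found = {}
    for m in range(m1, m2 + 1):
        positions = defaultdict(list)
        for i in range(len(text) - m + 1):
            # All rotations of a window share one canonical representative.
            positions[least_rotation(text[i:i + m])].append(i)
        for canon, pos in positions.items():
            if len(pos) >= f:
                found[canon] = pos
    return found


if __name__ == "__main__":
    print(ecpd_canonical("ABBCBXXCBABBYYBABBC", 5, 5, 3))
    # prints {'ABBCB': [0, 7, 14]}
```

Using a canonical rotation as the dictionary key means each circular pattern is reported once, together with the positions of all of its shifts.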
Based on the foregoing, we summarize our results on the ECPD problem in the following theorem:
Theorem 6.1: Given a sequence database SeqDB with r sequences and the parameters $m_1, m_2, k, f, g$, the ECPD algorithm uses suffix trees and suffix links to solve the exact circular pattern discovery problem in $O(m_2^2 N)$ time, where N is the total number of symbols in SeqDB.
6.4 The ACPD Algorithm
We first apply an existing ACPM algorithm directly to the ACPD problem. This models the use of a generic ACPM algorithm for the ACPD problem. In particular, we modify Maes’ ACPM algorithm to solve the ACPD problem. Subsequently, we describe our ACPD algorithm, and compare the two methods.
6.4.1 ACPD using Maes’ Algorithm
Algorithm 6.2 shows a method to solve the ACPD problem based on Maes’ algorithm. The algorithm compares each m-length pattern from the text with each (m+k)-length subtext, where $m_1 \le m \le m_2$. For each length m, there are $\min\{O(N), O(|\Sigma|^m)\}$ patterns and subtexts, so there are $O(N^2)$ comparisons. The total number of comparisons is in $O((m_2 - m_1)N^2)$. For each comparison, the time cost is in $O(m^2 \log m)$. Thus, the time complexity of Algorithm 6.2 is
$$\sum_{m=m_1}^{m_2} O(N^2 m^2 \log m) = O(m_2^3 N^2 \log m_2).$$
Landau’s Algorithm
Algorithm 6.3 is based on Landau’s incremental algorithm [67]. Similar to Maes’ algorithm, this algorithm compares each m-length pattern with each (m+k)-length subtext, where m is the length of the pattern and $m_1 \le m \le m_2$. For each length m, there are $\min\{O(N), O(|\Sigma|^m)\}$ patterns and subtexts, so there are $O(N^2)$ comparisons. The total number of comparisons is $O((m_2 - m_1)N^2)$. For each comparison, the time cost is $O(km)$. Thus, the time complexity of Algorithm 6.3 is
$$\sum_{m=m_1}^{m_2} O(N^2 k m) = O(k m_2^2 N^2).$$
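Both closed forms quoted in this section follow from standard power sums. The derivation below is a sketch that takes the per-comparison costs $O(m^2 \log m)$ and $O(km)$ and the $O(N^2)$ comparisons per length as given in the text.

```latex
\begin{align*}
\sum_{m=m_1}^{m_2} N^2 m^2 \log m
  &\le N^2 \log m_2 \sum_{m=1}^{m_2} m^2
   = N^2 \log m_2 \cdot \frac{m_2 (m_2+1)(2 m_2+1)}{6}
   = O\!\left(m_2^{3} N^{2} \log m_2\right), \\
\sum_{m=m_1}^{m_2} N^2 k m
  &\le k N^2 \sum_{m=1}^{m_2} m
   = k N^2 \cdot \frac{m_2 (m_2+1)}{2}
   = O\!\left(k m_2^{2} N^{2}\right).
\end{align*}
```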
Algorithm 6.2: ACPD Based on Maes’ algorithm
ACPD-MAES(T, N, k, m1, m2, f)
    for m = m1 to m2 do
        for each m-length substring P of T do
            ηocc ← 0
            for i = 1 to N−m−k do
                if MAESALGORITHM(P, T[i...i+m+k]) is true then do
                    ηocc ← ηocc + 1
                end if
            end for
            if ηocc ≥ f then do
                Output P
            end if
        end for
    end for
Algorithm 6.3: ACPD Based on Landau’s algorithm
ACPD-LANDAU(T, N, k, m1, m2, f)
    ST ← BuildSuffixTree(T)
    for m = m1 to m2 do
        for each m-length substring Pm do
            for i = 1 to N−m−k do
                LANDAUALGORITHM(Pm, T[i...i+m+k])
            end for
        end for
    end for
6.4.2 Proposed ACPD Algorithm
Our ACPD algorithm (Algorithm 6.4) uses the same framework as the ACPM algorithm described in Section 5.3.4. The algorithm first constructs the hypotheses on potential circular pattern matches using q-gram filtration. For each hypothesis (i.e., each matching q-gram in T), the ACPD algorithm verifies all possible circular shifts that involve this q-gram match.
The algorithm constructs $O(N^2)$ hypotheses in the worst case with parameter $q = \lfloor m_1/(k+1) \rfloor$, where N is the total length of the concatenated sequences (for a database of sequences). On average, the number of hypotheses is $O(N^2/|\Sigma|^q)$. When $|\Sigma|^q$ is close to $O(N)$, the number of hypotheses reduces to $O(N)$.
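The hypothesis-generation step can be sketched as follows. Algorithm 6.4 groups equal q-grams by scanning a suffix array together with its LCP array; the sketch below substitutes a hash map for that step, which is an assumption made for brevity, and the name `qgram_hypotheses` is hypothetical. Each returned pair of positions shares a q-gram and therefore forms one hypothesis to verify.

```python
from collections import defaultdict
from itertools import combinations


def qgram_hypotheses(text, m1, k):
    """Return (q, pairs): q = floor(m1 / (k + 1)) and all position pairs
    (i, j), i < j, where text[i:i+q] == text[j:j+q]."""
    q = m1 // (k + 1)
    if q < 1:
        raise ValueError("need m1 >= k + 1 so that q >= 1")
    buckets = defaultdict(list)
    for i in range(len(text) - q + 1):
        buckets[text[i:i + q]].append(i)
    # Every pair inside one bucket is a candidate (pattern, text) alignment.
    pairs = [p for pos in buckets.values() for p in combinations(pos, 2)]
    return q, pairs


if __name__ == "__main__":
    q, pairs = qgram_hypotheses("ABCABDABC", m1=4, k=1)
    print(q, sorted(pairs))  # prints 2 [(0, 3), (0, 6), (1, 7), (3, 6)]
```

A bucket with c positions contributes c(c−1)/2 pairs, which is where the quadratic worst case comes from when a single q-gram dominates the text.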
At verification, the algorithm checks the circular shifts of length $m_1, m_1+1, \ldots, m_2$. For each pattern with length m, there are $O(m)$ circular shifts to check, where $m_1 \le m \le m_2$. The time for checking all of the m-length circular shifts in the same circular pattern involving the current hypothesis is in $O(km)$ using the verification algorithm (Algorithm 5.5). In each hypothesis, the number of circular patterns with length m is m. So the time cost of checking all circular shifts of circular patterns with length m in a hypothesis is $O(km^2)$. The total time for verification will be $O(\sum_{m=m_1}^{m_2} km^2) = O(k m_2^3)$. The worst case time complexity of Algorithm 6.4 will thus be in $O(k m_2^3 N^2)$. On average, the running time will be in $O(k m_2^3 N^2/|\Sigma|^q)$. When $|\Sigma|^q$ is close to $O(N)$, as is typically the case, the time complexity will reduce to $O(k m_2^3 N)$.
A further improvement will be to exploit the huge redundancy in going from a pattern of length m to a pattern of length (m+1). We observe that going from an m-length pattern starting at position i in T to an (m+1)-length pattern starting from the same position involves adding just one symbol at the end. Most k-approximate cyclic pattern matches at one length will also be cyclic matches at the next length, if the cyclic edit distance is less than k−1. Also, non-matching regions in T for the m-length pattern cannot be a match for the (m+1)-length pattern if the circular edit distance with the m-length pattern is greater than k+1. Exploiting this redundancy by keeping the full dynamic programming table will lead to a further reduction of the overall complexity to $O(m_2^2 N^2)$ worst case and $O(m_2^2 N)$ on average. This is made possible only by the nature of the CPD problem.
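For reference, the cyclic edit distance used in the argument above can be computed by brute force as the minimum ordinary edit distance over all rotations of the pattern. The sketch below is a baseline for checking results on small inputs, not the verification procedure of this chapter; the function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance using a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def cyclic_edit_distance(p, s):
    """Minimum edit distance between s and any circular shift of p."""
    if not p:
        return len(s)
    return min(edit_distance(p[i:] + p[:i], s) for i in range(len(p)))


if __name__ == "__main__":
    print(cyclic_edit_distance("ABBCB", "CBABB"))  # prints 0 (exact rotation)
    print(cyclic_edit_distance("ABBCB", "CBAXB"))  # prints 1 (one substitution)
```

Since appending one symbol to a string changes its edit distance to a fixed string by at most one, distances computed for length m constrain those for length m+1, which is the kind of redundancy the paragraph above proposes to exploit.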
Based on the foregoing, we summarize our results on the ACPD problem in the following theorem:
Theorem 6.2: Given a sequence database SeqDB with r sequences and the parameters $m_1, m_2, k, f, g$, the ACPD algorithm solves the ACPD problem in $O(k m_2^3 N^2)$ time in the worst case and $O(k m_2^3 N)$ time on average, where N is the total number of symbols in SeqDB.
In the above CPD algorithms, we have assumed one single input sequence, T, for simplicity. For a database of sequences, we can simply concatenate the sequences in the database to form one long sequence, and then construct the generalized suffix tree (or suffix array) for the long sequence. Then, we can keep track of the number of occurrences of each circular pattern in each sequence in the database to support quorum constraints.
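The bookkeeping for quorum constraints can be sketched as below. The separator symbol, the offset table, and the function names are assumptions introduced for illustration; the text only states that the sequences are concatenated and that per-sequence occurrence counts are tracked.

```python
from bisect import bisect_right


def concatenate(seqs, sep="\x00"):
    """Join sequences with a separator; return the text and start offsets."""
    starts, pos = [], 0
    for s in seqs:
        starts.append(pos)
        pos += len(s) + len(sep)
    return sep.join(seqs), starts


def sequence_of(position, starts):
    """Index of the database sequence that contains a text position."""
    return bisect_right(starts, position) - 1


def satisfies_quorum(positions, starts, f, g):
    """True if there are >= f occurrences spread over >= g sequences.
    Occurrences that span a separator are assumed to be filtered out already."""
    covered = {sequence_of(p, starts) for p in positions}
    return len(positions) >= f and len(covered) >= g


if __name__ == "__main__":
    text, starts = concatenate(["ABBCB", "XCBABB", "BABBC"])
    print(starts)                                      # prints [0, 6, 13]
    print(satisfies_quorum([0, 7, 13], starts, 3, 3))  # prints True
```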
6.4.3 Comparison
The time complexity of the ACPD solution via Maes’ algorithm is $O(m_2^3 N^2 \log m_2)$. The time complexity of the ACPD solution via Landau’s algorithm is $O(k m_2^2 N^2)$. The time complexity of the proposed ACPD algorithm is $O(k m_2^3 N^2)$. The number of patterns is $\min\{N, |\Sigma|^m\}$. The proposed ACPD algorithm is based on q-gram filtration. The average number of hypotheses is in $O(N^2/|\Sigma|^q)$, with $q = \lfloor m_1/(k+1) \rfloor$ as in Section 6.4.2. When $|\Sigma|^q$ is close to $O(N)$, the time complexity of the ACPD algorithm will reduce to $O(k m_2^3 N)$. By exploiting the nature of the CPD problem in terms of the redundancy between m-length patterns and (m+1)-length patterns starting at the same location in the text, the complexity of the proposed ACPD algorithm can be reduced to $O(m_2^2 N^2)$ worst case, and $O(m_2^2 N)$ on average.
6.5 Experiments
We performed circular pattern discovery on the same protein multidomain sequences with pattern length m, where 4 ≤ m ≤ 30. Figure 6.1 shows the variation of the number of distinct patterns (both circular and non-circular patterns) with pattern length. When the length is 4, the number of distinct patterns is large. When the length increases, the number of distinct patterns decreases quickly. Figure 6.2 shows the variation of the number of circular patterns (i.e., excluding the non-circular patterns) with pattern length. As expected, the number decreases rapidly with increasing length. In Figure 6.3 we show the maximum number of occurrences of one pattern at different pattern lengths. Table 6.1 shows the number of distinct patterns and the number of non-circular patterns at different pattern lengths. Table 6.1 also shows the ratio $\frac{\text{number of distinct patterns}}{\text{number of non-circular patterns}}$. Interestingly, some circular patterns were discovered to occur with lengths as large as 26 symbols. That is, 26 domains were repeated, but in the form of a cyclic permutation between two multidomain proteins in the database.
Algorithm 6.4: Proposed ACPD algorithm
ACPD-QGRAM(T, N, k, m1, m2, f)
    <SA, LCP> ← COMPUTESA(T)
    q ← ⌊m1/(k+1)⌋, Qset ← NULL
    for i = 1 to N do
        if LCP[i] ≥ q then do
            Qset ← c(Qset, SA[i])
        else
            for j = 1 to length(Qset) do
                for l = j+1 to length(Qset) do
                    Ppos ← Qset[j], Tpos ← Qset[l]
                    for m = m1 to m2 do
                        T1 ← T[Tpos+q+1...Tpos+m+k]; T2 ← T[Tpos−(m−q+k)...Tpos−1]
                        for h = 0 to m−q do
                            P ← T[Ppos+q+1...Ppos+q+h] ◦ T[Ppos−m+q+h...Ppos−1]
                            if BIDIRECTIONALED2(P, T1, T2, k) is true then do
                                if P is the first occurrence then do
                                    ηocc(P) ← 1
                                else
                                    ηocc(P) ← ηocc(P) + 1
                                end if
                            end if
                        end for
                    end for
                end for
            end for
            Qset ← NULL
        end if
    end for
    ∀ P, Output P if ηocc(P) ≥ f
Algorithm 6.5: Modified ACPM Hypothesis Verification
BIDIRECTIONALED2(P, T1, T2, k)
    P2 ← P^R
    ED1 ← DP(P, T1, k)
    ED2 ← DP(P2, T2^R, k)
    ED ← ED2^R ∪ rep(0, q) ∪ ED1
    for h = 1 to |P|+1
        if (ED[h] + ED[h+m−1] ≤ k) then do
            return true
        end if
    end for
Table 6.2 shows an example of a circular pattern with five domains. The pattern is {PDA1N075, PD000041, PD000041, PD248344, PD000041}. We assign the symbol “A” to protein domain PDA1N075, the symbol “B” to protein domain PD000041 and the symbol “C” to protein domain PD248344. It occurs in the multi-domain protein sequences Q99407_HUMAN, ANK1_HUMAN and Q13768_HUMAN with different orders.
Table 6.3 shows an example of a circular pattern with thirteen domains. The pattern is {PD000768, PD000767, PD005993, PD000768, PD000767, PDA1J766, PD005993, PD000768, PD000767, PDA1J7O1, PD000165, PD272501, PDA1J3F2}. We assign the string {FGHFGIHFGJKLM} to the pattern. It occurs in the multi-domain protein sequences Q9Y4V9_HUMAN, Q5JR23_HUMAN and Q9UJ57_HUMAN with different orders.
[Figure 6.1 plot. Title: Number of Distinct Patterns. x-axis: Pattern Length (ticks 5 to 30). y-axis: Number of Distinct Normal Patterns (ticks 0 to 15000).]
Figure 6.1. Variation of the number of distinct patterns (including non-circular and circular patterns) with pattern length.
[Figure 6.2 plot. Title: Number of True CPs. x-axis: Pattern Length (ticks 5 to 30). y-axis: Number of True CPs (ticks 0 to 400).]
Figure 6.2. Variation of number of circular patterns with pattern length.
Table 6.1. The number of distinct patterns with pattern length
Pattern Length   # Distinct Patterns   # Non-circular Patterns   # Circular Patterns   Ratio
4                80333                 79934                     399                   100.50%
5                8199                  7944                      255                   103.21%
6                4720                  4705                      15                    100.32%
7                2754                  2677                      77                    102.88%
8                836                   830                       6                     100.72%
9                550                   532                       18                    103.38%
10               855                   855                       0                     100.00%
11               937                   921                       16                    101.74%
12               1011                  983                       28                    102.85%
13               727                   676                       51                    107.54%
14               299                   296                       3                     101.01%
15               265                   262                       3                     101.15%
16               141                   141                       0                     100.00%
17               139                   139                       0                     100.00%
18               116                   100                       16                    116.00%
19               50                    50                        0                     100.00%
20               58                    57                        1                     101.75%
21               51                    51                        0                     100.00%
22               19                    19                        0                     100.00%
23               37                    37                        0                     100.00%
24               18                    18                        0                     100.00%
25               34                    34                        0                     100.00%
26               18                    17                        1                     105.88%
27               19                    19                        0                     100.00%
28               2                     2                         0                     100.00%
29               8                     8                         0                     100.00%
30               5                     5                         0                     100.00%
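The Ratio column of Table 6.1 can be recomputed from the two count columns. The short sketch below does so for a few rows copied from the table; the helper name `ratio_percent` is illustrative.

```python
def ratio_percent(distinct, non_circular):
    """Ratio of distinct to non-circular pattern counts, as a percentage."""
    return round(100.0 * distinct / non_circular, 2)


# (pattern length, distinct, non-circular, circular) rows copied from Table 6.1
rows = [(4, 80333, 79934, 399), (13, 727, 676, 51),
        (18, 116, 100, 16), (26, 18, 17, 1)]

for length, distinct, non_circular, circular in rows:
    # The circular count is the difference of the other two columns.
    assert distinct - non_circular == circular
    print(length, f"{ratio_percent(distinct, non_circular):.2f}%")
# prints: 4 100.50% / 13 107.54% / 18 116.00% / 26 105.88%
```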
Table 6.2. Sample discovered circular patterns with length five. A≡PDA1N075, B≡PD000041, C≡PD248344
Protein         Pos   Order   Pattern
Q99407_HUMAN    6     0       ABBCB
Q13768_HUMAN    7     0       ABBCB
ANK1_HUMAN      6     3       CBABB
Q13768_HUMAN    6     4       BABBC
Q99407_HUMAN    5     4       BABBC
[Figure 6.3 plot. Title: Maximum Number of Occurrences. x-axis: Pattern Length (ticks 5 to 30). y-axis: Maximum Number of Occurrences (ticks 0 to 120).]
Figure 6.3. Variation of maximum number of occurrences with pattern length.
Table 6.3. Sample discovered circular patterns with length thirteen. F≡PD000768, G≡PD000767, H≡PD005993, I≡PDA1J766, J≡PDA1J7O1, K≡PD000165, L≡PD272501, M≡PDA1J3F2
Protein         Pos   Order   Pattern
Q9Y4V9_HUMAN    20    0       FGHFGIHFGJKLM
Q5JR23_HUMAN    21    1       GHFGIHFGJKLMF
Q9UJ57_HUMAN    38    2       HFGIHFGJKLMFG
6.6 Summary
In this chapter, we introduced the ECPD and ACPD problems, and proposed two algorithms for their solution. The first algorithm uses suffix trees and suffix links to solve the ECPD problem in $O(m_2^2 N)$ time. The second algorithm solves the more challenging ACPD problem in $O(k m_2^3 N^2)$ time in the worst case, and $O(k m_2^3 N)$ on average, using suffix arrays (Theorem 6.2). By exploiting the redundancy between patterns of different lengths that start at the same position in the text, the overall complexity is reduced to $O(m_2^2 N^2)$ worst case, and $O(m_2^2 N)$ on average (Section 6.4.2). Our results can be compared with the approach that directly uses Maes’ algorithm, one of the best available ACPM algorithms, for the ACPD problem, which runs in $O(m_2^3 N^2 \log m_2)$ time, or $O(m_2^3 N \log m_2)$ for small values of $m_2$. Although pattern discovery has been well studied, to our knowledge, this is the first attempt at a focused study on the CPD problem.
We presented an experiment on discovering exact circular patterns in the ProDom database, and reported interesting exact circular patterns discovered by our algorithm. To our knowledge, this is the first experimental study focused on discovering exact circular patterns.
Chapter 7
Conclusion and Future Work
7.1 Conclusion
In this work, we present two novel data structures to solve the space problem of the suffix tree and its applications. These two structures are the virtual suffix tree (VST) and the Probabilistic Suffix Array (PSA). We consider the circular pattern matching problem and define the circular pattern discovery problem. We present algorithms using suffix data structures to solve both the exact and inexact variants of the CPM problem. Our focus is on efficiently answering enumerative queries involving circular patterns. We also present algorithms based on our CPM algorithms to solve the CPD problem. To our knowledge, this is the first attempt at a focused study on discovering circular patterns.
The Virtual Suffix Tree. We introduce the VST (virtual suffix tree), an efficient data structure for suffix trees and suffix arrays. The VST provides the same functionality as the suffix tree, including support for pattern matching and suffix links, but requires much less space than the suffix tree and the other recently proposed space-efficient data structures for suffix trees and suffix arrays. With n denoting the length of the sequence, the worst case space is 18n bytes compared with 20n bytes for the other data structures for suffix trees and suffix arrays (such as the enhanced suffix array [2], and the linearized suffix tree [62]). On average, the space requirement (including that for suffix arrays and suffix links) is 13.8n bytes for the regular VST, and 12.05n bytes in its compact form.
The Probabilistic Suffix Array. We have presented the probabilistic suffix array (PSA), a data structure for representing information in variable length Markov models. The PSA provides the same functionality as the probabilistic suffix tree (PST), but the space is significantly smaller than that of the PST. We construct the PSA in linear time and linear space, independent of the order of the Markov model. The space is dependent on the number of interval nodes of the suffix array. In the worst case, the memory requirement is 33N bytes. On average, the needed space will be 26N bytes, which includes the work space of the construction phase. The needed space for the PSA is significantly smaller than that for the PST, which is implemented on a regular suffix tree. Prediction using the PSA is in $O(m \log \frac{n}{|\Sigma|})$ time, where m is the pattern length, and Σ is the symbol alphabet.
Circular Pattern Matching. We defined the circular pattern matching (CPM) problems in the related work. We present a linear time algorithm to solve the ECPM problem, which aims at finding exact occurrences of a circular pattern P in the text T. The ACPM problem is to find approximate occurrences of a circular pattern P in the text T. We present three algorithms for the ACPM2 problem and one greedy algorithm that produces an incomplete result. Of all the algorithms reported in the literature for the ACPM problem, our q-gram-based bidirectional ACPM algorithm with suffix trees provides the best results on average, with respect to both time and space complexity. Using our algorithms, we performed experiments on the analysis of circular permutations in multidomain proteins. Based on the results, we developed a method for function prediction for such multidomain proteins.
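To make the exact circular pattern matching task concrete, the following is a minimal brute-force sketch in Python. It is not the linear time suffix-tree algorithm summarized above: it simply stores every rotation of the pattern in a dictionary and slides a window over the text, so its cost grows with the product of the text and pattern lengths. The function name `circular_occurrences` and the (position, shift) output format are assumptions made only for this illustration.

```python
def circular_occurrences(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return (position, shift) pairs such that text[position:position + m]
    equals the rotation pattern[shift:] + pattern[:shift], with m = len(pattern)."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    # Map each distinct rotation of the pattern to its smallest shift.
    rotations = {}
    for shift in range(m):
        rotations.setdefault(pattern[shift:] + pattern[:shift], shift)
    hits = []
    # Compare every length-m window of the text against the rotation set.
    for pos in range(len(text) - m + 1):
        shift = rotations.get(text[pos:pos + m])
        if shift is not None:
            hits.append((pos, shift))
    return hits


print(circular_occurrences("xbcaxabc", "abc"))  # [(1, 1), (5, 0)]
```

Here the window starting at position 1 ("bca") is the rotation of "abc" by one symbol, and the window at position 5 is the unrotated pattern. A suffix-tree based solution avoids re-reading each window, which is where the linear time bound comes from.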
Circular Pattern Discovery. Based on the work on the circular pattern matching problem, we defined the Exact Circular Pattern Discovery (ECPD) problem and the Approximate Circular Pattern Discovery (ACPD) problem. We present an ECPD algorithm that uses suffix trees and suffix links to solve the exact circular pattern discovery problem in $O(\frac{m^2}{2} N)$ time. We also present an efficient algorithm to solve the ACPD problem based on our bidirectional ACPM2 algorithm. Compared with the ACPD algorithm based on Maes' CPM algorithm [81], our algorithm is the better solution on average.
7.2 Future Work
We briefly describe some potential future work, based on the material presented in this work.
7.2.1 Circular Pattern Discovery
We have presented algorithms for the circular pattern discovery problem, all of which are based on our CPM algorithms. Our ACPD algorithm performs many repeated operations that compare the circular shifts against the subtext. In future work, we may find a way to avoid these repeated comparisons.
7.2.2 Network Analysis for Circular Multidomain Proteins
In this work we proposed efficient algorithms for rapid detection of cyclic permutations in multidomain proteins, using suffix trees and suffix arrays. Based on the results, we formed networks linking different multidomain proteins based on cyclic patterns observed in the proteins, using current data in the ProDom database [21]. Using these networks we performed functional annotation of the multidomain proteins. The significance of functional relatedness between two multidomain proteins was assessed using z-scores and p-scores. We performed only simple analysis of the networks, for instance, based on the in-degree and out-degree characteristics of the nodes in the network. The overall performance of the method on a network formed with the Top 500 proteins (as ranked by degree) was as follows: sensitivity 0.81, precision 0.88, F-measure 0.84. Although based only on sequence data, and with no alignment, these results are comparable with ProFunc [69, 70] and ProKnow [101], both of which perform at around 70% accuracy. These are web-based servers that predict protein function from 3D structure, using initial structure alignment and a combination of algorithms. Motivated by these impressive results, an interesting future direction will be a more detailed study of these networks of cyclic patterns, built on just protein sequence data. Network characterization using more rigorous methods such as betweenness [43], centrality [52], and pairwise disconnectivity [102] could shed more light on the potential relationships between related multidomain proteins, or non-related multidomain proteins with potentially similar function.
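As a quick consistency check on the reported figures, the F-measure can be recomputed from the sensitivity (recall) and precision, assuming the standard balanced definition F = 2PR / (P + R). The function name `f_measure` is introduced here only for illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported for the Top 500 network: sensitivity 0.81, precision 0.88.
score = f_measure(precision=0.88, recall=0.81)
print(round(score, 2))  # 0.84
```

The recomputed value agrees with the reported F-measure of 0.84 after rounding to two decimal places.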
7.2.3 From PSA to PFA
The probabilistic suffix automaton is a subclass of the probabilistic finite automaton (PFA), which is a simple model for learning in Markov models with order L. Ron et al. [109] proved that “every distribution generated by a probabilistic suffix automata can equivalently be generated by a PST”. In future work, one can consider how to construct a probabilistic suffix automaton from our probabilistic suffix array data structure. Previous work [9, 109] has constructed the probabilistic suffix automaton from the probabilistic suffix tree (PST), but the space and time complexity is $O(Ln^2)$, which is prohibitive for long sequences. The compressed suffix array [45] and the compact suffix array [82, 83] are space-efficient suffix data structures. It would be interesting to study how these space-efficient suffix data structures can be used to improve the PSA.
7.2.4 Approximate Pattern Matching Using PSA
Dynamic programming [114] is a well-known method for finding approximate patterns, but its time complexity is $O(m^2 n)$ to find all the occurrences, where m is the length of the pattern and n is the length of the text. Ukkonen's algorithm [121] improved the time complexity of this problem to $O(mnk)$, where k is the maximum error allowed in the pattern. This is still large in practice. We believe that the probability generated from PSA prediction may provide a useful measure for the approximate pattern matching problem. This simple idea will work for existential queries in $O(n + m \log |\Sigma|)$ time, where $O(n)$ is the PSA construction cost and $O(m \log |\Sigma|)$ is the prediction cost. But for enumerative queries, finding all the possible matching positions using the PSA will be a major challenge.
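For reference, the dynamic programming baseline mentioned above can be sketched as follows. This is a minimal Python illustration of the classic column-wise recurrence for approximate matching under edit distance, not the PSA-based idea proposed here. The name `approx_end_positions` is an assumption, and the sketch reports only the end positions of matches with at most k errors, which takes time proportional to the product of the pattern and text lengths; recovering complete occurrences (start positions as well) needs additional work.

```python
def approx_end_positions(text: str, pattern: str, k: int) -> list[int]:
    """Return every index j such that some substring of text ending at j
    is within edit distance k of pattern."""
    m = len(pattern)
    # column[i] = smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    column = list(range(m + 1))
    ends = []
    for j, symbol in enumerate(text):
        prev_diag = column[0]  # column[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == symbol else 1
            new_value = min(prev_diag + cost,   # match or substitution
                            column[i] + 1,      # extra symbol in the text
                            column[i - 1] + 1)  # symbol missing from the text
            prev_diag = column[i]
            column[i] = new_value
        if column[m] <= k:
            ends.append(j)
    return ends


print(approx_end_positions("abxcd", "abcd", 1))  # [4]
```

In the example, the text contains the pattern with one inserted symbol, so the only match with at most one error ends at index 4.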
7.2.5 Prediction with PSA using Inexact Matching
Markov models in information retrieval are often based on item frequency and document frequency. However, in the real world, sequences often contain mismatches or errors (i.e., insertions, deletions, and substitutions). If we could compute the Markov model based on an approximation of the item frequency and document frequency, we would get a model that is more robust in prediction. Here, approximate frequencies are computed by considering possible mutations in the sequence.
Calculating Approximate Probability
The transition matrix of the Markov model will be calculated based on an approximate probability. The approximate probability is calculated from the approximate item frequency and document frequency. We use the following equation in place of the regular equation, which calculates the transition matrix using the exact term frequency (TF) or document frequency (DF).
\[
P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = \arg\max_{ED(Y,X) \le k} P(Y_{m+n} = j \mid Y_n Y_{m+n-1} Y_{m+n-2} \ldots Y_{m+n-L})
\]

where $ED(Y,X) = ED(Y[m+n-L+1 \ldots m+n], X[n-L+1 \ldots n])$.
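One possible reading of this equation is that the transition probability for a length-L context X is taken as the largest conditional probability observed over all length-L contexts Y that lie within edit distance k of X. The brute-force Python sketch below illustrates that reading only. It does not use the PSA, its cost is far from the efficiency targets discussed in this section, and the names `edit_distance` and `approx_transition_prob` are hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Standard edit distance (insert, delete, substitute) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb), prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1]


def approx_transition_prob(sequence: str, context: str, symbol: str, k: int) -> float:
    """Largest P(symbol | window) over all length-L windows of the sequence
    whose edit distance to the given context is at most k (assumed reading)."""
    L = len(context)
    context_counts = Counter()  # occurrences of each window that has a successor
    follow_counts = Counter()   # occurrences of each (window, next symbol) pair
    for i in range(len(sequence) - L):
        window = sequence[i:i + L]
        context_counts[window] += 1
        follow_counts[window, sequence[i + L]] += 1
    best = 0.0
    for window, total in context_counts.items():
        if edit_distance(window, context) <= k:
            best = max(best, follow_counts[window, symbol] / total)
    return best


print(approx_transition_prob("abcabdaxcaxc", "ab", "c", 0))  # 0.5
print(approx_transition_prob("abcabdaxcaxc", "ab", "c", 1))  # 1.0
```

With k = 0 only the exact context "ab" is used, and "c" follows it in one of its two occurrences. With k = 1 the neighbouring context "ax" is also admitted, and "c" follows it every time, so the maximum rises to 1.0.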
Difficulty
For this extension of the PSA, we need to cope with two hard problems.
1. How to calculate the approximate term frequency (TF) and document frequency (DF) efficiently in both time and space. This is similar to the motif discovery problem. We have to find a better algorithm to calculate the approximate term frequency (TF) and document frequency (DF).
2. How to associate nodes in the PSA with the approximate probability. In our current PSA, symbols on the same edge share the same node. Hence they have the same probability, tf and df. Thus the space for the PSA is linear with respect to the length of the sequence. In this future work, symbols on the same edge may not share the same probability. Thus in the naive implementation, the space for the PSA will be quadratic with respect to the length of the sequence. Reducing this space requirement is an interesting challenge.
If these two problems can be solved, a robust representation for Markov models will be achieved.
7.3 Publications from the Dissertation
1. Jie Lin, Yue Jiang, and Don Adjeroh. The Virtual Suffix Tree: An efficient data structure for suffix trees and suffix arrays. The Prague Stringology Conference (PSC), 2008.
2. Jie Lin, Yue Jiang, and Donald A. Adjeroh. The Virtual Suffix Tree. International Journal of Foundations of Computer Science, Vol. 20, No. 6, pp. 1109-1133, 2009.
3. Jie Lin and Don Adjeroh. All-against-all circular pattern matching. 2011. Under review.
4. Jie Lin and Don Adjeroh. Circular Pattern Discovery. 21st International Workshop on Combinatorial Algorithms, 2011.
5. Jie Lin, Don Adjeroh, and Bing-Hua Jiang. Probabilistic Suffix Array: Efficient Modelling and Prediction of Protein Families. 2011. Under review.
6. Jie Lin, Don Adjeroh, and Bing-Hua Jiang. Algorithms for Efficient Detection of CPs in Multidomain Protein. 2011. To be submitted.
Bibliography
[1] N Abe and M Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9:205–260, 1992.
[2] M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms, 2:53–86, 2004.
[3] D. Adjeroh, T. Bell, and A. Mukherjee. The Burrows-Wheeler Transform: Data Compression, Suffix Arrays and Pattern Matching. Springer-Verlag, 2008.
[4] Don Adjeroh and Fei Nan. Suffix sorting via Shannon-Fano-Elias codes. Algorithms, 3(2):145–167, 2010.
[5] Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search. Commun. ACM, 18:333–340, June 1975.
[6] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410, October 1990.
[7] Amihood Amir, Yonatan Aumann, Gad M. Landau, Moshe Lewenstein, and Noa Lewenstein. Pattern matching with swaps. In FOCS, pages 144–153, 1997.
[8] Arne Andersson and Stefan Nilsson. Efficient implementation of suffix trees. Software: Practice and Experience, 25(2):129–141, 1995.
[9] Alberto Apostolico and Gill Bejerano. Optimal amnesic probabilistic automata or how to learn and classify proteins in linear time and space. Journal of Computational Biology, 7(3-4):381–393, 2000.
[10] Alberto Apostolico and Maxime Crochemore. Optimal canonization of all substrings of a string. Inf. Comput., 95(1):76–95, 1991.
[11] Alberto Apostolico and Raffaele Giancarlo. The Boyer Moore Galil string searching strategies revisited. SIAM J. Comput., 15(1):98–105, 1986.
[12] H. Arimura, H. Asaka, H. Sakamoto, and S. Arikawa. Efficient discovery of proximity patterns with suffix arrays. In A. Amir and G.M. Landau, editors, CPM, volume 2089 of Lecture Notes in Computer Science, pages 152–156. Springer, 2001.
[13] R.A. Baeza-Yates and G.H. Gonnet. A new approach to text searching. In N.J. Belkin and C.J. van Rijsbergen, editors, SIGIR 89, Proceedings of the 12th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, volume 23, pages 168–75. ACM, New York (published as a special issue of SIGIR Forum, Vol. 23, 1-2, Fall 88/Winter 89), 1989.
[14] Ricardo A. Baeza-Yates and Chris H. Perleberg. Fast and practical approximate string matching. Information Processing Letters, 59(1):21–27, 1996.
[15] Alex Bateman, Lachlan Coin, Richard Durbin, Robert D. Finn, Volker Hollich, Sam Griffiths-Jones, Ajay Khanna, Mhairi Marshall, Simon Moxon, Erik L. L. Sonnhammer, David J. Studholme, Corin Yeats, and Sean R. Eddy. The Pfam protein families database. Nucleic Acids Res, 32 (Database issue):D138–D141, 2004.
[16] Ron Begleiter, Ran El-Yaniv, and Golan Yona. On prediction using variable order Markov models. J. Artif. Intell. Res. (JAIR), 22:385–421, 2004.
[17] Gill Bejerano and Golan Yona. Variations on probabilistic suffix trees: Statistical modeling and prediction of protein families. Bioinformatics, 17(1):23–43, 2001.
[18] Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Gaston H. Gonnet, Daniel Panario, and Alfredo Viola, editors, LATIN, volume 1776 of Lecture Notes in Computer Science, pages 88–94. Springer, 2000.
[19] Kellogg S. Booth. Lexicographically least circular substrings. Inf. Process. Lett., 10(4/5):240–242, 1980.
[20] R.S. Boyer and J.S. Moore. A fast string searching algorithm. Communications of the ACM, 20(10):62–72, 1977.
[21] Catherine Bru, Emmanuel Courcelle, Sébastien Carrère, Yoann Beausse, Sandrine Dalmar, and Daniel Kahn. The ProDom database of protein domain families: More emphasis on 3D. Nucleic Acids Res, 33:212–215, 2005.
[22] H. Bunke and U. Bühler. Applications of approximate string matching to 2D shape recognition. Pattern Recognition, 26(12):1797–1812, December 1993.
[23] M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, Palo Alto, California, May 1994.
[24] Y. Cao, A. Janke, P.J. Waddell, M. Westerman, O. Takenaka, S. Murata, N. Okada, S. Pääbo, and M. Hasegawa. Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders. J Mol Evol., 47:307–322, 1998.
[25] H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM Journal on Applied Mathematics, 48(5):1073–1082, 1988.
[26] William I. Chang and Jordan Lampe. Theoretical and empirical comparisons of approximate string matching algorithms. In CPM ’92: Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching, pages 175–184, London, UK, 1992. Springer-Verlag.
[27] William I. Chang and Eugene L. Lawler. Sublinear approximate string matching and biological applications. Algorithmica, 12(4/5):327–344, 1994.
[28] John G. Cleary and W. J. Teahan. Unbounded length contexts for PPM. Computer Journal, 40(2/3):67–75, 1997.
[29] Richard Cole and Ramesh Hariharan. Faster suffix tree construction with missing suffix links. In STOC, pages 407–415, 2000.
[30] S Cong, J Han, and D A Padua. Parallel mining of closed sequential patterns. In KDD. ACM, 2005.
[31] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 1990.
[32] Florence Corpet, Florence Servant, Jerome Gouzy, and Daniel Kahn. ProDom and ProDom-CG: Tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res, 28(1):267–269, 2000.
[33] Paul D. Cotter, Colin Hill, and R. Paul Ross. Bacterial lantibiotics: Strategies to improve therapeutic potential. Current Protein and Peptide Science, 6:61–75, 2005.
[34] Maxime Crochemore and Thierry Lecroq. Tight bounds on the complexity of the Apostolico-Giancarlo algorithm. Information Processing Letters, 63:195–203, 1997.
[35] Craik DJ. Circling the enemy: Cyclic proteins in plant defence. Trends Plant Sci., 14(6):328–335, 2009.
[36] Jean-Pierre Duval. Factorizing words over an ordered alphabet. J. Algorithms, 4(4):363–381, 1983.
[37] Yariv Ephraim, Amir Dembo, and Lawrence R. Rabiner. A minimum discrimination information approach for hidden Markov modeling. IEEE Transactions on Information Theory, 35(5):1001–1013, 1989.
[38] Martin Farach-Colton, Paolo Ferragina, and S. Muthukrishnan. On the sorting-complexity of suffix tree construction. J. ACM, 47(6):987–1011, 2000.
[39] Robert D. Finn, John Tate, Jaina Mistry, Penny C. Coggill, Stephen John Sammut, Hansrudolf Hotz, Goran Ceric, Kristoffer Forslund, Sean R. Eddy, Erik L. L. Sonnhammer, and Alex Bateman. The Pfam protein families database. Nucleic Acids Res, 36:281–288, 2008.
[40] S. Garcia-Vallve, A. Rojas, J. Palau, and A. Romeu. Circular permutants in beta-glucosidases (family 3) within a predicted double-domain topology that includes a (beta/alpha)8-barrel. Proteins, 31:214–223, 1998.
[41] Robert Giegerich, Stefan Kurtz, and Jens Stoye. Efficient implementation of lazy suffix trees. Software: Practice and Experience, 33(11), 2003.
[42] David Gillman and Michael Sipser. Inference and minimization of hidden Markov chains. In COLT, pages 147–158, 1994.
[43] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. PNAS, 99(12):7821–7826, June 2002.
[44] Jens Gregor and Michael G. Thomason. Dynamic programming alignment of sequences representing cyclic patterns. IEEE Trans. Pattern Anal. Mach. Intell., 15(2):129–135, 1993.
[45] Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM Journal on Computing, 35(2):378–407, 2005.
[46] D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, UK, 1997.
[47] U. Heinemann and M. Hahn. Circular permutation of polypeptide chains: Implications for protein folding and stability. Prog. Biophys. Mol. Biol., 64:121–143, 1995.
[48] R.N. Horspool. Practical fast searching in strings. Software: Practice and Experience, 10(6):501–6, 1980.
[49] M Hu, J Yang, and W Su. Permu-pattern: Discovery of mutable permutation patterns. In KDD. ACM, 2008.
[50] James W. Hunt and Thomas G. Szymanski. A fast algorithm for computing longest common subsequences. Commun. ACM, 20(5):350–353, 1977.
[51] Costas S. Iliopoulos and M. Sohel Rahman. Indexing circular patterns. In WALCOM, pages 46–57, 2008.
[52] H. Jeong, S. P. Mason, A. L. Barabasi, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41–42, May 2001.
[53] Petteri Jokinen and Esko Ukkonen. Two algorithms for approximate string matching in static texts (extended abstract). In A. Tarlecki, editor, Mathematical Foundations of Computer Science 1991: Proc. of the 16th International Symposium, pages 240–248. Springer, Berlin, Heidelberg, 1991.
[54] Maizel JV Jr and Lenk RP. Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc Natl Acad Sci U S A, 78(12):7665–7669, 1981.
[55] Jongsun Jung and Byungkook Lee. Circularly permuted proteins in the protein structure database. Protein Sci., 10(9):1881–1886, 2001.
[56] Juha Kärkkäinen. Suffix cactus: A cross between suffix tree and suffix array. In CPM: 6th Symposium on Combinatorial Pattern Matching, 1995.
[57] Juha Kärkkäinen, Peter Sanders, and Stefan Burkhardt. Linear work suffix array construction. J. ACM, 53(6):918–936, 2006.
[58] S. Karlin, G. Ghandour, F. Ost, S. Tavare, and L.J. Korn. New approaches for computer analysis of nucleic acid sequences. Proceedings, National Academy of Sciences, 80(18):5660–5664, 1983.
[59] R.M. Karp and M.O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249–260, 1987.
[60] T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In 12th Annual Symposium on Combinatorial Pattern Matching, 2001.
[61] Dong Kyue Kim, Jeong Eun Jeon, and Heejin Park. An efficient index data structure with the capabilities of suffix trees and suffix arrays for alphabets of non-negligible size. In SPIRE 2004, 2004.
[62] Dong Kyue Kim, Minhwan Kim, and Heejin Park. Linearized suffix tree: an efficient index data structure with the capabilities of suffix trees and suffix arrays. Algorithmica, 2007.
[63] Dong Kyue Kim, Jeong Seop Sim, Heejin Park, and Kunsoo Park. Constructing suffix arrays in linear time. J. Discrete Algorithms, 3(2-4):126–142, 2005.
[64] D.E. Knuth, J.H. Morris, and V.R. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6(2):323–350, 1977.
[65] Pang Ko and Srinivas Aluru. Space efficient linear time construction of suffix arrays. J. Discrete Algorithms, 3(2-4):143–156, 2005.
[66] R. Kohli and C. Walsh. Enzymology of acyl chain macrocyclization in natural product biosynthesis. Chemical Communications, 3:297–307, 2003.
[67] Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. Incremental string comparison. SIAM Journal on Computing, 27:557–582, 1998.
[68] G.M. Landau and U. Vishkin. Fast string matching with k differences. Journal of Computer and System Sciences, 37(1):63–78, 1988.
[69] Roman A. Laskowski, James D. Watson, and Janet M. Thornton. ProFunc: A server for predicting protein function from 3D structure. Nucleic Acids Research, 33(Web-Server-Issue):89–93, 2005.
[70] Roman A. Laskowski, James D. Watson, and Janet M. Thornton. Protein function prediction using local 3D templates. Journal of Molecular Biology, 351:614–626, 2005.
[71] Florencia G. Leonardi. A generalization of the PST algorithm: Modeling the sparse nature of protein sequences. Bioinformatics, 22(11):1302–1307, 2006.
[72] Vladimir Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory, 10(8):707–710, 1966. Original in Doklady Akademii Nauk SSSR 163(4):845–848 (1965).
[73] M. Li, J.H. Badger, X. Chen, S. Kwong, P. Kearney, and H. Zhang. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics, 17:149–154, 2001.
[74] Jie Lin and Don Adjeroh. All-against-all circular pattern matching. 2011. Under review.
[75] Jie Lin and Don Adjeroh. Circular pattern discovery. 21st International Workshop on Combinatorial Algorithms, 2011.
[76] Jie Lin, Don Adjeroh, and Binghua Jiang. Algorithms for efficient detection of CPs in multidomain protein. 2011. To be submitted.
[77] Jie Lin, Don Adjeroh, and Binghua Jiang. Probabilistic suffix array: Efficient modelling and prediction of protein families. 2011. Under review.
[78] Jie Lin, Yue Jiang, and Don Adjeroh. The virtual suffix tree: An efficient data structure for suffix trees and suffix arrays. In Jan Holub and Jan Zdarek, editors, Proceedings of the Prague Stringology Conference 2008, pages 68–83, Czech Technical University in Prague, Czech Republic, 2008.
[79] Jie Lin, Yue Jiang, and Don Adjeroh. The virtual suffix tree. Int. J. Found. Comput. Sci., 20(6):1109–1133, 2009.
[80] Moritz G. Maaß. Computing suffix links for suffix trees and arrays. Information Processing Letters, 101(6), 2007.
[81] Maurice Maes. On a cyclic string-to-string correction problem. Inf. Process. Lett., 35(2):73–78, 1990.
[82] Veli Mäkinen. Compact suffix array – a space-efficient full-text index. Fundam. Inform., 56(1-2):191–210, 2003.
[83] Veli Mäkinen and Gonzalo Navarro. Compressed compact suffix arrays. Combinatorial Pattern Matching, pages 420–433, 2004.
[84] U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Computing, 22(5):935–948, 1993.
[85] Tobias Marschall and Sven Rahmann. Probabilistic arithmetic automata and their application to pattern matching statistics. Combinatorial Pattern Matching, pages 95–106, 2008.
[86] Tobias Marschall and Sven Rahmann. Efficient exact motif discovery. Bioinformatics, 25(12), 2009.
[87] Andres Marzal and Sergio Barrachina. Speeding up the computation of the edit distance for cyclic strings. Pattern Recognition, Int’l Conference on, 2:2891, 2000.
[88] Geoffrey Mazeroff, Jens Gregor, Michael G. Thomason, and Richard Ford. Probabilistic suffix models for API sequence analysis of Windows XP applications. Pattern Recognition, 41(1):90–101, 2008.
[89] Edward M. McCreight. A space-economical suffix tree construction algorithm. J. ACM, 23:262–272, April 1976.
[90] R. A. Mollineda, E. Vidal, and F. Casacuberta. Cyclic sequence alignments: Approximate versus optimal techniques. International Journal of Pattern Recognition and Artificial Intelligence, 16:291–299, 2002.
[91] R. A. Mollineda, E. Vidal, and F. Casacuberta. A windowed weighted approach for approximate cyclic string matching. In ICPR ’02: Proceedings of the 16th International Conference on Pattern Recognition (ICPR’02) Volume 4, page 40188, Washington, DC, USA, 2002. IEEE Computer Society.
[92] Ramon Alberto Mollineda, Enrique Vidal, and Francisco Casacuberta. Efficient techniques for a very accurate measurement of dissimilarities between cyclic patterns. In Proceedings of the Joint IAPR International Workshops on Advances in Pattern Recognition, pages 337–346, London, UK, 2000. Springer-Verlag.
[93] Krisztian Monostori, Arkady Zaslavsky, and Heinz Schmidt. Suffix vector: Space- and time-efficient alternative to suffix trees. In Michael J. Oudshoorn, editor, Twenty-Fifth Australasian Computer Science Conference (ACSC2002), Melbourne, Australia, 2002. ACS.
[94] J. Ian Munro, Venkatesh Raman, and S. Srinivasa Rao. Space efficient suffix trees. J. Algorithms, 39(2):205–222, 2001.
[95] G. Navarro and J. Tarhio. Boyer-Moore string matching over Ziv-Lempel compressed text. Proceedings, Combinatorial Pattern Matching, LNCS 1848, pages 166–180, 2000.
[96] Saul Ben Needleman and Christian Dennis Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970.
[97] Ge Nong, Sen Zhang, and Wai Hong Chan. Linear time suffix array construction using d-critical substrings. In CPM, pages 54–67, 2009.
[98] Jose Oncina. The Cocke-Younger-Kasami algorithm for cyclic strings. In ICPR ’96: Proceedings of the 13th International Conference on Pattern Recognition, page 413, Washington, DC, USA, 1996. IEEE Computer Society.
[99] Hasan H. Otu and Khalid Sayood. A new sequence distance measure for phylogenetic tree construction. Bioinformatics, 19(16):2122–2130, November 2003.
[100] Philipp Pagel, Matthias Oesterheld, Volker Stümpflen, and Dmitrij Frishman. The DIMA web resource - exploring the protein domain network. Bioinformatics, 22(8):997–998, 2006.
[101] Debnath Pal. Inference of protein function from protein structure. Structure, 13:121–130, 2005.
[102] Anatolij Potapov, Björn Goemann, and Edgar Wingender. The pairwise disconnectivity index as a new metric for the topological analysis of regulatory networks. BMC Bioinformatics, 9, 2008.
[103] Elise Prieur and Thierry Lecroq. From suffix trees to suffix vectors. In Prague Stringology Conference (PSC 2005), Prague, 2005.
[104] Simon J. Puglisi, William F. Smyth, and Andrew Turpin. A taxonomy of suffix array construction algorithms. ACM Computing Surveys, 39(2), 2007.
[105] A. Reyes, C. Gissi, G. Pesole, F.M. Catzeflis, and C. Saccone. Where do rodents fit? Evidence from the complete mitochondrial genome of Sciurus vulgaris. Mol. Biol. Evol., 17:979–983, 2000.
[106] J. Rissanen. Complexity of strings in the class of Markov sources. IEEE Transactions on Information Theory, IT-32(4):526–532, 1986.
[107] Jorma Rissanen. Universal coding information prediction and estimation. IEEE Transactions on Information Theory, IT-30(4):629–636, 1984.
[108] Dana Ron, Yoram Singer, and Naftali Tishby. Learning probabilistic automata with variable memory length. In COLT, pages 35–46, 1994.
[109] Dana Ron, Yoram Singer, and Naftali Tishby. The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning, 25(2-3):117–149, 1996.
[110] Luís M. S. Russo, Gonzalo Navarro, and Arlindo L. Oliveira. Fully-compressed suffix trees. In LATIN’08: Proceedings of the 8th Latin American Conference on Theoretical Informatics, pages 362–373, Berlin, Heidelberg, 2008. Springer-Verlag.
[111] Kunihiko Sadakane. Compressed suffix trees with full functionality. Theory of Computing Systems, 41(4):589–607, 2007.
[112] GK Sandve and F Drabløs. A survey of motif discovery methods in an integrated framework. Biology Direct, 1(11), 2006.
[113] Thomas B. Sebastian, Philip N. Klein, and Benjamin B. Kimia. On aligning curves. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25:116–125, 2003.
[114] P.H. Sellers. The theory and computation of evolutionary distances: Pattern recognition. J. Algorithms, 1:359–373, 1980.
[115] Y. Shiloach. Fast canonization of circular strings. J. Algorithms, 2(2):107–121, 1981.
[116] T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. J Mol Biol, 147(1):195–197, March 1981.
[117] W.F. Smyth. Computing Patterns in Strings. Addison-Wesley, 2003.
[118] Kenneth Sörensen. Distance measures based on the edit distance for permutation-type representations. Journal of Heuristics, 13:35–47, 2007.
[119] YQ. Tang, J. Yuan, G. Osapay, K. Osapay, D. Tran, CJ. Miller, AJ. Ouellette, and ME. Selsted. A cyclic antimicrobial peptide produced in primate leukocytes by the ligation of two truncated alpha-defensins. Science, 286(5439):498–502, 1999.
[120] Steven L. Tanimoto. A method for detecting structure in polygons. Pattern Recognition, 13(6):389–394, 1981.
[121] Esko Ukkonen. Finding approximate patterns in strings. J. Algorithms, 6(1):132–137, 1985.
[122] Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249–260, 1995.
[123] Shai Uliel, Amit Fliess, Amihood Amir, and Ron Unger. A simple algorithm for detecting circular permutations in proteins. Bioinformatics, 15(11):930–936, 1999.
[124] Shai Uliel, Amit Fliess, and Ron Unger. Naturally occurring circular permutations in proteins. Protein Eng., 14(8):533–542, August 2001.
[125] L. Wang and T. Jiang. On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337–348, 1994.
[126] J. Weiner and E. Bornberg-Bauer. Evolution of circular permutations in multidomain proteins. Mol. Biol. Evol, 23(4):734–743, 2006.
[127] J. Weiner, G. Thomas, and E. Bornberg-Bauer. Rapid motif-based prediction of circular permutations in multi-domain proteins. Bioinformatics, 21(7):932–937, 2005.
[128] P. Weiner. Linear pattern matching algorithm. Proceedings, 14th IEEE Symposium on Switching and Automata Theory, 21:1–11, 1973.
[129] Frans M. J. Willems, Yuri M. Shtarkov, and Tjalling J. Tjalkens. The context-tree weighting method: Basic properties. IEEE Transactions on Information Theory, 41(3):653–664, 1995.
[130] S. Wu and U. Manber. Agrep – a fast approximate pattern matching tool. In Proceedings of the Winter 1992 USENIX Conference, pages 153–62. USENIX Association, Berkeley, CA, 1992.
[131] Mikio Yamamoto and Kenneth Ward Church. Using suffix arrays to compute term frequency and document frequency for all substrings in a corpus. Computational Linguistics, 27(1):1–30, 2001.
[132] J Yang, W Wang, P Yu, and J Han. Mining long sequential patterns in a noisy environment. In SIGMOD. ACM, 2002.
[133] Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3):337–343, 1977.
[134] Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5):530–536, 1978.