Identification of Domains using Structural Data
Niranjan Nagarajan
Department of Computer Science
Cornell University
Assorted Definitions of Domains
• Subsequences that can fold independently into a stable structure.
• Structurally compact substructures.
• Functionally well-defined building blocks.
• Evolutionarily conserved and reused fragments.
Protein Structural Domain Identification
William R. Taylor
Basic Algorithm
• Initial assignment of labels
– Sequential residue numbering
• Update of labels
• Termination condition
– Mean squared deviation of the average between successive cycles $< 10^{-6}$, or number of iterations > (length of protein)/2
Update Formula
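• $S_i^{t+1} = S_i^t + \mathrm{step}(t+1)\,\mathrm{sign}\left(\sum_{j \neq i} f(S_i^t, S_j^t)\right)$, where $S_i^t$ is the label of residue $i$ after cycle $t$, $d_{ij}$ is the distance between residues $i$ and $j$, and $r$ is the neighbor radius.
• $\mathrm{sign}(x) = 1$ if $x > 0$, $-1$ if $x < 0$, $0$ if $x = 0$.
• $f(S_i^t, S_j^t) =$
– $r/d_{ij}$ if $S_j^t > S_i^t$ and $d_{ij} < r$.
– $-r/d_{ij}$ if $S_j^t < S_i^t$ and $d_{ij} < r$.
– $0$ otherwise.
• $\mathrm{step}(x) =$
– $1$ if $x < N/2$.
– $2(N - x)/N$ if $N/2 \le x < N$.
– $0$ otherwise.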
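The basic algorithm and update formula can be sketched in Python as below. This is a minimal sketch under stated assumptions, not Taylor's implementation: each residue is represented by one point (for example its Cα atom), all labels are updated synchronously, N in step() is taken to be the maximum number of cycles ((length of protein)/2) because the slides do not define it, and the "mean squared deviation" test is read as the mean squared change in labels between successive cycles. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def sign(x: float) -> int:
    """1 if x > 0, -1 if x < 0, 0 if x == 0."""
    return (x > 0) - (x < 0)


def step(x: int, n: int) -> float:
    """Step size: 1 for x < n/2, then a linear ramp 2(n - x)/n down to 0 at x = n."""
    if x < n / 2:
        return 1.0
    if x < n:
        return 2.0 * (n - x) / n
    return 0.0


def f(s_i: float, s_j: float, d_ij: float, r: float) -> float:
    """Pull of neighbor j on residue i: +r/d toward larger labels, -r/d toward smaller."""
    if d_ij >= r or s_j == s_i:
        return 0.0
    return r / d_ij if s_j > s_i else -r / d_ij


def assign_labels(coords: Sequence[Point], r: float,
                  n_steps: Optional[int] = None, tol: float = 1e-6) -> List[float]:
    """Iterate the label update until labels settle or (length/2) cycles have run."""
    n = len(coords)
    max_cycles = n // 2
    big_n = max_cycles if n_steps is None else n_steps  # ASSUMPTION: meaning of N
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    labels = [float(i) for i in range(n)]  # initial labels: sequential residue numbering
    for t in range(max_cycles):
        s = step(t + 1, big_n)
        # Synchronous update (ASSUMPTION): all residues read the labels of cycle t.
        new = [
            labels[i] + s * sign(sum(f(labels[i], labels[j], dist[i][j], r)
                                     for j in range(n) if j != i))
            for i in range(n)
        ]
        # ASSUMPTION: convergence measured as mean squared label change per residue.
        msd = sum((a - b) ** 2 for a, b in zip(new, labels)) / n
        labels = new
        if msd < tol:
            break
    return labels


# Two compact 4-residue clusters far apart (hypothetical toy coordinates).
cluster_a = [(float(k), 0.0, 0.0) for k in range(4)]
cluster_b = [(100.0 + k, 0.0, 0.0) for k in range(4)]
print(assign_labels(cluster_a + cluster_b, r=5.0))
# [1.5, 1.5, 1.5, 1.5, 5.5, 5.5, 5.5, 5.5]
```

In this toy run, neighboring labels swap back and forth during the first cycles while the step is 1; once step() drops below 1 the oscillation is damped and each cluster settles on a single shared label, which is what marks it as one domain.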
Example
• Full lines indicate the protein backbone.
• Neighboring residues within radius r are connected by dashed lines.
• Connections between i and i + 2 have been omitted for clarity.
• Label evolution is done without inverse-distance weighting.
Refinements
• Median-based smoothing with a window size of 21 reclaims short loops of 10 or fewer residues.
• Small domains are reassigned using the weighted mean of their neighbors' labels (weights given by f).
• Domain recalculation is repeated at most five times.
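The median-smoothing refinement can be sketched as follows. Assumptions: labels are a per-residue list in chain order, the window is clipped at the chain ends (edge handling is not specified), and Python's `statistics.median_low` is used so the result is always one of the existing labels. With a window of 21, if a run of at most 10 residues is flanked on both sides by at least 10 residues carrying the same label, every window centered inside the run holds at least 11 flanking residues, so the median is the flanking label. The small-domain reassignment and the repeated recalculation are not shown; `median_smooth` is a hypothetical name.

```python
from statistics import median_low
from typing import List, Sequence


def median_smooth(labels: Sequence[float], window: int = 21) -> List[float]:
    """Replace each label by the median of a window centered on it.

    The window is clipped at the chain ends (ASSUMPTION). median_low always
    returns one of the input values, so no new intermediate labels appear.
    """
    half = window // 2
    n = len(labels)
    return [median_low(labels[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


# A 5-residue excursion inside a 35-residue stretch with one label is absorbed.
chain = [1.0] * 15 + [9.0] * 5 + [1.0] * 15
print(median_smooth(chain) == [1.0] * 35)  # True
```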
Preserving β-sheets
• A matrix B of possible β-sheet interactions between residues is generated from distance data and heuristics.
• A weighted-mean heuristic generates the initial assignment of labels, with the averaging iterated to convergence.
• Badly broken β-sheets are also post-processed.
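One possible reading of the weighted-mean step, as a sketch only: B is assumed to be a symmetric, non-negative matrix with a zero diagonal in which B[i][j] > 0 marks residues i and j as likely β-sheet partners (how B is built from distances and heuristics is not given on the slide), each residue keeps weight 1 on its own label, and labels start from sequential numbering. Repeated averaging then pulls residues linked through B onto a common starting label. The name `sheet_initial_labels` and the weighting scheme are hypothetical.

```python
from typing import List, Sequence


def sheet_initial_labels(b: Sequence[Sequence[float]], tol: float = 1e-9,
                         max_iter: int = 10_000) -> List[float]:
    """Weighted-mean label averaging over a (hypothetical) sheet-pairing matrix b."""
    n = len(b)
    labels = [float(i) for i in range(n)]  # start from sequential residue numbering
    for _ in range(max_iter):
        # Each residue keeps weight 1 on its own label (ASSUMPTION) plus weight
        # b[i][j] on each putative sheet partner j.
        new = [
            (labels[i] + sum(b[i][j] * labels[j] for j in range(n))) / (1.0 + sum(b[i]))
            for i in range(n)
        ]
        change = max((abs(x - y) for x, y in zip(new, labels)), default=0.0)
        labels = new
        if change < tol:
            break
    return labels


# Residues 0 and 3 are marked as sheet partners; 1 and 2 are unpaired.
pairing = [[0, 0, 0, 1],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [1, 0, 0, 0]]
print(sheet_initial_labels(pairing))  # [1.5, 1.0, 2.0, 1.5]
```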
Self-testing with fake homologs
• Fake homologs are generated by smoothing:
– Replace the central atom of each triple by the triple's average.
– Repeat the process five times.
• Domain assignments are compared and their similarity is evaluated with an overlap score.
• r is optimized for the best overlap score.
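The fake-homolog generation and a comparison score can be sketched as below. The smoothing follows the slide (central atom of a triple replaced by the triple's average, five rounds), with two assumptions: the triples are consecutive backbone atoms updated synchronously, and the two end atoms stay fixed. The slides do not define the overlap score, so `pair_agreement` is a hypothetical stand-in: the Rand index, i.e. the fraction of residue pairs on which two assignments agree about "same domain" versus "different domain". Both function names are illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def smooth_backbone(coords: Sequence[Point], rounds: int = 5) -> List[Point]:
    """Fake homolog: replace each interior atom by the mean of itself and its two neighbors."""
    pts = [tuple(p) for p in coords]
    if len(pts) < 3:
        return pts
    for _ in range(rounds):
        # Synchronous update from the previous round; end atoms stay fixed (ASSUMPTION).
        pts = [pts[0]] + [
            tuple((a + b + c) / 3.0 for a, b, c in zip(pts[i - 1], pts[i], pts[i + 1]))
            for i in range(1, len(pts) - 1)
        ] + [pts[-1]]
    return pts


def pair_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """HYPOTHETICAL overlap score (Rand index): fraction of residue pairs on which
    both domain assignments agree about 'same domain' vs 'different domain'."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


zigzag = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0)]
print(smooth_backbone(zigzag)[1])  # middle atom flattened toward the line y = 0
print(pair_agreement([0, 0, 1, 1], [5, 5, 7, 7]))  # 1.0: same partition, different labels
```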
Extension to Multiple Structures
• The algorithm is run simultaneously on the structures corresponding to a multiple sequence alignment.
• After each iteration, the labels at each alignment position are synchronized to the average of the labels at that position.
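The synchronization step for multiple structures can be sketched as a function applied after each cycle of the single-structure update. The alignment representation (one tuple per column giving each structure's residue index, or None for a gap) and the name `synchronize` are assumptions; gapped residues are simply left out of the column average.

```python
from typing import List, Optional, Sequence


def synchronize(labels: Sequence[Sequence[float]],
                columns: Sequence[Sequence[Optional[int]]]) -> List[List[float]]:
    """Set each aligned residue's label to the mean label of its alignment column.

    labels[s][k]  -> current label of residue k in structure s
    columns[c][s] -> residue index of structure s in column c, or None for a gap
    """
    out = [list(ls) for ls in labels]
    for col in columns:
        members = [(s, k) for s, k in enumerate(col) if k is not None]
        if not members:
            continue
        mean = sum(labels[s][k] for s, k in members) / len(members)
        for s, k in members:
            out[s][k] = mean
    return out


# Two structures; the second column is a gap in structure 0.
print(synchronize([[0.0, 2.0], [4.0, 6.0, 8.0]], [(0, 0), (None, 1), (1, 2)]))
# [[2.0, 5.0], [2.0, 6.0, 5.0]]
```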