Identification of Domains using Structural Data


Niranjan Nagarajan

Department of Computer Science

Cornell University

Assorted Definitions of Domains

• Subsequences that can fold independently into a stable structure.

• Structurally compact substructures.

• Functionally well-defined building blocks.

• Evolutionarily conserved and reused fragments.

Protein Structural Domain Identification

William R. Taylor

Basic Algorithm

• Initial Assignment of Labels
  – Sequential residue numbering

• Update of Labels

• Termination Condition
  – Mean squared deviation of average between successive cycles < $10^{-6}$, or number of iterations > (length of protein)/2
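The loop described in these bullets can be sketched as a small driver. This is only an illustration under stated assumptions: the name `run_until_converged` and the `update` callback are hypothetical, and the convergence test is read here as the mean squared change in labels between two successive cycles, which is one possible interpretation of the slide wording.

```python
def run_until_converged(n_residues, update, tol=1e-6):
    """Iterate a label update until labels stop changing or the cap is hit.

    `update(labels, t)` is a hypothetical callback returning the new labels.
    """
    # Initial assignment: sequential residue numbering 1..n.
    labels = [float(i) for i in range(1, n_residues + 1)]
    # Iteration cap taken from the slide: (length of protein)/2.
    max_iterations = n_residues // 2
    iterations = 0
    while iterations < max_iterations:
        new_labels = update(labels, iterations)
        iterations += 1
        # Assumed reading: mean squared change between successive cycles.
        msd = sum((a - b) ** 2 for a, b in zip(new_labels, labels)) / n_residues
        labels = new_labels
        if msd < tol:
            break
    return labels, iterations
```

Passing the update rule in as a callback keeps the stopping logic separate from the label update itself.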

Update Formula

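• $S_i^{t+1} = S_i^t + \mathrm{step}(t+1) \cdot \mathrm{sign}\left(\sum_j f(S_i^t, S_j^t)\right) \quad \forall i$

• $\mathrm{sign}(x) = 1$ if $x > 0$, $-1$ if $x < 0$, $0$ if $x = 0$.

• $f(S_i^t, S_j^t) =$
  – $r/d_{ij}$ if $S_j^t > S_i^t$ and $d_{ij} < r$.
  – $-r/d_{ij}$ if $S_j^t < S_i^t$ and $d_{ij} < r$.
  – $0$ otherwise.

• $\mathrm{step}(x) =$
  – $1$ if $x < N/2$.
  – $2(N-x)/N$ if $N/2 \le x < N$.
  – $0$ otherwise.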
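A minimal sketch of one update cycle following the formula above. It makes several assumptions: coordinates are one 3D point per residue, all labels are updated synchronously from the previous cycle's values, and `n` stands in for N, which the slide does not define. The function names are illustrative, not taken from the original work.

```python
import math

def step(x, n):
    """Step size: 1 up to n/2, then a linear ramp down to 0 at n."""
    if x < n / 2:
        return 1.0
    if x < n:
        return 2.0 * (n - x) / n
    return 0.0

def sign(x):
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)

def f(s_i, s_j, d_ij, r):
    """Inverse-distance-weighted pull of neighbor j on residue i."""
    if not (0 < d_ij < r) or s_i == s_j:
        return 0.0
    weight = r / d_ij
    return weight if s_j > s_i else -weight

def update_labels(labels, coords, r, t, n):
    """One synchronous cycle: new labels depend only on the old labels."""
    new_labels = []
    for i, (s_i, p_i) in enumerate(zip(labels, coords)):
        total = 0.0
        for j, (s_j, p_j) in enumerate(zip(labels, coords)):
            if i != j:
                total += f(s_i, s_j, math.dist(p_i, p_j), r)
        new_labels.append(s_i + step(t + 1, n) * sign(total))
    return new_labels
```

For three residues on a line at unit spacing with r = 1.5 and labels 1, 2, 3, one cycle moves the outer labels toward the middle one, so all three end at 2.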

Example

• Full lines indicate the protein backbone.

• Neighboring residues within radius r are connected by dashed lines.

• Connections between i and i + 2 have been omitted for clarity.

• Label evolution is done without inverse distance weighting.

Refinements

• Median-based smoothing with a window size of 21 reclaims short loops of 10 or fewer residues.

• Small domains are reassigned using the weighted mean of their neighbors' values (weights are given by f).

• Domain recalculation is repeated at most five times.
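The median smoothing step can be illustrated with a short sketch. The handling of chain ends (truncating the window) is an assumption, since the slide does not specify it, and `median_smooth` is a hypothetical name.

```python
from statistics import median

def median_smooth(labels, window=21):
    """Replace each label by the median of the labels in a centered window.

    Near the chain ends the window is truncated (an assumption).
    """
    half = window // 2
    n = len(labels)
    return [
        median(labels[max(0, i - half): min(n, i + half + 1)])
        for i in range(n)
    ]
```

With a window of 21, a run of up to 10 residues carrying a different label from its surroundings is always outvoted inside the window, which is why short loops are absorbed into the enclosing domain while a boundary between two long domains stays in place.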

Preserving β-sheets

• Matrix B of possible β-sheet interactions between residues generated based on distance data and heuristics.

• Weighted mean heuristic used to generate initial assignment of labels with the averaging being iterated to convergence.

• Post-processing is also applied to badly broken β-sheets.

Self-testing with fake homologs

• Fake homologs generated by smoothing
  – Replacing central atom of triple by average.
  – Process repeated five times.

• Domain assignments compared and similarity evaluated based on overlap score.

• r optimized for best overlap score.
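A rough sketch of how a fake homolog might be generated by the smoothing described above. It assumes one 3D point per residue, takes "average" to mean the mean of all three positions in a consecutive triple, and leaves the two chain ends fixed. The slide confirms none of these details, and `smooth_chain` is a hypothetical name.

```python
def smooth_chain(coords, rounds=5):
    """Smooth a chain by replacing each interior point with the mean of
    itself and its two chain neighbors, repeated `rounds` times."""
    pts = [tuple(p) for p in coords]
    for _ in range(rounds):
        new_pts = list(pts)
        for i in range(1, len(pts) - 1):
            # zip groups the x, y and z components of the triple.
            new_pts[i] = tuple(
                sum(c) / 3.0 for c in zip(pts[i - 1], pts[i], pts[i + 1])
            )
        pts = new_pts
    return pts
```

The smoothed chain can then be passed through the same domain assignment as the original structure, and the two assignments compared.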

Extension to Multiple Structures

• Algorithm is simultaneously run on structures corresponding to a multiple sequence alignment.

• Labels are synchronized to the average of the labels at a position after each iteration.
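The synchronization step can be sketched as a column-wise average over the alignment. The data layout is an assumption: one list of labels per structure, indexed by alignment column, with `None` marking a gap. Gap handling is not described on the slide, and `synchronize` is a hypothetical name.

```python
def synchronize(label_rows):
    """Set every label in an alignment column to the column mean.

    `label_rows` holds one list per structure; None marks a gap (assumed).
    """
    out = [list(row) for row in label_rows]
    for col in range(len(label_rows[0])):
        values = [row[col] for row in label_rows if row[col] is not None]
        if not values:
            continue  # column contains only gaps
        mean = sum(values) / len(values)
        for row in out:
            if row[col] is not None:
                row[col] = mean
    return out
```

Calling this after each update cycle on every structure keeps aligned positions on a shared label, so domain boundaries tend to fall at the same alignment columns.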
