Identification of Domains using Structural Data Niranjan Nagarajan Department of Computer Science Cornell University


Page 1: Identification of Domains using Structural Data

Identification of Domains using Structural Data

Niranjan Nagarajan

Department of Computer Science

Cornell University

Page 2: Identification of Domains using Structural Data

Assorted Definitions of Domains

• Subsequences that can fold independently into a stable structure.

• Structurally compact substructures.

• Functionally well-defined building blocks.

• Evolutionarily conserved and reused fragments.

Page 3: Identification of Domains using Structural Data

Protein Structural Domain Identification

William R. Taylor

Page 4: Identification of Domains using Structural Data

Basic Algorithm

• Initial Assignment of Labels

– Sequential residue numbering

• Update of Labels

• Termination Condition

– Mean squared deviation of average between successive cycles < 10^-6, or number of iterations > (length of protein)/2

Page 5: Identification of Domains using Structural Data

Update Formula

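• $S_i^{t+1} = S_i^t + \mathrm{step}(t+1)\,\mathrm{sign}\big(\sum_j f(S_i^t, S_j^t)\big) \quad \forall i$

• $\mathrm{sign}(x) = 1$ if $x > 0$, $-1$ if $x < 0$, $0$ if $x = 0$.

• $f(S_i^t, S_j^t) =$

– $r/d_{ij}$ if $S_j^t > S_i^t$ and $d_{ij} < r$.

– $-r/d_{ij}$ if $S_j^t < S_i^t$ and $d_{ij} < r$.

– $0$ otherwise.

• $\mathrm{step}(x) =$

– $1$ if $x < N/2$.

– $2(N-x)/N$ if $N/2 \le x < N$.

– $0$ otherwise.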
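The update rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation. It assumes one 3D point per residue (for example a Cα position), Euclidean distances for d_ij, and a synchronous update of all labels. The function and parameter names are hypothetical, and `n` stands in for N, which the slide does not define, so it is left as a caller-supplied value.

```python
import math


def sign(x):
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def step(x, n):
    """Step size schedule: constant, then linearly decaying, then zero."""
    if x < n / 2:
        return 1.0
    if x < n:
        return 2.0 * (n - x) / n
    return 0.0


def f(s_i, s_j, d_ij, r):
    """Inverse-distance-weighted pull of residue j on residue i."""
    if d_ij >= r or s_i == s_j:
        return 0.0
    return r / d_ij if s_j > s_i else -r / d_ij


def update_labels(labels, coords, r, t, n):
    """One synchronous update of all labels (assumes distinct coordinates)."""
    new_labels = []
    for i, (s_i, c_i) in enumerate(zip(labels, coords)):
        total = 0.0
        for j, (s_j, c_j) in enumerate(zip(labels, coords)):
            if i == j:
                continue  # a residue does not pull on itself
            total += f(s_i, s_j, math.dist(c_i, c_j), r)
        new_labels.append(s_i + step(t + 1, n) * sign(total))
    return new_labels
```

Calling `update_labels` repeatedly with increasing `t`, starting from sequential residue numbers, moves each label one step toward the labels that dominate its neighborhood, so residues in the same compact region drift to a common value.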

Page 6: Identification of Domains using Structural Data

Example

• Full lines indicate protein backbone.

• Neighboring residues within radius r are connected by dashed lines.

• Connections between i and i + 2 have been omitted for clarity.

• Label evolution is done without inverse distance weighting.

Page 7: Identification of Domains using Structural Data

Refinements

• Median-based smoothing with a window size of 21 to reclaim short loops of 10 or fewer residues.

• Small domains are reassigned using the weighted mean values of their neighbors (weights are given by f).

• Domain recalculation is repeated at most five times.
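The median smoothing step can be illustrated with a short sketch. It assumes the labels have already been reduced to one domain identifier per residue, and it truncates the window at the chain ends. Both choices are assumptions made for illustration, as is the function name.

```python
from statistics import median


def median_smooth(labels, window=21):
    """Replace each label by the median of the labels in a centered window.

    The window is truncated at the chain ends (an assumption; the slide
    does not say how the ends are handled).
    """
    half = window // 2
    n = len(labels)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append(median(labels[lo:hi]))
    return smoothed
```

With a window of 21, any excursion of 10 or fewer residues is outvoted by the surrounding labels and absorbed into the enclosing domain, while a boundary between two long segments is left in place.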

Page 8: Identification of Domains using Structural Data

Preserving β-sheets

• Matrix B of possible β-sheet interactions between residues generated based on distance data and heuristics.

• Weighted mean heuristic used to generate initial assignment of labels with the averaging being iterated to convergence.

• Post-processing is also applied to badly broken β-sheets.

Page 9: Identification of Domains using Structural Data

Self-testing with fake homologs

• Fake homologs generated by smoothing

– Replacing central atom of triple by average.

– Process repeated five times.

• Domain assignments compared and similarity evaluated based on overlap score.

• r optimized for best overlap score.
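The smoothing used to build a fake homolog can be sketched as follows. The sketch assumes that "average" means the mean of the three consecutive atoms in the triple, that the chain is given as one 3D point per residue, and that the two terminal atoms are left in place. None of these details are stated on the slide, and the function name is hypothetical.

```python
def smooth_chain(coords, rounds=5):
    """Return a smoothed copy of a chain of 3D points.

    In each round, every interior point is replaced by the mean of itself
    and its two chain neighbors; the end points are kept fixed.
    """
    current = [tuple(c) for c in coords]
    if len(current) < 3:
        return current  # nothing to smooth
    for _ in range(rounds):
        nxt = [current[0]]
        for a, b, c in zip(current, current[1:], current[2:]):
            nxt.append(tuple((x + y + z) / 3.0 for x, y, z in zip(a, b, c)))
        nxt.append(current[-1])
        current = nxt
    return current
```

Running the domain assignment on both `coords` and `smooth_chain(coords)` gives the two assignments that are then compared by the overlap score.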

Page 10: Identification of Domains using Structural Data

Extension to Multiple Structures

• Algorithm is simultaneously run on structures corresponding to a multiple sequence alignment.

• Labels are synchronized to the average of the labels at a position after each iteration.
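The synchronization step can be sketched like this. The sketch assumes that each structure's labels are stored in alignment-column order, with `None` marking a gap, and that gaps are skipped when averaging. The representation and the function name are illustrative assumptions, not details taken from the slides.

```python
def synchronize(label_sets):
    """Set every label in an alignment column to the column mean.

    label_sets holds one list of labels per structure, all of the same
    length (one entry per alignment column); None marks a gap.
    """
    synced = [list(labels) for labels in label_sets]
    n_columns = len(label_sets[0])
    for col in range(n_columns):
        present = [labels[col] for labels in label_sets if labels[col] is not None]
        if not present:
            continue  # column of gaps only
        mean = sum(present) / len(present)
        for labels in synced:
            if labels[col] is not None:
                labels[col] = mean
    return synced
```

Applied after each label update, this pulls aligned residues in all structures toward a common label, so the structures converge on a shared domain assignment.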