Identification of Domains using Structural Data

Niranjan Nagarajan

Department of Computer Science

Cornell University

Assorted Definitions of Domains

• Subsequences that can fold independently into a stable structure.

• Structurally compact substructures.

• Functionally well-defined building blocks.

• Evolutionarily conserved and reused fragments.

Protein Structural Domain Identification

William R. Taylor

Basic Algorithm

• Initial Assignment of Labels
  – Sequential residue numbering

• Update of Labels

• Termination Condition
  – Mean squared deviation of average between successive cycles < 10^-6, or number of iterations > (length of protein)/2

Update Formula

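• $S_i^{t+1} = S_i^t + \mathrm{step}(t+1) \cdot \mathrm{sign}\left(\sum_j f(S_i^t, S_j^t)\right) \quad \forall i$

• $\mathrm{sign}(x) = 1$ if $x > 0$, $-1$ if $x < 0$, $0$ if $x = 0$.

• $f(S_i^t, S_j^t) =$
  – $r/d_{ij}$ if $S_j^t > S_i^t$ and $d_{ij} < r$.
  – $-r/d_{ij}$ if $S_j^t < S_i^t$ and $d_{ij} < r$.
  – $0$ otherwise.

• $\mathrm{step}(x) =$
  – $1$ if $x < N/2$.
  – $2(N-x)/N$ if $N/2 \le x < N$.
  – $0$ otherwise.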
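The update rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: `coords` is assumed to hold one 3D point per residue (for example the Cα position), `n_cycles` plays the role of N (the slides do not define N), and the convergence test is read as the mean squared change in labels between successive cycles. The function names are hypothetical.

```python
import math

def sign(x):
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)

def step(x, n):
    """Step size: 1 for x < n/2, then linear decay, 0 from x = n onwards."""
    if x < n / 2:
        return 1.0
    if x < n:
        return 2.0 * (n - x) / n
    return 0.0

def f(s_i, s_j, d_ij, r):
    """Inverse-distance-weighted vote of neighbour j on the label of residue i."""
    if d_ij >= r or s_i == s_j:
        return 0.0
    return r / d_ij if s_j > s_i else -r / d_ij

def evolve_labels(coords, r, n_cycles, tol=1e-6):
    """Evolve residue labels; n_cycles stands in for N (an assumption)."""
    n_res = len(coords)
    # Initial assignment: sequential residue numbering.
    labels = [float(i + 1) for i in range(n_res)]
    # Precompute, for each residue, its neighbours within radius r.
    neighbours = []
    for i in range(n_res):
        row = []
        for j in range(n_res):
            if j != i:
                d = math.dist(coords[i], coords[j])
                if 0.0 < d < r:
                    row.append((j, d))
        neighbours.append(row)
    for t in range(n_cycles):
        h = step(t + 1, n_cycles)
        # Synchronous update: every new label is computed from the old labels.
        new_labels = [
            labels[i] + h * sign(sum(f(labels[i], labels[j], d, r)
                                     for j, d in neighbours[i]))
            for i in range(n_res)
        ]
        msd = sum((a - b) ** 2 for a, b in zip(new_labels, labels)) / n_res
        labels = new_labels
        if msd < tol:  # assumed reading of the termination condition
            break
    return labels
```

For two well-separated clusters of residues, the labels inside each cluster are pulled toward a common value, so residues sharing a final label can be read as one domain. The iteration cap of (length of protein)/2 is not hard-wired here; a caller can impose it through the choice of `n_cycles`.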

Example

• Full lines indicate protein backbone.

• Neighboring residues within radius r are connected by dashed lines.

• Connections between i and i + 2 have been omitted for clarity.

• Label evolution is done without inverse distance weighting.

Refinements

• Median-based smoothing with a window size of 21 to reclaim short loops of 10 or fewer residues.

• Small domains are reassigned using the weighted mean of their neighbors' values (weights are given by f).

• Domain recalculation is repeated at most five times.
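The median-based smoothing in the first refinement can be illustrated with a short sketch. It assumes the labels have already been reduced to one domain identifier per residue and that the window is simply truncated at the chain ends; neither detail is stated on the slide, and `median_smooth` is a hypothetical name.

```python
from statistics import median_low

def median_smooth(labels, window=21):
    """Running median over per-residue domain labels.

    The window is truncated at the chain ends (an assumption).
    median_low always returns one of the existing labels.
    """
    half = window // 2
    return [
        median_low(labels[max(0, i - half): i + half + 1])
        for i in range(len(labels))
    ]
```

With a window of 21, any run of 10 or fewer residues whose label differs from its surroundings is outvoted by the residues around it and reabsorbed, which is consistent with the loop length quoted above.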

Preserving β-sheets

• Matrix B of possible β-sheet interactions between residues generated based on distance data and heuristics.

• Weighted mean heuristic used to generate initial assignment of labels with the averaging being iterated to convergence.

• Post-processing also done to badly broken β-sheets.

Self-testing with fake homologs

• Fake homologs generated by smoothing
  – Replacing central atom of triple by average.
  – Process repeated five times.

• Domain assignments compared and similarity evaluated based on overlap score.

• r optimized for best overlap score.
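The smoothing used to generate fake homologs can be sketched as below. The sketch assumes one 3D point per residue, that every pass uses the coordinates from the previous pass, and that the two chain ends are left untouched; the slides do not specify these details. The overlap score is not defined on the slide and is not reproduced here. `smooth_backbone` is a hypothetical name.

```python
def smooth_backbone(coords, rounds=5):
    """Replace the central point of each consecutive triple by the triple's mean.

    Assumptions: synchronous updates within a pass; end points are kept fixed.
    """
    pts = [tuple(float(c) for c in p) for p in coords]
    if len(pts) < 3:
        return pts
    for _ in range(rounds):
        prev = pts
        middle = [
            tuple((a + b + c) / 3.0
                  for a, b, c in zip(prev[i - 1], prev[i], prev[i + 1]))
            for i in range(1, len(prev) - 1)
        ]
        pts = [prev[0]] + middle + [prev[-1]]
    return pts
```

Running the domain assignment on both the original and the smoothed coordinates gives two assignments to compare, and r can then be scanned for the value that makes them agree best.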

Extension to Multiple Structures

• Algorithm is simultaneously run on structures corresponding to a multiple sequence alignment.

• Labels are synchronized to the average of the labels at a position after each iteration.
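The synchronization step can be sketched as follows. The sketch assumes each structure's labels have been laid out along the columns of the alignment, with `None` marking a gap, and that gapped positions are skipped when averaging; the gap handling is an assumption, and `synchronize` is a hypothetical name.

```python
def synchronize(label_sets):
    """Set every label in an alignment column to the column average.

    label_sets: one list per structure, indexed by alignment column,
    with None at gap positions (assumed representation).
    """
    synced = [list(row) for row in label_sets]
    n_cols = len(label_sets[0])
    for col in range(n_cols):
        present = [row[col] for row in label_sets if row[col] is not None]
        if not present:
            continue
        mean = sum(present) / len(present)
        for row in synced:
            if row[col] is not None:
                row[col] = mean
    return synced
```

Calling this after each label update on the individual structures keeps aligned positions on a shared label, so the structures are driven toward a common domain assignment.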
