BalticGrid-II Project, 2nd BG-II AHM, 13.05.2009, Riga, Latvia

Overview of application CoPS (Comparison of Protein Structures)

D. Ludviga, IMCS UL (SigmaNet)


Outline

About CoPS (scientific value);

What's new?;

Challenges (mentioned during 1AHM);

Our solution;

Collaboration possibilities.


About CoPS (scientific value)

Started at the beginning of BG-II as the pilot application; developed by Dr. Natalja Kurbatova and Assoc. Prof. Juris Viksna

Field: Bioinformatics;

“It has taken biologists some 230 years to identify and describe three quarters of a million insects; if there are indeed at least thirty million ... then, working as they have in the past, insect taxonomists have ten thousand years of employment ahead of them.”

R. Leakey and R. Lewin


About CoPS

Assumption: protein structures have evolved by a stepwise process, each step involving a small change in the structure.

Comparison of protein structures uses the Evolutionary Secondary Structures Matching (ESSM) algorithm. ESSM was created for pairwise comparison of structures; it allows fold mutations to be identified and the evolutionary relationship between proteins to be estimated.

To explore the evolution of protein structures, an all-against-all comparison has to be done.

Application needs:

Protein database (data set description files are stored)

– PDB (3D), FASTA (.txt), structural elements;

– size ~8 GB (~2.3 GB if compressed);

Total number of tasks: 20 451 945, divided into 410 files
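
The all-against-all task list can be illustrated in a few lines. This is a minimal sketch, not the CoPS implementation: the function names, the toy identifiers and the equal-sized batching policy are assumptions, since the slides give only the total number of tasks and the number of files.

```python
from itertools import combinations, islice
from math import ceil


def all_against_all(ids):
    """Yield every unordered pair of distinct structure identifiers once."""
    return combinations(ids, 2)


def split_into_batches(pairs, batch_size):
    """Group an iterable of pairs into lists of at most batch_size items."""
    iterator = iter(pairs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Toy example with hypothetical identifiers (not taken from the CoPS data set).
ids = ["s1", "s2", "s3", "s4", "s5"]
batches = list(split_into_batches(all_against_all(ids), batch_size=4))

# Figures from the slide: an even split gives the per-file task count.
tasks_per_file = ceil(20_451_945 / 410)
```

If the tasks were split evenly, each of the 410 files would hold at most 49 883 pair comparisons.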


About CoPS

Application consists of:

– jdl.essm - JDL file for submitting the ESSM (CoPS) job

– essm.sh - shell script that is executed on the WN once the job starts

– database.tar.gz - archive of the protein database with protein descriptions, which is extracted on the WN before anything else starts

– essm.linux - statically compiled 32-bit executable for ESSM (CoPS) that works on Scientific Linux [CERN] 4

– pairs.txt - sample calculation file that contains pair comparisons

At the end of each job a result file, pairs.result, is generated.

Afterwards the results are visualized using a self-made tool, developed using one of the GRADE components.
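
The order of steps on the worker node can be sketched as follows. This is an illustration only, written in Python rather than as the actual essm.sh: the function name `run_job` is hypothetical, and the command line is passed in by the caller because the slides do not state how essm.linux is invoked.

```python
import subprocess
import tarfile
from pathlib import Path


def run_job(workdir, archive, command):
    """Extract the database archive, then run the comparison command.

    `command` is supplied by the caller because the real arguments of
    essm.linux are not given in the slides (hypothetical interface).
    """
    workdir = Path(workdir)
    # Step 1: unpack the protein database before anything else starts.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(workdir)
    # Step 2: run the executable inside the working directory.
    subprocess.run(command, cwd=workdir, check=True)
    # Step 3: the job is expected to leave pairs.result behind.
    result = workdir / "pairs.result"
    if not result.exists():
        raise FileNotFoundError(result)
    return result
```

The caller would pass whatever command line essm.sh really uses; `check=True` makes a failing executable abort the job instead of producing a silent empty result.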


About CoPS


What's new?

Developed (results received); ~2 weeks.

Implemented in Migrating Desktop;

Presented/demonstrated at OGF25/EGEE Users Forum in Catania, Italy

Demo


Challenges and our solution

Challenges:

Transport the data;

– 410 x 2.3 GB ≈ 950 GB

VOMS proxy.

Solutions:

The needed data was installed in the software directories of separate clusters (developed “dedicated” protein clusters)

MyProxy
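
The transfer estimate above can be reproduced with a short calculation. This is a sketch under stated assumptions: one archive download per task file is inferred from the 410 x 2.3 GB figure, and the number of clusters used for the pre-installed case is a hypothetical value, not one given in the slides.

```python
ARCHIVE_MB = 2300   # compressed database size from the slides (~2.3 GB)
JOB_FILES = 410     # number of task files, one archive download each


def transfer_volume_gb(downloads, archive_mb=ARCHIVE_MB):
    """Total data moved, in GB, for a given number of archive downloads."""
    return downloads * archive_mb / 1000


per_job = transfer_volume_gb(JOB_FILES)   # every job ships the archive
clusters = 5                              # hypothetical number of clusters
pre_installed = transfer_volume_gb(clusters)
```

Shipping the archive with every job moves 943 GB, which the slide rounds to roughly 950 GB; installing it once per cluster scales with the number of clusters instead of the number of jobs.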


Results

The results of the ESSM algorithm were successfully used for the exploration of the CATH fold space by using fold space graphs for representation of comparison results and estimation of "evolution distance" on the basis of observed changes.

The results obtained in the application can be regarded as a few steps toward the creation of a general protein evolution model.


Collaboration

“Computer science is no more about computers than astronomy is about telescopes”

E. W. Dijkstra

Continue collaboration with biologists in LU;

Develop a VO or just dedicated servers: PDB can be installed in a cluster's VO software directory

– To speed up execution of jobs and avoid per-job download and extraction of these databases.
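
A job could prefer a pre-installed copy of the database and fall back to the shipped archive only when none is found. The sketch below illustrates that idea; it is not part of CoPS. The function name, the `cops/database` subdirectory and the environment variable name `VO_BALTICGRID_SW_DIR` are assumptions (the latter modelled on the `VO_<NAME>_SW_DIR` naming used on gLite worker nodes).

```python
import os
import tarfile
from pathlib import Path


def locate_database(workdir, archive="database.tar.gz",
                    sw_dir_var="VO_BALTICGRID_SW_DIR", subdir="cops/database"):
    """Return a directory holding the protein database.

    Prefers a copy pre-installed in the VO software directory; otherwise
    falls back to extracting the archive shipped with the job.
    The variable name and subdirectory are assumptions for illustration.
    """
    sw_dir = os.environ.get(sw_dir_var)
    if sw_dir:
        installed = Path(sw_dir) / subdir
        if installed.is_dir():
            return installed          # no download or extraction needed
    target = Path(workdir) / "database"
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    return target
```

Keeping the fallback means the same job can still run on clusters where the database has not been installed.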


Thank you!