BalticGrid-II Project
2nd BG-II AHM, 13.05.2009, Riga, Latvia
Overview of application CoPS (Comparison of Protein Structures)
D. Ludviga, IMCS UL (SigmaNet)
Outline
About CoPS (scientific value);
What's new?;
Challenges (mentioned during the 1st AHM);
Our solution;
Collaboration possibilities.
About CoPS (scientific value)
Started at the beginning of BG-II as the pilot application; developed by Dr. Natalja Kurbatova and Assoc. Prof. Juris Viksna
Field: Bioinformatics;
“It has taken biologists some 230 years to identify and describe three quarters of a million insects; if there are indeed at least thirty million ... then, working as they have in the past, insect taxonomists have ten thousand years of employment ahead of them.”
R. Leakey and L. Roger
About CoPS
Assumption: protein structures have evolved by a stepwise process, each step involving a small change in the structure.
Protein structures are compared using the Evolutionary Secondary Structures Matching (ESSM) algorithm. ESSM was created for pairwise comparison of structures; it allows fold mutations to be identified and the evolutionary relationship between proteins to be estimated.
To explore the evolution of protein structures, an all-against-all comparison has to be done.
Application needs:
Protein database (data set description files are stored)
– PDB (3D), FASTA (.txt), structural elements;
– size ~8 GB (~2.3 GB if compressed);
Total number of tasks: 20 451 945, divided into 410 files.
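A minimal sketch of how an all-against-all task list could be split into a fixed number of files. The slides do not say how the 20 451 945 tasks are enumerated, so treating a task as an unordered pair of distinct structures is an assumption, and the helper names (`all_pairs`, `chunked`, `chunk_size_for`) are hypothetical.

```python
from itertools import combinations, islice
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def all_pairs(ids: Iterable[str]) -> Iterator[Pair]:
    """Yield every unordered pair of distinct ids (assumed task definition)."""
    return combinations(ids, 2)


def chunked(pairs: Iterable[Pair], chunk_size: int) -> Iterator[List[Pair]]:
    """Group a stream of pairs into lists of at most chunk_size items."""
    it = iter(pairs)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def chunk_size_for(total_tasks: int, n_files: int) -> int:
    """Smallest chunk size that fits total_tasks into n_files (ceiling division)."""
    return -(-total_tasks // n_files)


# Toy example: 4 structures give 6 pairs, split into chunks of at most 4.
toy_chunks = list(chunked(all_pairs(["p1", "p2", "p3", "p4"]), 4))
print([len(c) for c in toy_chunks])

# Figures from the slide: tasks per file needed to fit 410 files.
print(chunk_size_for(20_451_945, 410))
```

Generating the pairs lazily keeps memory use flat; only one chunk is held in memory at a time before it is written to its pairs file.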
About CoPS
Application consists of:
jdl.essm - JDL file for submitting the ESSM (CoPS) job;
essm.sh - shell script that is executed on the WN once the job starts;
database.tar.gz - archive of the protein database with protein descriptions, which is extracted on the WN before anything else starts;
essm.linux - statically compiled executable for ESSM (CoPS), a 32-bit binary that works on Scientific Linux [CERN] 4;
pairs.txt - sample calculation file that contains pair comparisons.
At the end of each job the result file pairs.result is generated.
Afterwards the results are visualized using a self-made tool, developed using one of the GRADE components.
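The contents of jdl.essm are not shown in the slides. As an illustration only, the sketch below renders a job description from the file names listed above, using attributes that are standard in gLite JDL (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`). Passing the pairs file as an argument to essm.sh, the `std.out`/`std.err` names, and leaving database.tar.gz out of the sandbox are assumptions, and `render_jdl` is a hypothetical helper.

```python
from typing import List


def jdl_list(items: List[str]) -> str:
    """Format a list of file names as a JDL list: {"a", "b"}."""
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def render_jdl(pairs_file: str, result_file: str = "pairs.result") -> str:
    """Render a minimal JDL text for one ESSM job (illustrative, not the real jdl.essm)."""
    # Assumption: the large database archive is already available on the
    # worker node, so only the small files travel in the input sandbox.
    input_sandbox = ["essm.sh", "essm.linux", pairs_file]
    output_sandbox = ["std.out", "std.err", result_file]
    lines = [
        'Executable = "essm.sh";',
        f'Arguments = "{pairs_file}";',  # hypothetical: script takes the pairs file
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {jdl_list(input_sandbox)};",
        f"OutputSandbox = {jdl_list(output_sandbox)};",
    ]
    return "\n".join(lines) + "\n"


print(render_jdl("pairs.txt"))
```

Rendering the description from a function makes it straightforward to produce one job description per pairs file instead of editing a single file by hand.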
What's new?
Developed (results received); ~2 weeks.
Implemented in Migrating Desktop;
Presented/demonstrated at the OGF25/EGEE User Forum in Catania, Italy
Demo
Challenges and our solution
Challenges:
Transport the data;
– 410 x 2.3 GB ≈ 950 GB
VOMS proxy.
Solutions:
The needed data was installed in the software directories of separate clusters (developed “dedicated” protein clusters);
MyProxy.
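The data volume behind the transport challenge can be checked with a short calculation. The sketch assumes one job per pairs file (410 jobs), each downloading the 2.3 GB compressed database; the number of clusters holding a preinstalled copy (3 here) is a made-up value for comparison, and both function names are hypothetical.

```python
def per_job_transfer_gb(n_jobs: int, archive_gb: float) -> float:
    """Total transfer when every job downloads its own copy of the archive."""
    return n_jobs * archive_gb


def preinstalled_transfer_gb(n_clusters: int, archive_gb: float) -> float:
    """Total transfer when the archive is copied once to each cluster."""
    return n_clusters * archive_gb


ARCHIVE_GB = 2.3  # compressed database size from the slides
N_JOBS = 410      # assumption: one job per pairs file
N_CLUSTERS = 3    # hypothetical number of clusters with a preinstalled copy

print(f"per-job download: {per_job_transfer_gb(N_JOBS, ARCHIVE_GB):.0f} GB")
print(f"preinstalled:     {preinstalled_transfer_gb(N_CLUSTERS, ARCHIVE_GB):.1f} GB")
```

The per-job figure comes out at about 943 GB, which the slide rounds to roughly 950 GB.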
Results
The results of the ESSM algorithm were successfully used for the exploration of the CATH fold space by using fold space graphs for representation of comparison results and estimation of "evolution distance" on the basis of observed changes.
The results obtained in the application can be regarded as a few steps toward the creation of a general protein evolution model.
Collaboration
“Computer science is no more about computers than astronomy is about telescopes”
E.W.Dijkstra
Continue collaboration with biologists in LU;
Develop a VO or just dedicated servers:
PDB can be installed in a cluster's VO software directory
– To speed up execution of jobs and avoid per-job download and extraction of these databases.
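A minimal sketch of the "extract once, reuse afterwards" idea: a job looks for an already extracted database in a shared software directory and unpacks the archive only if it is missing. The directory layout, the `cops_database` folder name, the `.extracted` marker file, and the `ensure_database` helper are all assumptions; the slides do not describe how the installation is actually done.

```python
import tarfile
from pathlib import Path


def ensure_database(archive: Path, sw_dir: Path, marker_name: str = ".extracted") -> Path:
    """Return the database directory, extracting the archive only on first use."""
    target = sw_dir / "cops_database"  # hypothetical location inside the software dir
    marker = target / marker_name      # written only after a complete extraction

    if marker.exists():
        # A previous job (or a site administrator) already installed the data.
        return target

    target.mkdir(parents=True, exist_ok=True)
    # Use the safe "data" extraction filter where the running Python supports it.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target, **extract_kwargs)
    marker.touch()
    return target
```

In use, `sw_dir` would point at the VO software directory of the cluster and `archive` at database.tar.gz. The marker file is created last, so an interrupted extraction is not mistaken for a finished one. This sketch does not guard against two jobs extracting at the same time.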
Thank you!