bosc-2010
Connecting TOPSAN to Computational Analysis
Christian M Zmasek, Kyle Ellrott, Dana Weekes, Constantina Bakolitsa, John Wooley, Adam Godzik
Joint Center for Structural Genomics
Sanford-Burnham Medical Research Institute, La Jolla, California, USA
University of California, San Diego, La Jolla, California, USA
Joint Center for Molecular Modeling
Overview
• What is TOPSAN?
– TOPSAN: The Open Protein Structure Annotation Network
– community-based annotation of protein structures
• “Semantic” TOPSAN
• How to enter machine-readable, structured data
• Example: editor → entry → semantic web
• Different ways to download information
• SPARQL example
• Availability and licenses
• Acknowledgements
What is TOPSAN?
• TOPSAN: The Open Protein Structure Annotation Network
• Tens of thousands of protein structures have been determined by structural genomics (SG) centers, and many more are expected
• While these structures are available in the PDB (Protein Data Bank)…
• … annotations for most of them are limited to one-line PDB titles
• TOPSAN is the first database that specifically focuses on providing extensive annotations for the thousands of structures solved by the SG centers
What is TOPSAN?
• TOPSAN’s main content is collaboratively written (“open”) articles/annotations for each solved protein structure
• TOPSAN combines automated and human-edited elements
• TOPSAN spans a range of analyses:
– single proteins
– characterization of protein families
– reconstruction of entire genomes
• Articles are created by structural genomics (SG) center staff and over 400 external users, so far covering 7,250 proteins
• Collaborating with PFAM to use JCSG structures to refine existing PFAM families and create new ones
TOPSAN example entry
“Semantic” TOPSAN
• Use the principles of the semantic web to turn TOPSAN into a database that can be:
– edited
– searched
– linked
• TOPSAN content is being made accessible to computational query and analysis via semantic web technologies
Entering machine-readable, structured data with the TOPSAN Protein Syntax (TPS)
• Each statement takes the form subject, predicate, object
• Subject: the protein in question
• Predicate, examples:
– homologous
– encoded_by
– citation
– member_of
• Object: a “direct value” or a link to another database
• Example:
– {{ note.link( 'pfam_family_member', 'PFAM:PF07980' ) }}
• More information: http://topsan.wordpress.com/2010/06/01/96/
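The subject, predicate, object form above can be illustrated with a minimal Python sketch that pulls the predicate and object out of a `note.link(...)` annotation. The regular expression, the function name `parse_tps_link`, and the subject label `EXAMPLE_PROTEIN` are assumptions made for illustration; the real TPS grammar may accept more forms than this handles, and the sketch expects plain single quotes around both arguments.

```python
import re

# Hypothetical pattern for the note.link('predicate', 'object') form;
# not taken from the TOPSAN code base.
TPS_LINK = re.compile(
    r"\{\{\s*note\.link\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)\s*\}\}"
)

def parse_tps_link(subject, text):
    """Return a (subject, predicate, object) tuple for each note.link call."""
    return [(subject, pred, obj) for pred, obj in TPS_LINK.findall(text)]

# The subject is implied by the page being edited; a placeholder is used here.
annotation = "{{ note.link( 'pfam_family_member', 'PFAM:PF07980' ) }}"
triples = parse_tps_link("EXAMPLE_PROTEIN", annotation)
print(triples)
```

The subject is passed in separately because, as the slide notes, it is always the protein whose page is being edited.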
Example: in the Editor
Example: the resulting TOPSAN entry
Example: on the Semantic Web
<http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#simular_structure> <http://www.pdb.org/pdb/explore/explore.do?structureId=2afb>
<http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#simular_structure> <http://www.pdb.org/pdb/explore/explore.do?structureId=2var>
<http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#functional_assignment> <http://purl.org/obo/owl/EC#EC_2.7.1.45>
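Triples in this form are straightforward to consume programmatically. The following Python sketch, using only the standard library, splits the three statements above into (subject, predicate, object) tuples and groups the objects by predicate name. The helper `parse_triples` is hypothetical and only handles lines made of three angle-bracketed IRIs; a full RDF parser would be needed for literals, prefixes, or blank nodes.

```python
import re
from collections import defaultdict

# The three statements shown on the slide, copied verbatim.
LINES = """\
<http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#simular_structure> <http://www.pdb.org/pdb/explore/explore.do?structureId=2afb>
<http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#simular_structure> <http://www.pdb.org/pdb/explore/explore.do?structureId=2var>
<http://purl.org/topsan/protein/2qcv> <http://purl.org/topsan/tps#functional_assignment> <http://purl.org/obo/owl/EC#EC_2.7.1.45>
"""

IRI = re.compile(r"<([^>]*)>")

def parse_triples(text):
    """Return (subject, predicate, object) for each line holding exactly three IRIs."""
    triples = []
    for line in text.splitlines():
        iris = IRI.findall(line)
        if len(iris) == 3:
            triples.append(tuple(iris))
    return triples

# Group objects under the local name of the predicate (the part after '#').
by_predicate = defaultdict(list)
for subject, predicate, obj in parse_triples(LINES):
    by_predicate[predicate.rsplit("#", 1)[-1]].append(obj)

for name, objects in sorted(by_predicate.items()):
    print(name, len(objects))
```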
Different ways to download information
• Generic TOPSAN page
– Semantic information embedded into every TOPSAN page
• RDFa interface
– http://topsan.org/rdfa/2A2M
– XML
• Bulk Download
– http://files.topsan.org/topsan.n3.gz
– All unique semantic triples stored in a single N3-formatted file
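As a rough illustration of working with the bulk download, the Python sketch below streams a gzip-compressed file and counts its statement lines. It makes several assumptions: the URL is the one given on the slide and may no longer resolve, the function names are hypothetical, and the line count equals the triple count only if the file holds one statement per line (N3 allows statements to span lines).

```python
import gzip
import io
import urllib.request

# URL as given on the slide; it may no longer be available.
BULK_URL = "http://files.topsan.org/topsan.n3.gz"

def count_statement_lines(fileobj):
    """Count lines that are not blank, comments (#), or directives (@prefix etc.)."""
    count = 0
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as text:
        for line in text:
            line = line.strip()
            if line and not line.startswith(("#", "@")):
                count += 1
    return count

def count_remote(url=BULK_URL):
    """Download the compressed file into memory and count its statement lines."""
    with urllib.request.urlopen(url) as response:
        return count_statement_lines(io.BytesIO(response.read()))
```

Calling `count_remote()` performs the download; `count_statement_lines` can also be pointed at a local copy opened in binary mode.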
Simple SPARQL
PREFIX tps:<http://purl.org/topsan/tps#>
SELECT ?id ?weight WHERE {
?id tps:molecular_weight ?weight
}
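To make clear what this query asks for, the sketch below emulates its single triple pattern in plain Python: it returns every (subject, object) pair whose predicate is `tps:molecular_weight`. The triples are placeholders invented for the sketch, not TOPSAN data, and `select_by_predicate` is a hypothetical helper; a real SPARQL engine supports far more than this one pattern.

```python
TPS = "http://purl.org/topsan/tps#"

# Placeholder triples invented for this sketch; they are not TOPSAN data.
TRIPLES = [
    ("http://example.org/protein/A", TPS + "molecular_weight", "10000"),
    ("http://example.org/protein/A", TPS + "member_of", "http://example.org/family/X"),
    ("http://example.org/protein/B", TPS + "molecular_weight", "20000"),
]

def select_by_predicate(triples, predicate):
    """Emulate SELECT ?id ?weight WHERE { ?id <predicate> ?weight }."""
    return [(s, o) for s, p, o in triples if p == predicate]

rows = select_by_predicate(TRIPLES, TPS + "molecular_weight")
for protein_id, weight in rows:
    print(protein_id, weight)
```

The `PREFIX` line in the query plays the same role as the `TPS` constant here: it expands the short name `tps:molecular_weight` into the full predicate IRI.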
Availability and Licenses
• Project Site: http://www.topsan.org
• Software: http://www.topsan.org/Tools
• Open Source Licenses:
– Data: Creative Commons Attribution 3.0 License
– Software: GNU General Public License
Summary
• Structural genomics centers produce a large number of protein structures, most of which never get a publication
• TOPSAN provides a means for community annotation of such protein structures
• The TOPSAN Protein Syntax (TPS) allows annotators to easily enter machine-readable, structured data
• TOPSAN content is being made accessible to computational query and analysis via semantic web technologies
• Many aspects of TOPSAN are still under development and are planned to evolve with user needs
Acknowledgements
• Inspiration for TOPSAN/semantic web connection: DBCLS BioHackathon 2010
• Developers: Krishna Subramanian, Kyle Ellrott, Dana Weekes
• All contributors and users