ABSTRACT
Motivation: Similarity-based methods have been widely used to infer the properties of genes and gene products that have little or no experimental annotation. The most popular of these are sequence similarity search methods such as BLAST. New approaches that overcome the limitations of methods relying solely on sequence similarity are emerging. One of these is the comparison of the organization (architecture) of the structural domains in proteins. The idea is that shared structural units may indicate shared evolutionary and functional properties.
Results: Here we propose a new algorithm for comparing domain architectures in order to identify similarities and to propagate functional annotations between proteins in the UniProt database. The method, "UniProt Domain Architecture Alignment", differs from previous approaches in three major ways: (i) the use of the InterPro database for domain annotation, (ii) the incorporation of domain weights into the dynamic programming step, and (iii) the inclusion of information on the non-annotated regions of the proteins in the domain architectures. The performance of the method was measured through the identification of orthologs using the OMA database (F1 score: 0.62). The results indicate that the approach is effective for similarity detection. We plan to integrate the algorithm into a learning-based system for the automatic annotation of uncharacterized proteins in the UniProtKB/TrEMBL database.
Generation of the Domain Architectures:
1) Collect the hits for each protein from InterPro.
2) Remove all hits that are not of the domain type.
3) Order the domain hits sequentially along the protein.
4) Merge hits from the same InterPro hierarchy into single hits, using the condensed view algorithm provided by InterPro.
5) Resolve overlapping hits from unrelated InterPro entries.
6) Add stretches of residues without domain hits (longer than 30 amino acids) as "GAP" domains in the DAs.
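Steps 3 and 6 above can be illustrated with a minimal Python sketch. It assumes steps 1, 2, 4 and 5 have already been applied, i.e. the input hits are domain-type InterPro hits that have been merged and no longer overlap. The `Hit` class, the `build_da` function and the accessions `IPR_A`/`IPR_B` are hypothetical names for illustration only, not part of any UniProt or InterPro API.

```python
from dataclasses import dataclass
from typing import List

GAP = "GAP"   # pseudo-domain token for unannotated stretches
MIN_GAP = 30  # stretches longer than this many residues become GAP domains


@dataclass
class Hit:
    """A single (hypothetical) domain hit with 1-based inclusive coordinates."""
    accession: str
    start: int
    end: int


def build_da(hits: List[Hit], seq_len: int, min_gap: int = MIN_GAP) -> List[str]:
    """Return the domain architecture of one protein as a list of tokens.

    Assumes the hits are already filtered, merged and overlap-resolved.
    """
    # Step 3: order the domain hits along the sequence.
    ordered = sorted(hits, key=lambda h: (h.start, h.end))
    da: List[str] = []
    prev_end = 0  # last residue covered so far (0 = before the sequence)
    for hit in ordered:
        # Step 6: count unannotated residues between prev_end and this hit.
        if hit.start - prev_end - 1 > min_gap:
            da.append(GAP)
        da.append(hit.accession)
        prev_end = max(prev_end, hit.end)
    # Unannotated C-terminal tail after the last hit.
    if seq_len - prev_end > min_gap:
        da.append(GAP)
    return da
```

For a 200-residue protein with hits at 1-50 and 100-150, `build_da` yields `["IPR_A", "GAP", "IPR_B", "GAP"]`: both the 49-residue linker and the 50-residue tail exceed the 30-residue threshold.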
Domain weighting:
Inverse domain frequency:
Neighboring domain count:
Term frequency:
Domain hit sizes:
Domain similarity measure:
Weight matrix:
Final scoring matrix:
Weighted Domain Architecture Alignment:
The Needleman-Wunsch global sequence alignment algorithm (Needleman and Wunsch, 1970) is the core of the proposed DA alignment method, with the following modifications:
• The algorithm is modified to work with 7137 distinct InterPro domains as its alphabet instead of the 20 amino acids.
• The domain weights are integrated into a scoring matrix in order to direct the alignments towards maximum weighted scores.
• Non-annotated regions of the proteins are scored as gaps during the alignment.
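A minimal Python sketch of Needleman-Wunsch global alignment over domain tokens instead of amino acids. The scoring function `make_score` is a hypothetical stand-in for the poster's final scoring matrix: identical domains score their (assumed) weight, two aligned GAP pseudo-domains score zero, mismatches get a fixed penalty, and insertions/deletions a linear penalty. All parameter values are illustrative assumptions, not the authors' settings.

```python
from typing import Callable, Dict, List, Tuple

GAP = "GAP"  # pseudo-domain for unannotated stretches
INDEL = "-"  # marks an insertion/deletion in the alignment output


def make_score(weights: Dict[str, float], mismatch: float = -0.5) -> Callable[[str, str], float]:
    """Hypothetical scoring function standing in for the final scoring matrix."""
    def score(a: str, b: str) -> float:
        if a == b:
            # Identical GAP tokens are neutral; identical domains earn their weight.
            return 0.0 if a == GAP else weights.get(a, 1.0)
        return mismatch
    return score


def align_das(da1: List[str], da2: List[str],
              score: Callable[[str, str], float],
              indel: float = -1.0) -> Tuple[float, List[Tuple[str, str]]]:
    """Needleman-Wunsch global alignment of two domain architectures."""
    n, m = len(da1), len(da2)
    # f[i][j] holds the best score for aligning da1[:i] with da2[:j].
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = f[i - 1][0] + indel  # cumulative, so traceback equality is exact
    for j in range(1, m + 1):
        f[0][j] = f[0][j - 1] + indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + score(da1[i - 1], da2[j - 1]),
                          f[i - 1][j] + indel,
                          f[i][j - 1] + indel)
    # Traceback from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + score(da1[i - 1], da2[j - 1]):
            pairs.append((da1[i - 1], da2[j - 1]))
            i, j = i - 1, j - 1
        elif j == 0 or (i > 0 and f[i][j] == f[i - 1][j] + indel):
            pairs.append((da1[i - 1], INDEL))
            i -= 1
        else:
            pairs.append((INDEL, da2[j - 1]))
            j -= 1
    pairs.reverse()
    return f[n][m], pairs
```

With weights 2.0 and 3.0 for two domains, aligning `["IPR_A", "GAP", "IPR_B"]` against `["IPR_A", "IPR_B"]` matches both domains and places the GAP token against an indel, for a score of 2 - 1 + 3 = 4.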
METHODOLOGY RESULTS & DISCUSSION
InterPro Domains, DAs and DA Alignment
Difference in domain annotation coverage between domain databases:
Statistics about the directionality in DAs:
Evaluation of the performance of the method
The proposed method was evaluated on the identification of orthologous proteins from the Orthologous Matrix (OMA) project, release March 2014 (Altenhoff et al., 2011).
Randomly selected UniProtKB/SwissProt proteins from the OMA groups were subjected to the DA alignment procedure.
Performance was measured as the method's ability to identify orthologous proteins, since orthologs usually share the same function.
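The abstract reports performance as an F1 score (0.62). As a reminder of how that figure is defined, the sketch below computes precision, recall and F1 from true positives, false positives and false negatives. How the authors counted a predicted ortholog as a true positive is not stated here, so the helper `f1_from_pairs` and its representation of predictions as sets of protein-identifier pairs are hypothetical.

```python
from typing import Set, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_pairs(predicted: Set[Tuple[str, str]], true: Set[Tuple[str, str]]) -> float:
    """Hypothetical helper: compare predicted ortholog pairs with reference pairs."""
    tp = len(predicted & true)  # predicted pairs that are real orthologs
    fp = len(predicted - true)  # predicted pairs that are not in the reference
    fn = len(true - predicted)  # reference pairs the method missed
    return f1_score(tp, fp, fn)
```

For example, 6 true positives, 2 false positives and 4 false negatives give precision 0.75, recall 0.6 and F1 of about 0.667.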
CONCLUSIONS
Here we proposed a new approach in the field of protein function prediction. The method differs from previous approaches in three main aspects:
1) Different types of domain weights are integrated into the scoring matrix to direct the alignment of DAs towards an optimal solution.
2) Information on the non-annotated regions of the proteins is integrated into the DAs and thus scored during the alignment.
3) InterPro is used as the domain resource in order to increase the coverage of domain annotation on the protein sequences.
The results of the ortholog analysis suggest that the proposed approach can identify functional relationships between proteins.
As future work, we plan to use the method in a pipeline for the automatic annotation of proteins in the UniProtKB/TrEMBL database.
UniProt Domain Architecture Alignment: A New Approach for Protein Similarity Search using InterPro Domain Annotation
Tunca Doğan¹, Alex Bateman¹, Maria J. Martin¹
¹European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Correspondence: [email protected]
INTRODUCTION
• Discovery of the functional properties of proteins is a key step in biomedical research.
• Experimental characterization of proteins is still a laborious and expensive task.
• This has led to the development of many computational methods that infer the unknown properties of proteins from their sequence similarity to experimentally annotated proteins (e.g., BLAST, PSI-BLAST).
• Other approaches have recently been explored, especially in the field of protein function prediction, to improve on the performance of sequence similarity methods.
• One of these approaches is the study of protein domains: the structural building blocks of proteins that are able to fold and function independently of the rest of the protein.
• The domain architecture (DA) of a protein is defined as the organization of the domains it contains.
• Here we present the UniProt Domain Architecture Alignment procedure for the detection of functional similarities between proteins that have domain annotation:
  • Four types of attributes are incorporated into the measurement of domain architectural similarity: domain content, order, position and recurrence.
  • The method incorporates domain information from the InterPro database in order to increase domain annotation coverage on the proteins.
Figure 1. Different types of overlapping domain hits on protein sequences.
Figure 2. Resolution process for overlapping hits.
Figure 3. Domain hit statistics of UniProtKB/SwissProt proteins from various databases.
Figure 4. Fraction of overlapping InterPro domain hits on the residues of all UniProtKB/SwissProt proteins.
The overlapping domain hits problem in the InterPro database:
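One plausible way to quantify the overlap problem illustrated in Figure 4 is the fraction of a protein's residues covered by two or more domain hits. The sketch below computes that per-protein fraction from (start, end) hit coordinates; it is an illustrative assumption, not necessarily the exact statistic plotted in the figure, and `overlap_residue_fraction` is a hypothetical name.

```python
from typing import List, Tuple


def overlap_residue_fraction(hits: List[Tuple[int, int]], seq_len: int) -> float:
    """Fraction of residues covered by at least two hits (1-based inclusive coordinates)."""
    coverage = [0] * (seq_len + 1)  # index 0 unused so positions match residue numbers
    for start, end in hits:
        for pos in range(start, end + 1):
            coverage[pos] += 1
    overlapped = sum(1 for c in coverage[1:] if c >= 2)
    return overlapped / seq_len
```

Two hits at 1-50 and 41-80 on a 100-residue protein share residues 41-50, so the function returns 0.1.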
Figure 5. Co-occurrence frequencies of selected domain pairs on UniProtKB/SwissProt proteins (the InterPro accessions of the domains are shown at the top of the bars).
Table 1. Performance results of the proposed method in the identification of orthologous proteins in OMA groups.
ACKNOWLEDGEMENTS
T.D. thanks Andrew Nightingale for the editorial work on the manuscript.
Funding: This work was supported by the TUBITAK BIDEB-2219 postdoctoral research fellowship program.
Nt: total number of proteins in the test set
Nd: number of proteins containing domain d
Ed: total number of distinct neighboring domains of d
Nd,p: copy number of domain d in protein p
Dp: total number of domains in protein p
Zmin(d1,d2) & Zmax(d1,d2): sizes of the shorter and the longer hit, respectively, of domain d in protein 1 and in protein 2
Zav: average size of all domain hits on all proteins in the set
Od,e: similarity ratio between domain d and domain e
Ap1,p2, Cp1,p2, Fp1,p2, Sp1,p2 & Ip1,p2: local weight matrices
Rp1,p2: raw scoring matrix
Wp1,p2: general weight matrix between proteins 1 and 2
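Using the symbols defined above, the sketch below shows how Nt, Nd, Nd,p, Dp and Ed could be computed from a collection of DAs and turned into two of the listed domain weights. The log(Nt/Nd) inverse-frequency form and the Nd,p/Dp term-frequency form are the conventional TF-IDF definitions and are assumptions here, not the authors' exact formulas; counting only directly adjacent non-GAP domains as neighbors is also an assumption. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set

GAP = "GAP"


def inverse_domain_frequency(das: Dict[str, List[str]]) -> Dict[str, float]:
    """Assumed IDF form: log(Nt / Nd) for every domain d seen in the set."""
    n_t = len(das)  # Nt: number of proteins in the set
    n_d: Counter = Counter()
    for da in das.values():
        for d in set(da):  # count each protein at most once per domain
            if d != GAP:
                n_d[d] += 1  # Nd: proteins containing domain d
    return {d: math.log(n_t / count) for d, count in n_d.items()}


def term_frequency(da: List[str]) -> Dict[str, float]:
    """Assumed TF form: Nd,p / Dp for each domain d in protein p."""
    domains = [d for d in da if d != GAP]
    d_p = len(domains)  # Dp: total number of domains in protein p
    return {d: c / d_p for d, c in Counter(domains).items()}


def neighbor_counts(das: Dict[str, List[str]]) -> Dict[str, int]:
    """Ed: distinct domains found directly next to d (GAP tokens skipped).

    Domains that never have a neighbor are omitted from the result.
    """
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for da in das.values():
        domains = [d for d in da if d != GAP]
        for left, right in zip(domains, domains[1:]):
            neighbors[left].add(right)
            neighbors[right].add(left)
    return {d: len(ns) for d, ns in neighbors.items()}
```

A domain present in every protein receives an inverse frequency of log(1) = 0, so under this assumed form ubiquitous domains contribute nothing to the alignment score while rare, informative domains are up-weighted.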
F1000 Posters: Use Permitted under Creative Commons License.