Page 1: 2nd IMPACT workshop 5-6 May, 2010

2nd IMPACT workshop
5-6 May, 2010

InterPro
An Introduction

European Bioinformatics Institute, Wellcome Trust Genome Campus

Page 2: 2nd IMPACT workshop 5-6 May, 2010

Overview

• What InterPro is
• Where it came from
• What the vision was
• Has it evolved in line with that vision?
• Is it still fit for purpose?

Page 3: 2nd IMPACT workshop 5-6 May, 2010

What is InterPro?

According to the User Manual:
“InterPro is an integrated documentation resource for protein families, domains & sites. InterPro combines a number of databases that use different methodologies & a varying degree of biological information on well-characterised proteins to derive protein signatures. By uniting the member databases, InterPro capitalises on their individual strengths, producing a powerful integrated database & diagnostic tool.”
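As a rough illustration of what "uniting the member databases" could look like as a data structure, the sketch below groups signatures from several member databases under one integrated entry. The class names, fields and accessions are hypothetical placeholders chosen for this example; they are not InterPro's actual schema or real identifiers.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Signature:
    """One diagnostic signature from a member database (hypothetical model)."""
    database: str   # e.g. "PROSITE", "PRINTS", "Pfam"
    accession: str  # placeholder accession, not a real identifier


@dataclass
class IntegratedEntry:
    """An integrated entry uniting signatures that describe one biological object."""
    accession: str
    name: str
    signatures: List[Signature] = field(default_factory=list)

    def member_databases(self) -> Set[str]:
        """Return the member databases contributing to this entry."""
        return {sig.database for sig in self.signatures}


# Placeholder data: three signatures assumed to describe the same family.
entry = IntegratedEntry(
    accession="ENTRY0001",
    name="Example protein family",
    signatures=[
        Signature("PROSITE", "SIG_A"),
        Signature("PRINTS", "SIG_B"),
        Signature("Pfam", "SIG_C"),
    ],
)

print(sorted(entry.member_databases()))
```

Keeping each signature tagged with its source database is one way to preserve the "individual strengths" of the members while presenting a single entry to the user.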


Page 4: 2nd IMPACT workshop 5-6 May, 2010

Where did it come from?

• The concept of an integrated protein family database emerged almost 20 years ago!
  – at the 1991 BCA spring meeting in Sheffield
• Amos Bairoch had a poster on PROSITE
  – I had one on a ‘fingerprint’ database…
• We recognised that our approaches were underpinned by similar philosophies
  – to provide meaningful biological information
  – to provide high-quality manual annotation

Page 5: 2nd IMPACT workshop 5-6 May, 2010

Where did it come from?

Page 6: 2nd IMPACT workshop 5-6 May, 2010

Where did it come from?

Page 7: 2nd IMPACT workshop 5-6 May, 2010

Where did it come from?

Page 8: 2nd IMPACT workshop 5-6 May, 2010

Where did it come from?

• PROSITE & PRINTS were different
  – but somehow also the same…
  – most importantly, they were complementary

In combination, we gain powerful structural & functional insights

Page 9: 2nd IMPACT workshop 5-6 May, 2010

Where did it come from?

• So where next?
  – we had created 30 family fingerprints
  – PROSITE documented 375 families & functional sites
• PROSITE was way ahead!
  – we were still on the starting blocks…
• Nevertheless, we decided to apply for an EU grant to unite the databases
  – …seemed like a good idea at the time!

Page 10: 2nd IMPACT workshop 5-6 May, 2010

What was the vision?

• Naïvely, we wanted to make life easier!
• We aimed to
  – simplify & rationalise protein family analysis
    • ensuring that entries & their linked signatures pointed to related information on the same biological object
  – centralise & streamline the annotation process
    • reduce manual annotation burdens
  – facilitate automatic functional annotation of uncharacterised proteins

Page 11: 2nd IMPACT workshop 5-6 May, 2010

How has it evolved?

• The EU proposal was submitted in 1992
  – and was promptly declined!
• Later, in 1995, the EBI was established at Hinxton
• Visiting Fellowship in 1997
  – to help integrate my work more closely with that of EBI
• Rolf, Amos & I decided to try again for an EU grant
  – by then, Profiles, ProDom & Pfam had also been created
  – so it made sense to include them too
• With the bigger picture, the grant succeeded
  – InterPro was born!

Page 12: 2nd IMPACT workshop 5-6 May, 2010

How has it evolved?

[Diagram labels: InterPro; Pfam, Profiles, ProDom, PRINTS, Prosite]

• Release 0.1 beta was made in October 1999
• It contained 2,423 entries
  – 1,370 PROSITE entries
  – 1,465 Pfam entries
  – 1,157 PRINTS entries
  – 241 preliminary profiles
• Based on Swiss-Prot 38 & TrEMBL 11

Page 13: 2nd IMPACT workshop 5-6 May, 2010

How has it evolved?

“Various factors rendered a step-wise approach to the development of InterPro desirable. First, the scale of the task of amalgamating just the first 3 databases was immense. The rational merging of apparently equivalent database entries that in fact simultaneously define a specific family, domains within that family, or even repeats within those domains, presented an enormous challenge.”

Page 14: 2nd IMPACT workshop 5-6 May, 2010

How has it evolved?

[Figure labels: super-family, domain family, sub-families, families]

• Unravelling the biological relationships is vital!

Page 15: 2nd IMPACT workshop 5-6 May, 2010

How has it evolved?

• Clearly, the task of integration was hard
  – understanding the biological relationships being represented within member databases, let alone between them, was proving to be a significant challenge
• Rather than making our lives easier, it was probably making them much harder!
  – …& that was just with 3 databases!
• Today, with 11 sources, life is harder still…

Page 16: 2nd IMPACT workshop 5-6 May, 2010

How has it evolved?

• Release 0.1 beta was made in October 1999
• It contained 2,423 entries
  – 1,370 PROSITE entries
  – 1,465 Pfam entries
  – 1,157 PRINTS entries
  – 241 preliminary profiles
• Based on Swiss-Prot 38 & TrEMBL 11

• Release 26.0, March 2010
• It contains 20,329 entries
  – 1,023 Gene3D entries
  – 620 HAMAP entries
  – 2,234 Panther entries
  – 2,744 PIRSF entries
  – 1,975 PRINTS entries
  – 1,291 PROSITE regexs
  – 836 PROSITE profiles
  – 11,056 Pfam entries
  – 803 SMART entries
  – 1,095 SUPERFAMILY entries
  – 3,689 TIGRFams
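The figures for the two releases can be compared with a little arithmetic. The sketch below copies the counts from the slide and computes the growth in entries and the ratio of listed per-database items to entries. Reading the per-database figures as member-database signatures that are integrated into entries is an assumption made here; the slide itself simply calls them "entries". The variable and function names are illustrative only.

```python
# Counts copied from the two release summaries on the slide.
release_0_1 = {
    "entries": 2423,
    "signatures": {
        "PROSITE": 1370, "Pfam": 1465, "PRINTS": 1157,
        "preliminary profiles": 241,
    },
}
release_26_0 = {
    "entries": 20329,
    "signatures": {
        "Gene3D": 1023, "HAMAP": 620, "Panther": 2234, "PIRSF": 2744,
        "PRINTS": 1975, "PROSITE regexs": 1291, "PROSITE profiles": 836,
        "Pfam": 11056, "SMART": 803, "SUPERFAMILY": 1095, "TIGRFams": 3689,
    },
}


def summarise(release):
    """Return (total listed per-database items, items per entry) for a release."""
    total = sum(release["signatures"].values())
    return total, total / release["entries"]


total_old, ratio_old = summarise(release_0_1)
total_new, ratio_new = summarise(release_26_0)
fold_growth = release_26_0["entries"] / release_0_1["entries"]

print(f"Release 0.1 beta: {total_old} listed items, {ratio_old:.2f} per entry")
print(f"Release 26.0: {total_new} listed items, {ratio_new:.2f} per entry")
print(f"Growth in entries: {fold_growth:.1f}-fold")
```

In both releases the per-database counts sum to more than the number of entries (4,233 against 2,423, and 27,366 against 20,329), which would be consistent with several member-database items being merged into a single entry. The entry count itself grows roughly 8.4-fold between the two releases.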


Page 17: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

• The database has grown more than 8-fold (2,423 to 20,329 entries) in ~10.5 years
• Why was it created in the first place?
  – to simplify & rationalise protein family analysis
    • ensuring that entries & their linked signatures pointed to related information on the same biological object
  – to centralise & streamline the annotation process
    • & reduce manual annotation burdens
  – to facilitate automatic functional annotation of uncharacterised proteins
  – to make life easier!!

Page 18: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

Page 19: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

Page 20: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

Page 21: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

Page 22: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

Remember this?

Why separate out structurally & functionally relevant information?

Page 23: 2nd IMPACT workshop 5-6 May, 2010

What is InterPro?

A reminder:
“InterPro is an integrated documentation resource for protein families, domains & sites. InterPro combines a number of databases that use different methodologies & a varying degree of biological information on well-characterised proteins to derive protein signatures. By uniting the member databases, InterPro capitalises on their individual strengths, producing a powerful integrated database & diagnostic tool.”

Page 24: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

• Integration = greater than the sum of the parts
  – a perfect example…

This integrated view is incredibly powerful & informative!

Page 25: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

Page 26: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

Page 27: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

What does it mean?

Page 28: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

• Let’s see what the alignments actually look like
  – consider just the first 3 TM domains…

They’re not the same!

They’re still not the same!

Page 29: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

• In the process of growing bigger, InterPro has grown massively in complexity
• Its internal convolutions now challenge us to ask, “What does it mean?”
  – what does it all mean to end users?!
  – & what does it all mean to computers?!

Page 30: 2nd IMPACT workshop 5-6 May, 2010

Has it evolved in line with its vision?

• With IMPACT, yes, InterPro has an opportunity to realise its original vision
  – it can rationalise protein family analysis
  – it can help to streamline the annotation process
  – it can facilitate functional annotation of proteins
  – it can make life easier
  – but it can only do these things if we’re prepared to empathise, collectively, with its growing pains!
• That’s why this workshop is important

Page 31: 2nd IMPACT workshop 5-6 May, 2010

Is InterPro still fit for purpose?

“There is a tremendous amount of information regarding evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively. We feel it is important to collect this significant information, correlate it into a unified whole and interpret it.”
Margaret O. Dayhoff to C. Berkley, February 27th, 1967

That is still InterPro’s unique opportunity!

“To kill an error is as good a service as, and sometimes even better than, the establishing of a new truth or fact.”
Charles Darwin, 1879

This remains IMPACT’s imperative!

Page 32: 2nd IMPACT workshop 5-6 May, 2010

Day 1

A workshop
5-6 May, 2010

09.00-09.30 Registration
09.30-09.35 Domestic
09.35-10.00 InterPro, an introduction (Terri)
10.00-10.30 Single-motif signatures: pros, cons & added-value to InterPro (Nicolas)
10.30-11.00 Multiple-motif signatures: pros, cons & added-value to InterPro (Alex)
11.00-11.30 Coffee
11.30-12.00 Domain-based signatures: pros, cons & added-value to InterPro (Rob)
12.00-12.30 Structural annotation: pros, cons & added-value to InterPro (Corin)
12.30-13.15 InterPro today [including GO mapping] (Sarah)
13.15-14.00 Lunch
14.00-14.30 How InterPro is used to add functional annotation to UniProt (Claire)
14.30-15.30 Hands-on examples
15.30-16.00 Coffee
16.00-17.00 Open discussion/feedback
19.30- Dinner

Page 33: 2nd IMPACT workshop 5-6 May, 2010

Day 2

A workshop
5-6 May, 2010

09.30-10.00 Issues with integrating different signatures: domains
10.00-10.30 Issues with integrating different signatures: families and subfamilies
10.30-11.00 Meaningful terms to group signatures and name entries
11.00-11.30 Coffee
11.30-12.00 Unexpected sequences in match lists & how to reconcile them
12.00-12.30 Improving InterPro’s interface to better visualise, integrate & maintain data
12.30-13.00 Open discussions
13.00-13.45 Lunch
13.45-??? Format/outline/organisation of November outreach event

Future funding
Reviewer feedback
Review of EoY deliverables – status report & action plan
AOB