DESCRIPTION
Bioinformatics is crucial to all life science research. The European Bioinformatics Institute (EBI) is one of a few major centres in the world that provide data and services for bioinformatics and, with Australia’s membership of EMBL, is a natural collaborator for Australia. In 2010 a project was launched to mirror EBI services from the University of Queensland (UQ). The goal was to improve Australian bioinformatics by removing the barriers of geographical remoteness. We have revisited the Mirror’s mission in light of experience and with input from a survey of Australian bioinformatics needs, and are creating the Bioinformatics Resource Australia – EMBL (BRAEMBL) with a mission to:
• enable optimal exploitation of the tools and data of bioinformatics by Australian scientists
• contribute to the global biomolecular information infrastructure in a way which showcases Australian science
• engage in Australia-wide training in support of these goals
Key findings of the survey and the rationale for the BRAEMBL project will be presented. BRAEMBL will work with the EBI to create a part of the EBI in Australia and to ensure that Australian scientists have access to the data and methods of bioinformatics and the necessary IT resources, through integrated high-quality services to rival those available anywhere in the world. This will draw on the support of Australian partners including Bioplatforms Australia (BPA) and the existing eResearch infrastructure. It will work with UQ’s Research Computing Centre to be an early adopter of modern IT methodologies, in particular cloud computing. The evolving plan for BRAEMBL and its contribution to Australian bioinformatics will be presented.
EMBLAustralia
Bioinformatics Resource Australia EMBL
Bioinformatics Services in Australia – a collaboration with the European Bioinformatics Institute
BRAEMBL
Bioinformatics Resource Australia – EMBL
Bioinformatics
Focus – central dogma
Molecular information and its phenotypic correlates
Genomes–Genes–Transcripts–Proteins–Structures–Interactions–Pathways–Systems
Bioinformatics Infrastructure
• Shared data and tools of bioinformatics
• Global databases and systems to explore and exploit them
• E.g.
– GenBank, PDB, UniProt, Ensembl etc.
You can’t do biology without exploiting this information infrastructure
Global Information Ecosystem
• data collection
• data curation
• service
• EBI (European Bioinformatics Institute)
• NCBI
• SIB (Swiss Institute of Bioinformatics)
• etc.
The EBI:
European Bioinformatics Institute
• Part of EMBL – The European Molecular Biology Laboratory
• About 500 staff and $80 million p.a.
• Australia is an Associate Member of EMBL
• Special relationship with the EBI
EMBL, EBI, Mirror
• Australian science needs bioinformatics
• Perceived disadvantage in Australia
– Geography
– Size
– Infrastructure
• Exploit the EBI?
EBI Mirror Project
• Copy EBI data and software
• Offer services directly to Australia
• Funding from various government schemes
Beyond Mirror
• Did mirror some EBI services
• Across-the-board mirroring impossible
• Alternatives?
• Re-examine the mission
This talk is about what I am trying to achieve
Back to basics - Mission
• Optimal exploitation of shared tools and data of bioinformatics
• Showcase Australian science in global databases
• Training in support of these goals
Surveying the community
• February 2013
• Solicited input from 500–1000 individuals
• 210 responses
• 50% wet (laboratory)
• 50% dry (computational)
Demography
Figure 1. Geographical source of responses: New South Wales 47; Victoria 63; Queensland 54; Western Australia 12; South Australia 18; Tasmania 3; ACT 7; Northern Territory 0; New Zealand 1
Figure 2. Sector of respondents: Academic institute/university 165; CSIRO 16; State government 5; Commonwealth government 3; Large commercial 5; SME 4; Health 1
Bioinformatics ubiquitous
Figure 4. Use of bioinformatics by wet and dry respondents: full-time bioinformatician; use bioinformatics as a core tool; use bioinformatics tools occasionally; rarely/never use bioinformatics tools, but would like to
Normal methods of bioinformatics
Figure 5. Scientific domains (series: Main, Plus): Biochemistry; Bioinformatics…; Cell biology; Developmental biology; Ecology; Evolutionary biology; Genetics; Genomics; Livestock biology; Marine biology; Metabolomics; Microbiology; Molecular biology; Neurobiology; Pathology; Plant biology…; Pharmacology; Physiology; Proteomics; Systems biology; Taxonomy; Transcriptomics
Figure 6. Usefulness of data types, percentage of respondents (very useful, somewhat useful, not useful): Images; 3D structures; Small molecules; Molecular interactions; Pathways; Proteomics; Protein motifs; Protein sequences; Gene expression; Genomes; Genes; Nucleic acids
Disadvantaged access?
Panels: Data; IT resources; Expertise. Responses (a lot, somewhat, not at all) by region: Australian …; New South Wales; New Zealand; Queensland; South Australia; Tasmania; Victoria; Western Australia
Panels: High performance compute; Databases; Software; Bioinformatics support staff. Source of each resource (in group; in organisation; from collaborators; external; none, want some; don't need), rated inadequate vs adequate or …
Training
Figure 9. Usefulness of training (very useful, somewhat useful, not at all useful): Database searching; Sequence alignment; Sequence clustering/phylogeny; NGS analysis; Statistical analysis; Network and pathway analysis; Structure analysis
• Three quarters of respondents indicated “very useful” for at least one topic
• Only four indicated no interest in any training
• Demand:
– Programming
– Statistics
Figure 10. Areas of greatest difficulty: Compute infrastructure; Data quality; Data quantity; Network; Data complexity; Data access; Compute; Software; Storage; Community; Funding; My speciality; Training; Expertise
Figure 11. Areas where BRAEMBL could make the greatest contribution: Funding; Create or improve software; Access to data; Compute power; My speciality; Community building; Be a hub for bioinformatics; Access to expertise; Offer training
Survey Conclusions
• Bioinformatics is important
• Across the “central dogma”
• For both wet and dry scientists
• Geographic disadvantage is not crippling
• Scientists like resources in their own group
• Lack of (access to) expertise
• Training and community building
• Programming and statistics
Back to basics - Mission
• Optimal exploitation of shared tools and data of bioinformatics
• Showcase Australian science in global databases
• Training in support of these goals
Spectrum of service – tides of change
Style of usage | Historic/traditional | Today
Search-and-browse | In the distant past done locally | All done at remote information centres
Molecular searching | Commonly local 15 years ago | Through web forms submitted to data centres
Programmatic access | All local up to about 6 years ago | Extensive use of programmatic access to remote machines (REST)
Methods development | Still almost all done locally | Emerging possibility of virtual machines at remote data centres
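The “programmatic access” row describes scripts that query remote data centres over REST instead of running searches on local copies. A minimal sketch of such a client, using only the Python standard library, is shown here. The base URL and the query, format and size parameters are assumptions modelled on the EBI Search REST services; the domain and query in the usage example are hypothetical. Check current EBI documentation before relying on any of them.

```python
"""Sketch of programmatic (REST) access to a remote bioinformatics data centre.

ASSUMPTION: the endpoint pattern and parameter names below are modelled on
EBI Search REST services; verify them against current EBI documentation.
"""
import json
import urllib.parse
import urllib.request

# Assumed base URL (not taken from the talk).
EBI_SEARCH_BASE = "https://www.ebi.ac.uk/ebisearch/ws/rest"


def build_search_url(domain: str, query: str, size: int = 10) -> str:
    """Build a GET URL asking the remote service to return JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json", "size": size})
    return f"{EBI_SEARCH_BASE}/{urllib.parse.quote(domain)}?{params}"


def search(domain: str, query: str, size: int = 10, timeout: float = 30.0) -> dict:
    """Run the search on the remote machine and decode its JSON reply."""
    url = build_search_url(domain, query, size)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical example: build (but do not send) a query for koala entries.
    print(build_search_url("uniprot", "koala"))
```

Separating URL construction from the network call keeps the request easy to inspect and test offline. Only `search` touches the remote service.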
Ensure access to:
• Data
• Software methods
• Hardware
• Expertise
Research needs bioinformatics:
• Bioinformatics expertise
• Software methods
• Shared databases
• Computers and stuff
Bioinformatics outsourcing: “We need more” / “Too much”
• Increased outsourcing, made possible by SOAs (service-oriented architectures), virtualisation and cloud computing
• Users find outsourcing hard
• BRAEMBL’s job is to make it easy
The IT forecast – generally cloudy
• Move the method to the data, not the data to the method
• Why own computers?
• Why own storage?
• Buy naked compute from a vendor (e.g., Amazon)
• Make data visible to the cloud
Back to basics - Mission
• Optimal exploitation of shared tools and data of bioinformatics
• Showcase Australian science in global databases
• Training in support of these goals
Projects of iconic Australian interest
• Barrier reef species
• Koala
• (Sheep)
Sea-quence project
• Sea-quence project unites Great Barrier Reef and Red Sea scientists
• Supported by Rio Tinto, Bioplatforms Australia (BPA) and ReFuGe 2020
• Convened by the Great Barrier Reef Foundation
• Sequence:
– 10 corals
– algal symbionts
– bacteria and viruses
Collaboration with the EBI
• Get the data into the best possible database
– At the best possible quality
– Quickly
– Identifiably Australian
• Ensembl, Ensembl Genomes, ENA (European Nucleotide Archive)
• Do the things the EBI won’t prioritise
• Mini-team in place at BRAEMBL
Back to basics - Mission
• Optimal exploitation of shared tools and data of bioinformatics
• Showcase Australian science in global databases
• Training in support of these goals
Expertise building
• Short courses on bioinformatics services in collaboration with the EBI, BPA, CSIRO and others
• Australian Bioinformatics Network to build community
• User support
Currently on a crusade to persuade Australia to turn BRAEMBL into a sustainable infrastructure
• As part of EMBL-Australia
• With an annual budget of $3 to $5 million
• With substantial security (~5 years) for about 40% of that budget
• With a truly infrastructural mindset
• $5 million is at best 2% of the global budget for such centres
Infrastructure mindset
• Academic institutions value
– Publications
– High-quality graduates
– Grants won
• This is different
– The mission is to serve researchers throughout Australia
– It only makes sense as a long-term project
– It must support careers of engineers
– It needs sustained and talented leadership
Service Mission
• BRAEMBL will flourish best in a research context
• It will produce some publications
• Its staff won’t seem so different from researchers
• Don’t take comfort in those similarities as the reason to do this
• The project will only do well if its unique service mission is embraced enthusiastically
A part of the EBI in Australia