Data retrieval
BioMart
Data sets on ftp site
MySQL queries of databases
Perl API access to databases
Export View
Possible queries…
• All genes from a candidate region
• Genes with a particular protein domain
• Members of a protein family
• Genes associated with SNPs
More specific queries
• Human genes with upstream regions conserved with respect to mouse
• Upstream sequence for all Ensembl genes mapped to the U95A chip (similarly, complete genomic annotation of MG_U74)
• Genomic location and description of all mouse, rat and fugu homologues of all human genes that have transmembrane domains, are expressed in the cardiovascular system and have non-synonymous SNPs
Ensembl core database
• Normalised
• Each data point stored only once
• Quick updates
• Minimal storage requirements
• But:
• Many tables
• Many joins for complicated queries
• Slow for data mining questions
BioMart and EnsMart
• Large-scale data retrieval tool
• Query builder interface
• Databases: Ensembl, SNP, Vega, (MSD, UniProt)
• Associated features or sequences
• Flexible output formats
• http://www.ebi.ac.uk/biomart/
• http://www.ensembl.org/EnsMart/
Mart database
• De-normalised
• Tables with ‘redundant’ information
• Query-optimised
• Fast and flexible
• Designed for data mining
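The contrast between the normalised core database and the de-normalised mart can be sketched with a toy example. This is a minimal illustration using Python's built-in sqlite3 module, not the real Ensembl or Mart schema: the table names (gene, domain, gene_domain, gene_mart), the column names and the data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalised layout (hypothetical tables): each fact is stored only once.
cur.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
CREATE TABLE domain (domain_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE gene_domain (gene_id INTEGER, domain_id INTEGER);
INSERT INTO gene VALUES (1, 'geneA', '1'), (2, 'geneB', '1'), (3, 'geneC', '2');
INSERT INTO domain VALUES (10, 'kinase'), (11, 'transmembrane');
INSERT INTO gene_domain VALUES (1, 10), (2, 11), (3, 11);
""")

# "Genes with a particular protein domain" needs two joins here.
normalised_sql = """
SELECT g.name
FROM gene g
JOIN gene_domain gd ON gd.gene_id = g.gene_id
JOIN domain d ON d.domain_id = gd.domain_id
WHERE d.label = ?
ORDER BY g.name
"""
normalised_hits = [row[0] for row in cur.execute(normalised_sql, ("transmembrane",))]

# De-normalised layout: one wide table built ahead of time.
# Values such as the domain label are repeated ('redundant') on every row.
cur.execute("""
CREATE TABLE gene_mart AS
SELECT g.name AS gene_name, g.chromosome AS chromosome, d.label AS domain_label
FROM gene g
JOIN gene_domain gd ON gd.gene_id = g.gene_id
JOIN domain d ON d.domain_id = gd.domain_id
""")

# The same question is now a single-table lookup with no joins.
mart_sql = "SELECT gene_name FROM gene_mart WHERE domain_label = ? ORDER BY gene_name"
mart_hits = [row[0] for row in cur.execute(mart_sql, ("transmembrane",))]

print(normalised_hits, mart_hits)
```

Both queries return the same genes; the difference is that the wide table pays the join cost once, when it is built, at the price of repeated values and a rebuild whenever the source tables change.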
Primary Data Sets
• Ensembl genes
• SNP
– Single nucleotide polymorphisms
– Deletion-insertion polymorphisms
– Short tandem repeats
• Vega genes
• (MSD protein structures)
• (UniProt proteomes)
Secondary Data Sets
• Markers
• Diseases
• Gene ontology
• Gene expression information
• Homology predictions
• Protein annotation
Information flow
[Figure: query flow in three stages, start → filter → output. Labels in the diagram: SPECIES, FOCUS; REGION, SNP, PROTEIN, HOMOLOGY, GENE, EXPRESSION; REFSEQ, INTERPRO, GO, SWISSPROT, EMBL, AFFY; FASTA, FILE, EXCEL, TEXT, GTF, HTML.]
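The start → filter → output flow can be mimicked in a few lines. This is only a sketch of the idea, assuming a toy in-memory data set: the function run_query, the data set name human_genes and the records are hypothetical and are not part of EnsMart or BioMart.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical in-memory data set; a real mart holds these rows in database tables.
DATASETS: Dict[str, List[Record]] = {
    "human_genes": [
        {"gene": "geneA", "chromosome": "1", "domain": "kinase"},
        {"gene": "geneB", "chromosome": "1", "domain": "transmembrane"},
        {"gene": "geneC", "chromosome": "2", "domain": "transmembrane"},
    ],
}


def run_query(dataset: str, filters: Dict[str, str], attributes: List[str]) -> str:
    """Pick a data set (start), narrow it (filter), format the result (output)."""
    rows = DATASETS[dataset]  # start: choose the data set
    kept = [
        row for row in rows
        if all(row.get(key) == value for key, value in filters.items())
    ]  # filter: keep rows that match every filter
    lines = ["\t".join(attributes)]  # output: header line, then one line per row
    lines += ["\t".join(row[name] for name in attributes) for row in kept]
    return "\n".join(lines)


report = run_query(
    "human_genes",
    {"chromosome": "1", "domain": "transmembrane"},
    ["gene", "chromosome"],
)
print(report)
```

The output here is tab-separated text; other formats would only change the last stage, which is why the three stages are kept separate.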
What about queries not possible to do in EnsMart?
• Direct database access at ensembldb.ensembl.org
• martdb.ebi.ac.uk
• MySQL client
Download MySQL for Windows: http://www.winmysql.com/page4.html (file: wmysr11.zip)
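For direct access with the MySQL client, the command line might be assembled as below. This is a sketch that only builds the command and does not run it. The host comes from the slide; the user name anonymous and the query SHOW DATABASES are assumptions made for illustration, and mysql_command is a hypothetical helper.

```python
import shlex
from typing import List, Optional


def mysql_command(host: str, user: str, query: Optional[str] = None) -> List[str]:
    """Build the argument list for the mysql command-line client."""
    argv = ["mysql", "--host", host, "--user", user]
    if query is not None:
        # --execute runs one statement and exits instead of opening a prompt.
        argv += ["--execute", query]
    return argv


# Assumed read-only login; check the server's documentation for the real account.
cmd = mysql_command("ensembldb.ensembl.org", "anonymous", "SHOW DATABASES")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell where the MySQL client is installed; leaving out the query opens an interactive session instead.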
Access via Perl object API
• Based on BioPerl
• Ensembl modules
• For an introduction, see the tutorial at: http://www.ensembl.org/info/software/core/
There are other ways…
• MartShell: command-line interface to Mart, written in Java
• It works with a Mart Query Language