Xinbin Dai, Ph. D. Affymetrix Probeset Mapping and Medicago Genome Annotation (Mt4.0 RC1)


Xinbin Dai, Ph. D.

Affymetrix Probeset Mapping and Medicago Genome Annotation (Mt4.0 RC1)


Agenda

• About Affymetrix Medicago GeneChip

• Mapping Algorithm and Tool

• Bioinformatics Resources for Medicago truncatula


Affymetrix GeneChip Probes

[Figure: a target transcript (mRNA: 5' UTR, EXON-I, EXON-II, EXON-III, 3' UTR) covered by a probeset of 11 probes. Each probe is a 25-mer (positions 1, 5, 10, 15, 20, 25 marked), shown as a perfect match (PM) probe and a mismatch (MM) probe.]


Probeset Types

• id_at: Designates probe sets that uniquely recognize target transcripts.

• id_a_at: Designates probe sets that recognize alternative transcripts from the same gene.

• id_s_at: Designates probe sets with common probes among multiple transcripts from different genes.

• id_x_at: Designates probe sets where it was not possible to select either a unique probe set or a probe set with identical probes among multiple transcripts. Rules for cross-hybridization were dropped in order to design the _x probe sets. These probe sets share some probes identically with two or more sequences and, therefore, may cross-hybridize in an unpredictable manner.

Source: GeneChip® Expression Analysis Data Analysis Fundamentals.
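The suffix rules above can be checked mechanically. The following is a minimal sketch, not part of any Affymetrix software: the function name `probeset_type` and the returned labels are illustrative assumptions, and only the four suffixes described above are handled. The example IDs are the ones quoted for the Medicago GeneChip.

```python
def probeset_type(probeset_id: str) -> str:
    """Classify a probeset ID by its suffix (hypothetical helper).

    The longer suffixes are tested first because every ID ends in "_at".
    """
    if probeset_id.endswith("_a_at"):
        return "alternative"   # alternative transcripts from the same gene
    if probeset_id.endswith("_s_at"):
        return "shared"        # common probes among transcripts of different genes
    if probeset_id.endswith("_x_at"):
        return "other"         # cross-hybridization rules dropped
    if probeset_id.endswith("_at"):
        return "unique"        # uniquely recognizes its target transcript
    raise ValueError(f"unrecognized probeset suffix: {probeset_id!r}")


for pid in ("Mtr.10097.1.S1_at", "Mtr.10267.1.S1_a_at",
            "Mtr.10146.1.S1_s_at", "Mtr.10093.1.S1_x_at"):
    print(pid, "->", probeset_type(pid))
```

IDs with any other suffix raise a `ValueError` rather than being guessed at.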


About Medicago GeneChip

Type                Example               Num of probe sets   Percent in the Mtr. set   Notes
Unique              Mtr.10097.1.S1_at     44182               86.80                     Unique to one gene
Alternative (_a_)   Mtr.10267.1.S1_a_at   116                 0.23                      Alternative probe sets to one gene
Shared (_s_)        Mtr.10146.1.S1_s_at   4793                9.42                      Common to multiple genes
Others (_x_)        Mtr.10093.1.S1_x_at   1809                3.55                      Other probe sets with complicated mapping
Total                                     50900               100

Reference sequences: early version of IMGAG, DFCI GeneIndex and alfalfa EST


Mapping Algorithm and Tool

• Gene transcripts were matched to corresponding Affymetrix probe sets using a position-weighted scoring index in which mismatches near the middle of a probe were most heavily penalized, as follows:

Probe position: 1 5 10 15 20 25
Weights: [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,2,2,2,2,2,1,1,1,1,1]
(Originated from Affymetrix, Inc.)

• A perfect match for a probe yields a score of 45 (the sum of the 25 position weights).

• Matches were declared when at least 8 of 11 probes had scores of 43 or higher.

Cutoff for matching: 43x8=344
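The scoring rule can be sketched in a few lines. This is an illustration under stated assumptions, not the AffyProbeMapping implementation: it assumes a probe's score is the sum of the weights at positions where probe and transcript agree, that alignment is ungapped, and that probe and transcript are given in the same orientation. The function names are hypothetical.

```python
# Position weights from the slide: mismatches in the middle cost the most.
WEIGHTS = [1] * 5 + [2] * 5 + [3] * 5 + [2] * 5 + [1] * 5
PERFECT_SCORE = sum(WEIGHTS)   # 45 for a 25-mer with no mismatch
PROBE_CUTOFF = 43              # minimum score for a probe to count as a hit
MIN_PROBES = 8                 # hits required out of the 11 probes in a set


def probe_score(probe: str, window: str) -> int:
    """Sum the weights of matching positions (assumed scoring scheme)."""
    if len(probe) != len(WEIGHTS) or len(window) != len(WEIGHTS):
        raise ValueError("probe and window must both be 25-mers")
    return sum(w for w, p, t in zip(WEIGHTS, probe, window) if p == t)


def best_probe_score(probe: str, transcript: str) -> int:
    """Best ungapped score of a probe over every 25-mer window."""
    n = len(probe)
    return max(
        (probe_score(probe, transcript[i:i + n])
         for i in range(len(transcript) - n + 1)),
        default=0,  # transcript shorter than the probe
    )


def probeset_matches(probes: list[str], transcript: str) -> bool:
    """Declare a match when at least MIN_PROBES probes reach the cutoff."""
    hits = sum(1 for p in probes
               if best_probe_score(p, transcript) >= PROBE_CUTOFF)
    return hits >= MIN_PROBES
```

Under these assumptions a probe still passes the cutoff of 43 with one mismatch in the outer zone (weight 1) or the second zone (weight 2), or with two outer mismatches, but fails with any mismatch in the central five positions (weight 3).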


AffyProbeMapping: An Online Affymetrix Probeset Mapping Tool

http://bioinfo3.noble.org/affymap/

Input sequence:

• Transcript

• cDNA

• EST/Unigene

• CDS


Output of AffyProbeMapping:

AffyProbeMapping also supports Affymetrix chips for other species:

Lotus japonicus, Arabidopsis thaliana, rice, soybean, maize, Populus, cotton and tomato


Bioinformatics & Data Resources for Medicago truncatula

Originated from Affymetrix, Inc.

Data Sources:

• Mt3.5v4 (2011, version for Nature paper): optical mapping; 44,124 BAC-based gene loci + 18,264 Illumina (nr) gene models

• Mt3.5v5 (2012, minor changes): 45,859 BAC-based gene loci + 18,264 Illumina gene models

• Mt4 RC1 (2013, PAG 2013 conference): anchored Illumina contigs onto pseudochromosomes. 84,993 gene loci (BAC + Illumina). Chromosome sequences frozen; some gene models might be removed.

• DFCI Gene Index Release 11: 294k ESTs/ETs, 68,814 Unigenes


Statistics on Mt3.5v4 vs. Probesets Mapping Results using AffyProbeMapping

Num of cDNA   Matching probe_set   Percent
37,385        0                    59.92
18,354        1                    29.42
6,649         >=2                  10.66
62,388        Total                100


Statistics on Mt4RC1 vs. Probesets Mapping Results using AffyProbeMapping

Num of cDNA   Matching probe_set   Percent
58,660        0                    69.02
20,257        1                    23.83
6,076         >=2                  7.15
84,993        Total                100


Statistics on GeneIndex R11 vs. Probesets Mapping Results using AffyProbeMapping

Num of cDNA   Matching probe_set   Percent
29,722        0                    43.2
32,848        1                    47.7
6,244         >=2                  9.1
68,814        Total                100
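The three statistics tables can be cross-checked and compared with a short script. This is only a sketch: the counts are copied from the slides, while the dictionary layout and the helper names `summarize` and `mapped_percent` are illustrative assumptions.

```python
# Transcripts with 0, 1, or >=2 matching probe sets, as reported on the slides.
MAPPING_COUNTS = {
    "Mt3.5v4": {"0": 37385, "1": 18354, ">=2": 6649},
    "Mt4 RC1": {"0": 58660, "1": 20257, ">=2": 6076},
    "GeneIndex R11": {"0": 29722, "1": 32848, ">=2": 6244},
}


def summarize(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total and the percentage of each category."""
    total = sum(counts.values())
    return total, {k: round(100 * v / total, 2) for k, v in counts.items()}


def mapped_percent(counts: dict[str, int]) -> float:
    """Percentage of sequences matched by at least one probe set."""
    total = sum(counts.values())
    return round(100 * (total - counts["0"]) / total, 2)


for name, counts in MAPPING_COUNTS.items():
    total, percents = summarize(counts)
    print(f"{name}: total={total}, percents={percents}, "
          f"mapped by >=1 probe set={mapped_percent(counts)}%")
```

The last column makes the comparison between datasets explicit: it is simply 100 minus the percentage of sequences with no matching probe set.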


Mapping between Medicago genome and AffyMedicago Chip

http://bioinfo3.noble.org/affymap/Dataset.gy


Bioinformatics Tools For Medicago

• Sequence Search and Annotation

– DOBLAST --- http://bioinfo3.noble.org/doblast/ , a parallel computing accelerated BLAST search tool

Features:

o Preloaded with many Medicago data resources

o Capable of handling big datasets

o “Tab-delimited bioparser output format” works well with Excel


Bioinformatics Tools For Medicago

• Sequence Download and Cut by Coordinates.

– “Sequence Download” page of DOBLAST --- batch download sequences or cut sequences by Coordinates

o Preload many Medicago data resources

o Batch download

o Get a fragment of sequence by coordinates
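For readers who want to reproduce the "cut by coordinates" step offline, here is a minimal sketch. It does not describe how DOBLAST itself works: the function name `cut_sequence`, the 1-based inclusive coordinate convention, and the strand argument are all assumptions made for illustration.

```python
# Translation table for complementing DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def cut_sequence(seq: str, start: int, end: int, strand: str = "+") -> str:
    """Return seq[start..end] using 1-based, inclusive coordinates (assumed).

    For strand "-", the fragment is reverse-complemented.
    """
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates must satisfy 1 <= start <= end <= len(seq)")
    fragment = seq[start - 1:end]
    if strand == "-":
        fragment = fragment.translate(_COMPLEMENT)[::-1]
    return fragment


print(cut_sequence("ACGTTGCA", 2, 5))        # forward-strand fragment
print(cut_sequence("ACGTTGCA", 2, 5, "-"))   # reverse complement of the same span
```

Always confirm the coordinate convention of the source annotation before cutting, since an off-by-one shift silently changes the fragment.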


DOBLAST sequence download page


Bioinformatics Tools For Medicago

• LegumeIP: An Integrative Platform to Study Gene Function and Genome Evolution in Legumes.

• Features:

– Synteny analysis among model legumes

– Phylogenetic analysis for gene family

– Gene to gene association analysis

– Gbrowser

o http://plantgrn.noble.org/LegumeIP/

o We are updating to Version 2


LegumeIP: Synteny analysis for Medicago genome


LegumeIP: Phylogenetic analysis for Medicago gene family


LegumeIP: Gene association network analysis for Medicago gene