Molecular Profiling Colloquium, Janos Demeter, December 15, 2006

HEEBO/MEEBO arrays in SMD

• Entering doping control data into SMD
• Quality control graphs
• Synthetic gene tool to compare data from cDNA and HEEBO/MEEBO arrays
• Merge pcl files tool
• Current state of annotation

HEEBO/MEEBO arrays in SMD: Nomenclature

• Source: http://alizadehlab.stanford.edu
• Sequence_id: hSQnnnnnn
- In SMD: cloneid (meaningless, but unique to a given oligo sequence)
• Oligo_id: hXXnnnnnn (unique to a well, not a sequence)
- In SMD: oligo_id
• The XX codes have meaning:

C: control
H: human
T: transgenes
V: viral/bact

A: alternative exon - antisense
D: doping control
E: EST-derived oligo
N: negative control
T: tiling
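As a rough illustration of the naming scheme, the sketch below splits an oligo_id of the form hXXnnnnnn into its parts. It assumes that the first X is drawn from the first group of codes (C, H, T, V) and the second X from the second group (A, D, E, N, T); the slide lists the two groups separately but does not state this outright. The function name `decode_oligo_id` and the example id are hypothetical.

```python
import re

# Assumption: first X = first group of codes on the slide, second X = second group.
FIRST_LETTER = {"C": "control", "H": "human", "T": "transgenes", "V": "viral/bact"}
SECOND_LETTER = {
    "A": "alternative exon - antisense",
    "D": "doping control",
    "E": "EST-derived oligo",
    "N": "negative control",
    "T": "tiling",
}

# Pattern taken from the slide: "h", two code letters, six digits.
OLIGO_ID = re.compile(r"^h([A-Z])([A-Z])(\d{6})$")


def decode_oligo_id(oligo_id):
    """Split an hXXnnnnnn oligo_id into its two code letters and its number."""
    match = OLIGO_ID.match(oligo_id)
    if match is None:
        raise ValueError(f"not an hXXnnnnnn oligo_id: {oligo_id!r}")
    first, second, number = match.groups()
    return {
        "first": FIRST_LETTER.get(first, "unknown"),
        "second": SECOND_LETTER.get(second, "unknown"),
        "number": int(number),
    }


# Hypothetical id, used only to exercise the function.
print(decode_oligo_id("hCD000123"))
```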

HEEBO/MEEBO arrays in SMD: Entering doping control data

• HEEBO/MEEBO arrays contain many different controls
• To take advantage of the doping controls, it is essential to know the amounts that were added to your samples
• SFGF tells you how much is in 1 microliter of doping control mix, but amplification/dilution might change that
• SMD needs to know how much you added to the sample compared to how much SFGF tells you to add
• Added problem: 4 tubes from SFGF: MJ and Ambion_Stratagene, Cy3 and Cy5

HEEBO/MEEBO arrays in SMD: Entering doping control data

• Experiment entry form can capture all this

• DCV2.1 = DCV2.1_Ambion_Stratagene + DCV2.1_MJ

• If no amplification, follow SFGF suggestion, enter:

DCV2.1, factor1=1, factor2=1

• If amplified/diluted controls, enter values for each tube:

DCV2.1_MJ, factor1=1.5, factor2=1.6

DCV2.1_A_S, factor1=1.932, factor2=0.8
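One way to read these factors, offered only as a sketch: each factor is the amount actually added to the sample divided by the amount SFGF suggests, so a value of 1 means the suggestion was followed exactly. The slides do not say which channel factor1 and factor2 refer to, so the code below leaves that open. The function name and the volumes are hypothetical.

```python
def dilution_factor(amount_added, amount_suggested):
    """Ratio of the amount added to the sample to the amount SFGF suggests."""
    if amount_suggested <= 0:
        raise ValueError("suggested amount must be positive")
    return amount_added / amount_suggested


# Hypothetical volumes in microliters for the two factors of one tube set.
factor1 = dilution_factor(1.5, 1.0)
factor2 = dilution_factor(1.6, 1.0)

# Format the values the way they appear in the entry examples.
entry = f"DCV2.1_MJ, factor1={factor1:g}, factor2={factor2:g}"
print(entry)
```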

HEEBO/MEEBO arrays in SMD: Quality control graphs

• HEEBO/MEEBO quality assessment graphs from BioConductor package (Agnes Paquet/UCSF)

• Per-array graphs that use doping, tiling, mismatch and negative controls

• For batch/uploaded gpr files: can be reached from main page

• For individual expts: from data display page

• For new expts with doping control: graphs are automatically created at data loading

• The most recent set of graphs is available from the view expt page

[Screenshot labels: "New set of graphs"; "Previously created set (or during data loading)"]

HEEBO/MEEBO arrays in SMD: Quality control graphs

• Can be used for a gpr file uploaded from the desktop - the print has to be present in SMD and the oligo_ids have to be in the id/name column

• In batch for a result set list on loader.stanford.edu

• If called for a specific experiment, the values are already filled in.

• Normalization options available from limma. Note that this will NOT change your data in SMD, but is only used to generate the graphs

• Background subtraction methods - same story as normalization

• Job is placed in the job-queue - email is sent with link

HEEBO/MEEBO arrays in SMD: Diagnostic graphs

• MA-plots before and after normalization:
A = 1/2*(log2(Cy5) + log2(Cy3))
M = log2(Cy5 / Cy3)
• Loess lines are shown for sectors if print-tip normalization was selected
• Distribution should be centered around M=0, with no intensity dependence
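The M and A definitions can be written out directly. The sketch below computes both values per spot from raw Cy5 and Cy3 intensities; the spot values are made up, and the loess fitting and normalization done by limma are not reproduced here.

```python
import math


def ma_values(cy5, cy3):
    """Return (M, A) for one spot from its Cy5 and Cy3 intensities."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive to take log2")
    m = math.log2(cy5 / cy3)                      # M = log2(Cy5 / Cy3)
    a = 0.5 * (math.log2(cy5) + math.log2(cy3))   # A = 1/2*(log2(Cy5) + log2(Cy3))
    return m, a


# Hypothetical (Cy5, Cy3) intensities for three spots.
spots = [(1024.0, 256.0), (512.0, 512.0), (100.0, 400.0)]
for cy5, cy3 in spots:
    m, a = ma_values(cy5, cy3)
    print(f"Cy5={cy5:g} Cy3={cy3:g} M={m:.2f} A={a:.2f}")
```

A spot with equal intensities in both channels gives M = 0, which is where the bulk of the distribution is expected to sit after normalization.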

HEEBO/MEEBO arrays in SMD: Tiling control graphs

• Tiling probes were designed along the transcript: 17 human genes (actin - 6 … LRP1 - 89 oligos)
• Non-normalized signal intensities (Cy5 and Cy3) vs. probe's distance from the 3'-end
• A quick drop in signal indicates a problem in the sample (degradation/IVT)
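To put a number on a "quick drop", one option is the least-squares slope of log2 intensity against distance from the 3'-end for the tiling probes of one gene. This is an illustrative sketch with made-up values, not the calculation used by the quality assessment package, and the slides give no threshold for what counts as too steep.

```python
import math


def signal_slope(distances, intensities):
    """Least-squares slope of log2(intensity) against distance from the 3'-end."""
    if len(distances) != len(intensities) or len(distances) < 2:
        raise ValueError("need at least two (distance, intensity) pairs")
    logs = [math.log2(value) for value in intensities]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, logs))
    return sxy / sxx


# Hypothetical tiling probes: distance from the 3'-end (bases) and raw intensity.
distances = [100, 200, 300, 400]
intensities = [4096.0, 2048.0, 1024.0, 512.0]
slope = signal_slope(distances, intensities)
print(f"log2 intensity changes by {slope * 100:.2f} per 100 bases")
```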

HEEBO/MEEBO arrays in SMD: Mismatch control graphs

• Mismatch and tiling probes are used to test the degree of cross-hybridization among homologous probes
• Mutations are anchored (at the extremities) or distributed (along the transcript)
• Calculated binding energies vs. normalized raw intensities (i.e. raw intensities divided by the median of the corresponding wild-type probes)
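The normalization described for the mismatch graphs can be sketched as follows: each mismatch probe's raw intensity is divided by the median raw intensity of its corresponding wild-type probes. The function name and the intensities are hypothetical.

```python
from statistics import median


def normalize_to_wild_type(mismatch_intensities, wild_type_intensities):
    """Divide each mismatch intensity by the median of the wild-type intensities."""
    reference = median(wild_type_intensities)
    if reference <= 0:
        raise ValueError("median wild-type intensity must be positive")
    return [value / reference for value in mismatch_intensities]


# Hypothetical raw intensities for one probe family.
wild_type = [1000.0, 1200.0, 800.0]
mismatch = [500.0, 250.0, 1000.0]
print(normalize_to_wild_type(mismatch, wild_type))
```

A normalized value near 1 means the mismatch probe hybridizes about as well as the wild type; lower values mean the mutations reduce binding.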

HEEBO/MEEBO arrays in SMD: Doping control graphs

• Observed vs. expected log-ratios (normalized and background-corrected) for each doping control group
• Ratios should be aligned on the diagonal
• Graphs for individual doping controls as well
• Shows the range where the log(mass ratio) vs. log(intensity ratio) is linear
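The comparison behind these graphs can be sketched in a few lines: the expected log-ratio comes from the masses of doping control added to each channel, the observed log-ratio from the measured intensities, and a point on the diagonal has the two equal. The control names, masses, and intensities below are invented, and log base 2 is an assumption (the slide only says "log").

```python
import math


def log_ratio(numerator, denominator):
    """log2 ratio of two positive quantities (masses or intensities)."""
    if numerator <= 0 or denominator <= 0:
        raise ValueError("both values must be positive")
    return math.log2(numerator / denominator)


# Hypothetical controls: (name, Cy5 mass, Cy3 mass, Cy5 intensity, Cy3 intensity)
controls = [
    ("control_1", 4.0, 1.0, 8000.0, 2000.0),
    ("control_2", 1.0, 1.0, 1500.0, 1500.0),
    ("control_3", 1.0, 8.0, 300.0, 1200.0),
]

for name, mass_cy5, mass_cy3, int_cy5, int_cy3 in controls:
    expected = log_ratio(mass_cy5, mass_cy3)
    observed = log_ratio(int_cy5, int_cy3)
    # On the diagonal, observed equals expected and the deviation is 0.
    print(f"{name}: expected={expected:+.2f} observed={observed:+.2f} "
          f"deviation={observed - expected:+.2f}")
```

Controls whose deviation grows at the extremes of the mass ratio mark the edge of the linear range.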

HEEBO/MEEBO arrays in SMD: Synthetic gene tool

• There is a help page for using the synthetic gene tool: http://smd.stanford.edu/help/synthGenes.shtml

• A "synthetic gene" is a group of "reporters" (clones, oligos, ORFs, etc.), together with some method of combining their expression vectors. It is a very useful tool, with great flexibility in combining data rows.

• One use of it: compare data from various platforms, e.g. oligo to cDNA prints.

• Available from repository and applicable to a pcl file.

HEEBO/MEEBO arrays in SMD: Synthetic gene tool

How to use it to compare HEEBO and cDNA arrays:

• Select experiments from cDNA and HEEBO prints

• Selected biological annotation is not important for collapsing data

• What is important: include uid

• Save the pcl file in your repository

HEEBO/MEEBO arrays in SMD: Synthetic gene tool

• Pcl file sorted by name column

• Synthetic gene tool only looks at the first column

HEEBO/MEEBO arrays in SMD: Synthetic gene tool

• To access the tool, click the “synth” icon in the repository

• Rows can be collapsed based on a number of prepared lists - for this comparison, LocusLink should be selected

• The default option will remove the original ids and annotations and replace the rows with the average

HEEBO/MEEBO arrays in SMD: Synthetic gene tool

The default option averages the rows and removes the original annotations

HEEBO/MEEBO arrays in SMD: Synthetic gene tool

Collapse of rows by any arbitrary grouping of genes

Prepared lists are available for:
- chromosomal locations
- cytobands
- locusid
- clusterid
- transcript length groups
- cancer modules (E. Segal)
- tissue types
- processes
- any other genelist in user's genelist directory on loader
• Name of genelist will become the name of synthetic gene.
• Individual reporters can be weighted (-1 to 1)
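A weighted combination of reporters might look like the sketch below. Dividing by the sum of absolute weights is an assumption made here so that weights of 1 reduce to a plain average; the slides do not say how SMD normalizes weighted rows. The function name and data are hypothetical.

```python
def synthetic_gene(reporters):
    """Combine (weight, expression_vector) pairs into one weighted-average vector."""
    if not reporters:
        raise ValueError("need at least one reporter")
    length = len(reporters[0][1])
    for weight, vector in reporters:
        if not -1.0 <= weight <= 1.0:
            raise ValueError("weights must lie between -1 and 1")
        if len(vector) != length:
            raise ValueError("all expression vectors must have the same length")
    # Assumption: normalize by the sum of absolute weights.
    norm = sum(abs(weight) for weight, _ in reporters)
    if norm == 0:
        raise ValueError("at least one weight must be non-zero")
    return [
        sum(weight * vector[i] for weight, vector in reporters) / norm
        for i in range(length)
    ]


# Two hypothetical reporters measured in two experiments.
print(synthetic_gene([(1.0, [2.0, 4.0]), (1.0, [4.0, 0.0])]))
print(synthetic_gene([(1.0, [2.0, 4.0]), (-1.0, [4.0, 0.0])]))
```

A negative weight lets a reporter count against the synthetic gene, for example when it is expected to move in the opposite direction.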

HEEBO/MEEBO arrays in SMD: Synthetic gene tool

• Average rows (reporters) by synthetic gene and:
- don't remove original data rows
- remove averaged data rows (but keep the ones that don't belong to any synth gene)
- remove all original data rows
• Don't average, only annotate the rows with synthetic gene annotation (prepend name column):
- keep/don't keep original annotation
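The three averaging options differ only in which original rows survive next to the averaged row. The sketch below models a pcl file as a plain dictionary of row id to values; the mode names `keep_all`, `remove_averaged`, and `remove_all` are labels invented here for the three options, not names used by the tool.

```python
from statistics import mean


def collapse_rows(rows, groups, mode="remove_averaged"):
    """Average rows by synthetic gene.

    rows:   {row_id: [values per experiment]}
    groups: {synthetic_gene_name: [row_id, ...]}
    mode:   'keep_all', 'remove_averaged' or 'remove_all' (hypothetical labels)
    """
    if mode not in ("keep_all", "remove_averaged", "remove_all"):
        raise ValueError(f"unknown mode: {mode}")
    result = {}
    averaged_ids = set()
    for name, members in groups.items():
        present = [m for m in members if m in rows]
        if not present:
            continue
        averaged_ids.update(present)
        # Average column by column across the member rows.
        result[name] = [mean(column) for column in zip(*(rows[m] for m in present))]
    if mode == "keep_all":
        result.update(rows)
    elif mode == "remove_averaged":
        result.update({k: v for k, v in rows.items() if k not in averaged_ids})
    return result


rows = {"cdna_1": [1.0, 3.0], "oligo_1": [3.0, 5.0], "oligo_2": [0.0, 0.0]}
groups = {"GENE_A": ["cdna_1", "oligo_1"]}
for mode in ("keep_all", "remove_averaged", "remove_all"):
    print(mode, collapse_rows(rows, groups, mode))
```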

HEEBO/MEEBO arrays in SMD: Merge PCL files

• Combine two (or more) pcl files into a single pcl file
- files can be on the desktop or in repository
• In the process:
- average (optionally) columns (experiments) with the same name
- average (optionally) rows (genes) based on a translation file
• Averaging can be mean or median

HEEBO/MEEBO arrays in SMD: Merge PCL files

[Diagram: Pcl1 + Pcl2 + Translation file -> Combined PCL]
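The merge can be pictured with a small sketch. It models each pcl file as a dictionary of row id to {experiment name: value} and the translation file as a dictionary of original row id to merged row id; real pcl files also carry header and weight columns that are ignored here. In this simplified version, cells that end up with the same row and experiment name are always averaged, whereas the tool makes both kinds of averaging optional. All names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def merge_pcl(tables, translation=None, average=mean):
    """Merge pcl-like tables, averaging cells that share a row and experiment.

    tables:      list of {row_id: {experiment_name: value}}
    translation: optional {row_id: merged_row_id}
    average:     statistics.mean or statistics.median
    """
    translation = translation or {}
    cells = defaultdict(list)
    for table in tables:
        for row_id, row in table.items():
            target = translation.get(row_id, row_id)  # untranslated ids are kept
            for experiment, value in row.items():
                cells[(target, experiment)].append(value)
    merged = defaultdict(dict)
    for (row_id, experiment), values in cells.items():
        merged[row_id][experiment] = average(values)
    return dict(merged)


pcl1 = {"IMAGE:1": {"exptA": 1.0, "exptB": 2.0}}
pcl2 = {"hHA000001": {"exptA": 3.0, "exptC": 0.5}}
translation = {"IMAGE:1": "GENE1", "hHA000001": "GENE1"}
print(merge_pcl([pcl1, pcl2], translation))
print(merge_pcl([pcl1, pcl2], translation, average=median))
```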

HEEBO/MEEBO arrays in SMD: State of annotation

• MEEBO: annotation is complete and is in SMD
• HEEBO: annotation is complete, but some oligo annotations are not in SMD yet.

Annotations:
- geneid (locusid)
- gene name
- gene symbol
- chromosome location (in gff file)
- GB accession (RefSeq/EST)

Problem: ~500 oligos are annotated to more than one gene (~1000 spots involved). These cases can't be correctly represented in the database currently, so the fields that conflict are not entered into SMD.

HEEBO/MEEBO arrays in SMD: State of annotation

• For each sequence (sequence_id) we can have only one set of annotations.

• We have developed a new biosequence schema for SMD, to model the relationships between sequences, genes and genomes in a more biologically meaningful manner. Among other things, the new schema will allow us to map one sequence to more than one gene.

• We are currently migrating existing sequence annotations to tables using the new biosequence schema. Once this is finished (soon), all the biological annotations for the HEEBO arrays will be available in SMD.
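The limitation and the intended fix can be illustrated with two toy data structures. This is not the SMD schema, only a sketch of the difference between one annotation set per sequence and a many-to-many mapping; the ids and gene names are invented.

```python
from collections import defaultdict

# One annotation per sequence: a second gene overwrites the first.
one_to_one = {}

# Many-to-many: every sequence keeps all of its genes, and vice versa.
seq_to_genes = defaultdict(set)
gene_to_seqs = defaultdict(set)


def add_mapping(sequence_id, gene_id):
    """Record the same mapping in both models to show how they differ."""
    one_to_one[sequence_id] = gene_id
    seq_to_genes[sequence_id].add(gene_id)
    gene_to_seqs[gene_id].add(sequence_id)


# A hypothetical oligo sequence that matches two genes.
add_mapping("hSQ000001", "GENE_A")
add_mapping("hSQ000001", "GENE_B")

print(one_to_one["hSQ000001"])            # only the last gene survives
print(sorted(seq_to_genes["hSQ000001"]))  # both genes are kept
```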

HEEBO/MEEBO arrays in SMD: State of annotation

• Updates:
- Genome coordinates: When a new genome version is released, the oligos need to be BLASTed anew (last time: spring of 2005, MEEBO: 2004) to find the coordinates of oligos. New releases have been made 1-3 times a year. Result: oligos to chromosomal locations.
- Biological annotations: Annotations need to be updated to capture new knowledge. Result: chromosomal coordinates to genes.
• Currently, no updates are done for the sequences on the HEEBO/MEEBO arrays. They will be worked out after we have the new biosequence tables in place.