
Expression Profile Clustering

  • Expression Profile Clustering

    Expression Profile = the pattern of signal values for one gene over several chips.

    Expression Profile Clustering = the clustering of "similar" profiles.

    Why? Similar expression profiles suggest:
    - regulation (by a shared factor or a cluster member)
    - related function

    ALSO: opposite expression profiles suggest regulation (e.g. inhibition).

    Software:
    - EpClust
    - J-Express
    - etc.

    [Chart: example expression profile for Gene1; x-axis: chips, y-axis: signal value]

    [Chart: example expression profiles; x-axis: experiments, y-axis: signal value]

  • Guided Tour of EpClust

  • Data file format

    Data can be entered in a variety of formats; see EpClust's data upload page.

    One simple way:
    - Download tab-delimited data from NASC.
    - Open with (or paste into) Excel.
    - Delete all but one name column and the signal value columns. Do not include any hyphens! (I try to avoid all punctuation.)
    - Save as tab-delimited text.

    Annotation can be added as either:
    - a second file with the same first column of names, or
    - a single second column in the signal file (if specified).
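The spreadsheet steps above could also be scripted. The following is a minimal sketch only. It assumes the downloaded file has a header row, that the first column holds the gene names, and that signal columns can be recognised by the word "Signal" in their header. The function name, the column-naming rule, and the example data are all assumptions for illustration, not a documented NASC or EpClust format.

```python
import csv
import io
import re


def clean_for_upload(tab_text, signal_marker="Signal"):
    """Keep the name column plus signal columns and strip punctuation from names."""
    rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    # Column 0 is assumed to hold the gene names; keep it plus every signal column.
    keep = [0] + [i for i, h in enumerate(header) if i > 0 and signal_marker in h]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Headers keep letters, digits and spaces only.
    writer.writerow([re.sub(r"[^A-Za-z0-9 ]", "", header[i]) for i in keep])
    for row in body:
        name = re.sub(r"[^A-Za-z0-9]", "", row[0])  # removes hyphens and other punctuation
        writer.writerow([name] + [row[i] for i in keep[1:]])
    return out.getvalue()


# Hypothetical two-chip export with one non-signal column to be dropped.
example = (
    "Probe\tChip1 Signal\tChip1 Detection\tChip2 Signal\n"
    "gene-A_1\t120.5\tP\t98.1\n"
)
print(clean_for_upload(example))
```

The signal values themselves are left untouched, so decimal points survive; only names and headers are stripped of punctuation.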

  • To Upload Data

  • To select uploaded data

  • Select Specific Experiments within Input File

  • To Remove Unreliable Data

  • To Choose Algorithm Type

    Hierarchical
    - Measures the distance between each profile (i.e. gene) and every other profile.
    - So time increases roughly with the square of the number of genes.
    - Then clusters the closest genes together, followed by increasingly distant ones, into a tree of clusters within clusters.
    - The tree can be visualised and finally split at a chosen distance, with knowledge of the tree.

    K-means
    - Initially choose a set number (K) of clusters.
    - 1) Chooses the K most different profiles.
    - 2) Clusters each remaining profile with one of the K.
    - So, much less time than Hierarchical (if many genes).
    - Cluster size can vary, so it can return too many or too few genes, with no way to select. Then you must repeat with different K values.
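The hierarchical procedure described above can be sketched in a few lines. This is a naive illustration, assuming Euclidean distance and average linkage (neither is stated on the slide) and using made-up profiles; it is not EpClust's implementation.

```python
from itertools import combinations
from math import dist


def hierarchical_clusters(profiles, cut_distance):
    """Merge the closest clusters repeatedly; stop once they are further apart than cut_distance."""
    clusters = [[name] for name in profiles]

    def linkage(a, b):
        # Average linkage: mean distance over all pairs of members (an assumption).
        return sum(dist(profiles[x], profiles[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Every pair of clusters is compared on each round.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(a, b) > cut_distance:
            break  # comparable to splitting the tree at this distance
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical profiles: signal values for four genes over five chips.
profiles = {
    "gene1": [1, 2, 1, 3, 0],
    "gene2": [1, 2, 1, 3, 1],
    "gene3": [9, 8, 9, 7, 9],
    "gene4": [9, 8, 9, 8, 9],
}
print(hierarchical_clusters(profiles, cut_distance=2.0))
# Number of pairwise distances needed at the start: n(n-1)/2
print(len(list(combinations(profiles, 2))))
```

Changing `cut_distance` changes how many clusters come out without recomputing the distances, which is the sense in which the tree can be split after it has been inspected.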

  • So Which is Better?

    K-means is better able to manage large data sets.

    Hierarchical seems a more objective approach, in that you don't need to decide the number of clusters at the start.

    But which is more biologically informative?
    - Opinion divided.
    - Both artificial.
    - Don't prove, just suggest.
    - K-means gives more consistent results.

    ...and then there's SOTA too!
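For comparison with the hierarchical approach, the K-means procedure can be sketched as follows. This is a hedged illustration only: it seeds with K profiles that are far apart (one possible reading of "the K most different profiles"), uses Euclidean distance, and adds the standard centre-update passes that the slides do not spell out. All names and values are invented for the example.

```python
from math import dist


def farthest_first(points, k):
    """Pick k seed profiles that are far apart (greedy; starts from the first profile)."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds


def k_means(points, k, passes=10):
    """Return k groups of profiles after a fixed number of assign/update passes."""
    centres = farthest_first(points, k)
    groups = []
    for _ in range(passes):
        # Assign every profile to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centres[i]))
            groups[nearest].append(p)
        # Move each centre to the mean of its members; empty groups keep their old centre.
        centres = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centres)
        ]
    return groups


# Hypothetical profiles over three chips.
points = [[1, 2, 1], [1, 2, 2], [9, 8, 9], [9, 9, 9], [5, 5, 5]]
for k in (2, 3):
    print(k, [len(g) for g in k_means(points, k)])
```

The loop over several K values mirrors the repeat step mentioned in the slides: K has to be fixed before each run, and the cluster sizes change with it.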

  • Alternative Distance Measurements

  • Pearson-Based Distance Measurement
    - The most commonly used.
    - Measures how similar the SHAPES of the two profiles are.
    - Based on the average of the values and the standard deviation.
    - Rates from identical (1) to completely uncorrelated (0) to perfect opposites (-1).

    Centred Test?
    If profiles have identical shape, but are offset from each other by a fixed value (or magnitude):
    - Centred: Identical (1)
    - Uncentred: Not (
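The difference between the centred and uncentred versions is easiest to see with numbers. The sketch below uses the usual textbook definitions (centred = Pearson correlation, uncentred = the same formula without subtracting the means); how EpClust turns these correlations into distances is not stated here, and the profile values are examples only.

```python
from math import sqrt


def uncentred_correlation(x, y):
    """Correlation computed on raw values (cosine of the angle between the profiles)."""
    return sum(a * b for a, b in zip(x, y)) / sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def centred_correlation(x, y):
    """Standard Pearson correlation: subtract each profile's mean first."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return uncentred_correlation([a - mx for a in x], [b - my for b in y])


gene1 = [1, 2, 1, 3, 0]
shifted = [v + 3 for v in gene1]    # same shape, offset by a fixed value
mirrored = [3 - v for v in gene1]   # opposite shape

print(round(centred_correlation(gene1, shifted), 3))    # same shape scores 1
print(round(uncentred_correlation(gene1, shifted), 3))  # below 1 because of the offset
print(round(centred_correlation(gene1, mirrored), 3))   # opposite shape scores -1
```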

  • Euclidean-Based Distance Measurement
    - Measures the distance between gene expression levels directly, based on the magnitude of changes.
    - More about signal VALUES. Less about profile shapes.
    - Data must be suitably normalized, e.g. use log-ratios of signals.

    Euclidean: shortest path between points.
    Manhattan: the sum of distances along each dimension.
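A small worked example of the two distances, plus a log-ratio normalisation step. This is a sketch with invented values; the choice of log base 2 and of a per-chip reference signal are assumptions, since the slide only says "log-ratios of signals".

```python
from math import log2, sqrt


def euclidean(x, y):
    """Straight-line distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of the absolute differences along each dimension (chip)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def log_ratios(signals, reference):
    """log2(signal / reference) per chip; all values must be positive."""
    return [log2(s / r) for s, r in zip(signals, reference)]


gene_a = [1, 2]   # (chip1 signal, chip2 signal)
gene_b = [4, 6]
print(euclidean(gene_a, gene_b))   # 5.0
print(manhattan(gene_a, gene_b))   # 7
print(log_ratios([200, 50], [100, 100]))  # [1.0, -1.0]
```

With log-ratios, a doubling and a halving sit at the same distance from zero, which is one reason to normalise before measuring distances on raw magnitudes.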

    [Chart: example points plotted on three axes: chip1 signal, chip2 signal, chip3 signal]

    [Chart: example points plotted on two axes: chip1 signal, chip2 signal]

  • To Choose Algorithm Type

  • RESULTS

  • RESULTS_MyGene

  • _MyGene

  • Extra Options

  • Extra Options
    - Phylip Format
    - Search for Promoter Motifs
    - Format Results Picture
    - Export Lists as Text
    - Statistics

    Export clusters as:
    - text (lists of gene names), with the tree cut at a chosen height
    - or formatted for Phylip (tree drawing software)

    Visualise the promoter region next to each profile.

    Highlight motifs. You must tell it the sequences of the motifs, though.

    - Must prepare a file of upstream sequences.
    - Currently must arrange with EpClust staff to upload.
    - Can get from querying a database such as PlantCARE, using the upstream sequence of our particular gene of most interest.
    - Can get from TAIR using a list of gene names.
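To make the idea of motif highlighting concrete, here is a minimal sketch that marks user-supplied motifs in an upstream sequence by upper-casing them. The function name, the example sequence, and the two motifs are illustrative assumptions; this does not reproduce EpClust's display.

```python
import re


def highlight_motifs(upstream, motifs):
    """Lower-case the upstream sequence, then upper-case every occurrence of each motif."""
    seq = upstream.lower()
    for motif in motifs:
        # IGNORECASE so a motif is still found next to (or inside) an earlier highlight.
        seq = re.sub(re.escape(motif), lambda m: m.group(0).upper(), seq, flags=re.IGNORECASE)
    return seq


# Hypothetical upstream sequence and motif list supplied by the user.
print(highlight_motifs("ttgacgtgaaatataaac", ["ACGT", "TATA"]))
```

As the slide notes, nothing is discovered here: the motif sequences have to be provided, and the code only shows where they occur.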

  • PlantCARE
    Cis-Acting Regulatory Elements
    http://intra.psb.ugent.be:8080/PlantCARE/

    Database of CAREs

    Tools
    - Search for CARE: enter the upstream sequence of a gene; lists and then highlights known motifs from the database.
    - Motif Sampler: enter the upstream sequences of your cluster; highlights 8-mers conserved more in cluster members than in the genome.
    - Other tools: clustering; query for info on motifs.
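The idea of "8-mers conserved more in cluster members than in the genome" can be illustrated with a simple presence count. This is a deliberately crude stand-in, not the statistical method the Motif Sampler tool uses; the sequences, the function names, the add-one smoothing, and the ratio threshold are all assumptions made for the example.

```python
from collections import Counter


def kmer_presence(sequences, k=8):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def enriched_kmers(cluster_seqs, background_seqs, k=8, min_ratio=2.0):
    """Return k-mers found in a larger fraction of cluster than background sequences."""
    in_cluster = kmer_presence(cluster_seqs, k)
    in_background = kmer_presence(background_seqs, k)
    hits = {}
    for kmer, n in in_cluster.items():
        cluster_frac = n / len(cluster_seqs)
        # Add-one smoothing so k-mers absent from the background do not divide by zero.
        background_frac = (in_background[kmer] + 1) / (len(background_seqs) + 1)
        # Require the k-mer in at least two cluster members.
        if n > 1 and cluster_frac / background_frac >= min_ratio:
            hits[kmer] = round(cluster_frac / background_frac, 2)
    return hits


# Hypothetical upstream sequences: a small cluster and a stand-in for the genome background.
cluster = ["AAACACGTGTCC", "GGCACGTGTCAT", "TTCACGTGTCGA"]
background = ["ACGTACGTACGTAC", "TTTTTTTTTTTTTT", "GGGGCCCCGGGGCC", "ATATATATATATAT"]
print(enriched_kmers(cluster, background))
```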

  • Gene Ontology

    A hierarchical structure to describe gene function.

    As PlantCARE compares:
    Expression Profile Clusters to Promoter Motif Conservation

    There are also tools to compare:
    Expression Profile Clusters to Gene Ontology