Mining microarray expression data by literature profiling

Damien Chaussabel and Alan Sher

National Institutes of Health

Why do we need automated literature profiling?

Very large datasets and complex experiments

Inability of qualified individuals to manually mine available literature

Divergent naming schemes have evolved

Literature Profiling

“We describe how a literature-derived term frequency database can be generated and mined through the analysis of patterns of occurrences of a restricted subset of relevant terms”

Step 1: Literature Indexing

Articles related to the genes in the list are searched for in the Medline database

For each gene, the search results are downloaded, and the abstracts are extracted and saved as a new file for text analysis
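The abstract-extraction step might look like the sketch below. It assumes each gene's search results were saved in PubMed's MEDLINE text format, where every field starts with a tag of up to four characters followed by "- " and continuation lines are indented. The function name and the sample records are made up for illustration; this is not the authors' script.

```python
def extract_abstracts(medline_text):
    """Return the abstract (AB field) of each record in MEDLINE-format text."""
    abstracts = []
    current = None  # pieces of the abstract being collected, if any
    for line in medline_text.splitlines():
        if line[:4].strip() and line[4:6] == "- ":
            # A new field starts here, so any abstract in progress is complete.
            if current is not None:
                abstracts.append(" ".join(current))
                current = None
            if line[:4].strip() == "AB":
                current = [line[6:].strip()]
        elif current is not None and line.startswith("      "):
            # Indented continuation line of the abstract.
            current.append(line.strip())
        elif current is not None and not line.strip():
            # A blank line ends the record.
            abstracts.append(" ".join(current))
            current = None
    if current is not None:
        abstracts.append(" ".join(current))
    return abstracts


# Hypothetical sample: two records, the second one without an abstract.
sample = """PMID- 1
TI  - First made-up title
AB  - Chemokine expression was measured in
      infected macrophages.
AU  - Doe J

PMID- 2
TI  - Second made-up title
"""

abstracts = extract_abstracts(sample)
print(abstracts)
```

Records without an AB field are simply skipped, so the number of abstracts returned can be smaller than the number of records retrieved for a gene.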

Step 2: Text Analysis

Determine the occurrence of each unique word by analyzing the Medline entries (this study had 4,000 of them)

This gives three kinds of terms: those found very often (“cell”, “because”, “is”, “the”), those found rarely, and those found frequently but only for a few genes. The third kind is the useful one.
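A minimal sketch of the word-occurrence count, assuming occurrence is measured as the fraction of a gene's abstracts that contain a word at least once (the slides do not spell out the exact measure). The gene names and abstracts are invented for illustration.

```python
import re
from collections import Counter


def term_occurrence(abstracts):
    """Fraction of abstracts in which each word appears at least once."""
    counts = Counter()
    for text in abstracts:
        # set() so that a word repeated inside one abstract is counted once
        counts.update(set(re.findall(r"[a-z0-9]+", text.lower())))
    n = len(abstracts)
    return {word: c / n for word, c in counts.items()} if n else {}


# Hypothetical abstracts for two genes (not data from the study).
abstracts_by_gene = {
    "GENE_A": ["The chemokine is induced in the cell",
               "This chemokine attracts the cell"],
    "GENE_B": ["Apoptosis of the cell is blocked",
               "The cell cycle is arrested"],
}

occurrence = {gene: term_occurrence(texts)
              for gene, texts in abstracts_by_gene.items()}

for gene, terms in occurrence.items():
    print(gene, sorted(terms.items()))
```

In this toy input, "the" and "cell" occur in every abstract of both genes, while "chemokine" occurs in all GENE_A abstracts and in none for GENE_B, which is the third, useful kind of term. Note that the simple tokenizer splits hyphenated names such as "MHC-I" into separate tokens.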

Step 3: Filter, Filter, Filter

1. Remove commonly found terms

2. Set cutoff for distance from baseline

3. Eliminate terms that only apply to 1 gene

4. Increasing the threshold increases specificity

5. Decreasing the threshold increases sensitivity
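The filtering steps above could be sketched as follows. The assumptions are mine: the baseline is taken to be a term's occurrence in some background set of abstracts, "distance from baseline" is taken to be a simple difference, and the parameter names and default values are hypothetical.

```python
def filter_terms(occurrence, baseline, max_baseline=0.5, cutoff=0.25, min_genes=2):
    """Keep terms that rise above baseline in at least `min_genes` genes.

    occurrence: {gene: {term: fraction of that gene's abstracts with the term}}
    baseline:   {term: fraction of a background set of abstracts with the term}
    """
    kept = {}
    all_terms = {t for terms in occurrence.values() for t in terms}
    for term in all_terms:
        base = baseline.get(term, 0.0)
        if base > max_baseline:
            continue  # step 1: commonly found term
        genes = [g for g, terms in occurrence.items()
                 if terms.get(term, 0.0) - base >= cutoff]  # step 2: distance from baseline
        if len(genes) >= min_genes:  # step 3: must apply to more than one gene
            kept[term] = sorted(genes)
    return kept


# Hypothetical occurrence values (not data from the study).
occurrence = {
    "GENE_A": {"the": 1.0, "chemokine": 0.8, "migration": 0.4},
    "GENE_B": {"the": 1.0, "chemokine": 0.6, "apoptosis": 0.7},
    "GENE_C": {"the": 0.9, "apoptosis": 0.2},
}
baseline = {"the": 0.95, "chemokine": 0.05, "migration": 0.05, "apoptosis": 0.1}

strict = filter_terms(occurrence, baseline, cutoff=0.25)
loose = filter_terms(occurrence, baseline, cutoff=0.05)
print(strict)
print(sorted(loose))
```

With the stricter cutoff only "chemokine" survives; lowering the cutoff also admits "apoptosis", which illustrates the specificity/sensitivity trade-off in items 4 and 5.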

Step 4: Clustering Analysis

Tools originally designed for microarray clustering can be applied to literature mining to create “literature profiles”

An array of term-occurrence values vs. individual genes is created

Relationships are assessed by hierarchical clustering (done using Cluster and Treeview, available free from the Eisen lab website)
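To make the clustering idea concrete, here is a small pure-Python sketch of agglomerative clustering on a gene-by-term array. It is a stand-in for illustration only, not the Cluster/Treeview software: it assumes average linkage and Euclidean distance, and the gene names and values are invented.

```python
import math


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of {gene: [term-occurrence values]}.

    Repeatedly merges the two clusters with the smallest average pairwise
    Euclidean distance until `n_clusters` clusters remain.
    """
    clusters = [[gene] for gene in profiles]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(math.dist(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical literature profiles; columns: chemokine, apoptosis, antigen.
profiles = {
    "GENE_A": [0.9, 0.0, 0.1],
    "GENE_B": [0.8, 0.1, 0.0],
    "GENE_C": [0.0, 0.9, 0.1],
    "GENE_D": [0.1, 0.8, 0.0],
}

groups = average_linkage(profiles, n_clusters=2)
print(groups)
```

Genes whose abstracts share the same filtered terms end up in the same cluster, here the two "chemokine" genes and the two "apoptosis" genes. A dedicated tool would also record the merge order so that the result can be drawn as a tree.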

Results

In this study, the groups identified were related to immune response (as was hoped)

Genes for transcription factors that control inflammation and apoptosis fell into the first cluster

Chemokines were all grouped into the second group

MHC-I antigen-presenting pathway genes occupied the third grouping

Assumption

“The basis for analyzing expression patterns is the assumption that genes under common transcriptional control are involved in similar processes.”

Notes about parameter setting

For genes with a larger number of abstracts, a 25% cutoff may be too high, but for genes with only five abstracts, it may be too low.

Optimizing cutoffs:

cut-off = t + (k/n), where t is the minimum threshold, k is a constant, and n is the number of abstracts retrieved for a given gene (t and k arbitrary)
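A quick worked example of the cut-off formula. Since t and k are arbitrary, the values used here (t = 0.10, k = 1.0, with cut-offs expressed as fractions) are illustrative choices, not the ones used in the paper.

```python
def cutoff(n_abstracts, t=0.10, k=1.0):
    """cut-off = t + k/n: approaches the minimum t as n grows."""
    return t + k / n_abstracts


for n in (5, 20, 100):
    print(f"n = {n:3d}  cut-off = {cutoff(n):.2f}")
```

With these values a gene with five abstracts gets a cut-off of 0.30, while a gene with 100 abstracts gets 0.11, so sparsely documented genes need a term to appear in a larger share of their abstracts before it counts.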

Benefits

Independent of user bias and can be used to identify promising findings in an unbiased way

Provides investigators with leads for further in-depth investigation of the literature

Limitations

Hindered by the need to retrieve the relevant literature reliably for each gene included in the analysis (manual editing is often required)

Can only be used to direct further investigation (it produces false positives and false negatives)

Why this paper’s method is significant…

Few groups have tried to overcome the inability of scientists to manually mine all the literature in a high-throughput fashion

This technique differs from others because it is based on term occurrence rather than gene name co-citation frequencies

Some text-mining software:

Omniviz (www.omniviz.com)

Eisen Labs (http://rana.lbl.gov/index.htm)

Possible use:

Could identify functions of unknown genes using ‘guilt by association’

Where to find more information:

Profiles generated from this paper can be downloaded and explored using the clustergram browser Treeview, available online at no charge (rana.lbl.gov/index.htm).