iPlant Genomics in Education Workshop

Genome Exploration in Your Classroom

• Big Data: data sets whose size and complexity are beyond the capabilities of commonly used tools to capture, manage, and process the data within a tolerable time frame.

• Big Data: constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in single data sets, with different types of data sets potentially deeply intertwined.

- Wikipedia (http://en.wikipedia.org/wiki/Big_data)

Working with Big Data

Challenges: the scope and scale of life sciences data continue to grow.

Coming into the Genome Age

For the first time in the history of science students can work with the same data and tools that are used by researchers.

Learning by posing and answering questions.

Students generate new knowledge.

The iPlant Collaborative: Vision

How can we prepare for science we can’t anticipate?

Enable life science researchers and educators to use and extend iPlant's foundational cyberinfrastructure to understand and ultimately predict the complexity of biological systems and their dynamic nature under various environmental conditions.

The iPlant Collaborative: What is Cyberinfrastructure?

Cyberinfrastructure (CI) is data storage, software, high-performance computing, and people – organized into systems that solve problems of size and scope that would not otherwise be solvable.

• Platforms, tools, datasets
• Storage and compute
• Training and support

The iPlant Collaborative: What problems can iPlant solve?

• Crops and model plant systems
• Animal and livestock
• Agronomic microbes, insects…

iPlant is built for data

The iPlant Collaborative: How was iPlant built?

“I had the feeling I have been exposed to many bioinformatics tools but I would be unable to use any of them on my own.”

The limitations of any training workshop

3. Keep asking questions

• If iPlant can, we’ll help show you how…
• If iPlant can’t, we’ll find the path that gets you what you need

Don’t hesitate to ask “Can iPlant do this?”

Keep asking at ask.iplantcollaborative.org

Bringing Genomics into the Classroom

Visualization of the Pectobacterium atrosepticum genome (http://www.scri.ac.uk/research/pp/plantpathogengenomics/pathogenbioinformatics)

“Essentially, all models are wrong, but some are useful” – George E.P. Box

From This…

• 1866 – Mendel publishes work on inheritance
• 1869 – DNA discovered
• 1915 – Hunt Morgan describes linkage and recombination
• 1953 – Structure of DNA described
• 1956 – Human chromosome number determined
• 1968 – First gene mapped to autosome
• 1977 – Dideoxy sequencing
• 1983 – PCR
• 1986 – Human Genome Project proposed

• 1993 – First microRNAs described
• 2003 – First ‘Gold Standard’ human genome sequence
• 2005 – First draft of human haplotype map (HapMap)
• 2007 – ENCODE project

Timeline: Wellcome Trust (http://www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/wtx063807.pdf)

1973 Sharp, Sambrook, Sugden

Gel Electrophoresis Chamber, $250

1958 Matt Meselson &

Ultracentrifuge, $500,000

The Egalitarian Gene: Agarose Gel Electrophoresis, 1973

The Egalitarian Genome: Next Generation Sequencing, 2005

Bacterial colonies PCR colonies (clusters, features)

Hundreds of millions of…

To This…

Research Education

For the first time in the history of biology, students can work with the same data at the same time and with the same tools as research scientists.

Educational Challenge

Context of scientific discovery

Walk or…

…ride an educational Discovery Environment

iPlant Genomics in Education Workshop

Major Workshop Concepts:

• Biology is becoming a “Data Unlimited” science.
• Genomes are dynamic.
• Genomes are more than just protein coding genes.
• DNA sequence is information.
• Gene annotation adds “meaning” to DNA sequence.
• Biological concepts like “genes” and “species” are continually evolving.
• DNA barcoding bridges molecular genetics, evolution, and ecology.

The Problem of Big Data in Biology

The abundance of biological data generated by high-throughput sequencing creates challenges, as well as opportunities:

• How do scientists share their data and make it publicly available?

• How do scientists extract maximum value from the datasets they generate?

• How can students and educators (who will need to come to grips with data-intensive biology) be brought into the fold?

• Majority of genome is transcribed
• ~50% transposons
• ~25% protein coding genes / 1.3% exons
• ~23,700 protein coding genes
• ~160,000 transcripts
• Average gene ~36,000 bp
• 7 exons @ ~300 bp
• 6 introns @ ~5,700 bp
• 7 alternatively spliced products (95% of genes)
• RefSeq: ~34,600 “reference sequence” genes (includes pseudogenes, known RNA genes)
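
The average-gene figures above can be checked with simple arithmetic, which also makes a quick classroom exercise. The sketch below is only an illustration: the function and variable names are hypothetical, and the inputs are the approximate values from the slide (7 exons of ~300 bp, 6 introns of ~5,700 bp). The exon share it reports is derived from those averages for a single gene, which is a different quantity from the genome-wide 1.3% exon figure.

```python
# Approximate per-gene averages taken from the slide (assumed exact here).
EXONS_PER_GENE = 7
EXON_LENGTH_BP = 300
INTRONS_PER_GENE = 6
INTRON_LENGTH_BP = 5_700


def gene_length_bp(n_exons, exon_bp, n_introns, intron_bp):
    """Total gene length in base pairs: all exons plus all introns."""
    return n_exons * exon_bp + n_introns * intron_bp


def exon_share_of_gene(n_exons, exon_bp, n_introns, intron_bp):
    """Fraction of a single gene's length that is exon sequence."""
    total = gene_length_bp(n_exons, exon_bp, n_introns, intron_bp)
    return (n_exons * exon_bp) / total


gene_bp = gene_length_bp(
    EXONS_PER_GENE, EXON_LENGTH_BP, INTRONS_PER_GENE, INTRON_LENGTH_BP
)
exon_share = exon_share_of_gene(
    EXONS_PER_GENE, EXON_LENGTH_BP, INTRONS_PER_GENE, INTRON_LENGTH_BP
)

print(f"Average gene length: {gene_bp:,} bp")
print(f"Exon share of that gene: {exon_share:.1%}")
```

With these inputs the total comes to 36,300 bp, consistent with the slide's "~36,000 bp", and exons account for roughly 5.8% of the average gene's length.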

Using Plants to Explore Genomics

There are a large number of plant genomes available for analysis.

Using Plants to Explore Genomics: The “weirdness” of plant genomes on your dinner plate

Triticum aestivum: allohexaploid

[Figure: chromosome comparison among Brachypodium, Sorghum, and Oryza, followed by a phylogeny of monocots and dicots on a time axis (million years, 60 to present) covering Oryza (rice), Avena (oats), Hordeum (barley), Triticum (wheat), Setaria (foxtail millet), Pennisetum (pearl millet), Sorghum, Zea (maize), Brachypodium, Arabidopsis, and Glycine max (soy). Genome sizes shown on the figure range from 145 Mb to more than 20,000 Mb; genome duplication events are marked.]

Using DNA Subway to Explore Genomics