Bioinformatics Computing. Department Bioinformatics, Government Post Graduate College, Mandain Abbottabad. Sajid Khan

Page 1: Sajid Khan

Bioinformatics Computing

Department Bioinformatics

Government Post Graduate College,

Mandain Abbottabad.

Sajid Khan

Page 2: Sajid Khan

Chapter 5. Data Visualization

Page 3: Sajid Khan

When you are inspired by some great purpose, some extraordinary project, all your thoughts break their bonds; your mind transcends limitations, your consciousness expands in every direction, and you find yourself in a new, great and wonderful world.

Page 4: Sajid Khan

Deoxy Human Hemoglobin. PDB entry 1A3N. Image produced with PDB Structure Explorer.

Page 5: Sajid Khan

Root Mean Squared Deviation (RMSD)

The degree of similarity between two structures is often expressed as a Root Mean Squared Deviation (RMSD), the square root of the mean squared distance between corresponding atoms in the two molecules. Similar structures typically have an RMSD in the 1–3 Angstrom range, and larger RMSD values indicate greater structural differences.

However, as the size of the protein increases, the RMSD that still qualifies as a good fit also increases. Whereas an RMSD of 10 Angstroms would be considered a poor fit for a small protein, it might be considered excellent for a longer protein with several hundred amino acids.
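The RMSD definition above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular tool: it assumes the two structures are already superimposed and that atom i in one list corresponds to atom i in the other. The function name `rmsd` and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two lists of (x, y, z) points.

    Assumes the structures are already superimposed and that point i in
    coords_a corresponds to point i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of corresponding atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in Angstroms, for illustration only.
structure_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]

# Every atom in structure_b is shifted by 1 Angstrom along z.
print(rmsd(structure_a, structure_b))
```

In practice the superposition step (finding the rotation and translation that minimize the RMSD) is done first; it is omitted here to keep the sketch short.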

Page 6: Sajid Khan

The Challenge of Structure Comparison.

Each pair of protein backbones has the same RMSD value, but different relative amounts of structure similarity. Visualization, together with the RMSD value, provides the best indicator of structure similarity.

A. Uniformly Distributed Difference
B. Localized Difference
C. Significant Difference with Few Atoms
D. Small Difference with Many Atoms
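A small numerical sketch shows why the RMSD value alone can mislead. The per-atom deviations below are hypothetical and chosen only to mirror cases A and B: one backbone differs slightly everywhere, the other differs strongly at a single atom, yet both give the same RMSD.

```python
import math


def rmsd_from_deviations(deviations):
    """RMSD computed from per-atom distances between corresponding atoms."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


n_atoms = 100  # hypothetical backbone size

# Uniformly distributed difference: every atom is off by 1 Angstrom.
uniform = [1.0] * n_atoms

# Localized difference: one atom is off by 10 Angstroms, the rest match exactly.
localized = [10.0] + [0.0] * (n_atoms - 1)

print(rmsd_from_deviations(uniform))
print(rmsd_from_deviations(localized))
```

Both calls work out to an RMSD of 1 Angstrom, which is why the slide recommends inspecting the structures visually alongside the number.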

Page 7: Sajid Khan

Sequence Visualization

Sequence Maps

Map Viewer. NCBI's Map Viewer program integrates physical and genetic map information for specific sequences, proteins, and genes.

This view shows the position of the gene associated with type 2 neurofibromatosis, located on chromosome 22.

Page 8: Sajid Khan

MAP VIEWER

Page 9: Sajid Khan

NEUROFIBROMATOSIS

Page 10: Sajid Khan

Map Viewer

Map Viewer provides a graphic depiction of nucleotide sequences through a composite of genetic, cytogenetic, physical, and radiation hybrid maps, each of which has its particular uses.

Genetic maps show the relative position and order of genes and other sequences on a chromosome, and serve as high-level approximations of relative distances between sequences.

Cytogenetic maps provide a gross indication of the position of exons and introns along a chromosome, based on optical microscope techniques. Physical maps show the actual physical location of sequences on a chromosome.

Radiation hybrid maps link genetic and physical maps.

Page 11: Sajid Khan

Gene Mapping Processes

A variety of techniques are available for creating physical and genetic maps.

Sequence mapping involves first breaking up the chromosome at random into large fragments, which are then cloned in bacteria to make bacterial artificial chromosomes (BACs).

Page 12: Sajid Khan

Structure Visualization

One of the primary activities in proteomics R&D is determining and visualizing the 3D structure of proteins in order to find where drugs might modulate their activity. In contrast to visualizing the sequence of nucleotides on a strand of DNA, visualizing the primary structure of a protein adds little to the knowledge of protein function.

Barring the introduction of some new technology, cataloging, interpreting, and dissecting the proteome will take many years.

Unlike a nucleotide sequence, which is a relatively static structure, proteins are dynamic entities that change their shape and association with other molecules as a function of temperature, chemical interactions, pH, and other changes in the environment.

Page 13: Sajid Khan

Visualization Tools

Visualization Technologies. Visualization tools leverage the pattern-recognition capabilities of the viewer's visual apparatus as opposed to the logical, intellectual capabilities that can be more easily saturated.

Page 14: Sajid Khan

Rendering Tools

Most of the imaging work in bioinformatics involves data from the Protein Data Bank (PDB) or the Molecular Modeling Database (MMDB).

For example, the PDB entry for glutamine synthetase is 1FPY.

Page 15: Sajid Khan

Graphical representation.

• Using graphical representations of data provides added meaning and context.
• It aids understanding, as those with little knowledge of the subject may be able to comprehend the results.
• It leaves less room for confusion.

Page 16: Sajid Khan

Graphical representation and bioinformatics

• The majority of the data is in abstract form that needs visualization technologies to enhance user understanding.
• This need is more pronounced in the areas of sequence visualization, user interface development, protein structure visualization and as a complement to numerical analyses, especially statistical analysis.

Page 17: Sajid Khan

Graphical representation and bioinformatics

• In each application area, the rationale for using graphics instead of tables or strings of data is to shift the user’s mental processing from reading and mathematical, logical interpretation to faster pattern recognition.
• The perceptual clues in graphical displays can enhance immediate understanding of the data being presented.
• While providing context and an indication of relative importance, graphical displays also bring forward relationships that would otherwise be incomprehensible.

1/29/2009

Page 18: Sajid Khan

Sequence Visualization

• Working with strings that represent nucleotide sequences is like programming in machine code: although possible, it is an arduous, error-prone and time-consuming process that doesn’t lead to efficiency or easy maintenance, and one that requires extensive program documentation.

Page 19: Sajid Khan

Sequence Visualization

• A step up from machine code is assembly language, which allows programmers to use mnemonics such as “CLR” to clear a buffer and “ADD” to add two values.
• Programmers are still forced to think in terms of low-level CPU instructions.
• They must constantly switch between a high-level problem, such as how best to rotate a molecule in 3-D space, and a low-level problem, such as whether to use integer or float in the rotation algorithm.

Page 20: Sajid Khan

Sequence Visualization

• Further up the programming hierarchy are languages such as C++, Java, Perl and HTML that insulate programmers from the underlying computational hardware infrastructure and allow them to work at a level nearer the application purpose.

Page 21: Sajid Khan

Sequence Visualization

• Higher still are the flow diagrams or storyboards (maps of sorts) that provide a graphic overview of the application that can be understood and critiqued by non-programmers.
• Returning to the nucleotide sequence work, the parallel to these storyboards is the gene map: a high-level graphic representation of where specific sequences reside on a chromosome.

Page 22: Sajid Khan

Sequence Maps

• When it comes to visualizing nucleotide sequences, the obvious organization metaphors are the amino acids, proteins, chromosome segments and genes.
• Gene maps provide a high-level view of relative and absolute gene and nucleotide sequence location.

Page 23: Sajid Khan

Map Viewer

The Map Viewer program integrates physical and genetic map information for specific sequences, proteins and genes. It is part of NCBI’s Entrez integrated system and provides a composite interface to several of NCBI’s online databases.

Page 24: Sajid Khan

Map Viewer

• Enables users to identify a particular gene location within an organism’s genome, the distance between genes and the sequence data for a gene in a particular chromosomal region.
• Provides a graphic depiction of nucleotide sequences through a composite of genetic, cytogenetic, physical and radiation hybrid maps, each of which has its particular uses.

Page 25: Sajid Khan

Map Viewer

• It illustrates how the main computational challenge in visualizing linear nucleotide sequences lies in integrating data from multiple databases.
• The sequences represented by the sequence maps are one-dimensional, so there is relatively little computational overhead involved.
• Sequences culled from NCBI’s sequence databases are mapped onto the appropriate graphic, and relevant links are provided to the corresponding databases.

Page 26: Sajid Khan

Genetic Maps

• Show the relative position and order of genes and other sequences on a chromosome.
• Serve as high-level approximations of relative distances between sequences.
• Measured in terms of recombination frequency.
• Useful for a researcher who, for example, is interested in the probability that the genes will separate during meiosis.
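As a rough sketch of how a recombination frequency turns into a genetic map distance: the slides do not give a formula, so the conversion below uses the common convention that 1% recombination corresponds to about 1 centimorgan (cM), which is only a reasonable approximation for closely linked markers. The function names and the offspring counts are hypothetical.

```python
def recombination_frequency(recombinant_offspring, total_offspring):
    """Fraction of offspring in which the two markers were separated."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must lie between 0 and total_offspring")
    return recombinant_offspring / total_offspring


def approximate_map_distance_cm(frequency):
    """Convert a recombination frequency to centimorgans (1% is about 1 cM).

    Assumption: this linear conversion ignores double crossovers, so it is
    only meaningful for small distances.
    """
    return frequency * 100.0


# Hypothetical cross: 120 recombinant offspring out of 1000 scored.
freq = recombination_frequency(120, 1000)
print(f"recombination frequency = {freq:.2f}")
print(f"approximate distance    = {approximate_map_distance_cm(freq):.1f} cM")
```

The higher the frequency, the more likely the two genes are to separate during meiosis, and the further apart they sit on the genetic map.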

Page 27: Sajid Khan

Physical Maps

• Show the actual physical location of sequences on a chromosome.
• Too detailed and difficult to work through.
• Resolution of the map depends on the methodology used to create it.
• Simplest form is cytogenetic mapping.

Page 28: Sajid Khan

Cytogenetic Maps

• Provide a gross indication of the position of exons and introns along a chromosome.
• Based on optical microscope techniques.
• Most appropriate, for example, for a researcher interested in quickly estimating the relative amount of DNA on a chromosome that is involved in coding.

Page 29: Sajid Khan

Radiation Hybrid Maps

• The most valuable mapping techniques link genetic and physical maps.
• The most common methods involve:
  - Radiation hybrid (RH) mapping
    • Can be used to reveal the distance between genetic markers by exposing DNA to measured doses of radiation, which causes the DNA to break up. By varying the amount of radiation, the average distance between DNA sequence breaks can be modified.
    • Can be used to localize virtually any genetic marker.
  - Simple sequence length polymorphisms (SSLPs)
    • SSLPs are arrays of repeat sequences that display length variations.
    • SSLPs can serve as both a genetic marker and a basis for sequence mapping, a Rosetta Stone of sorts.

Page 30: Sajid Khan

Accuracy of Mapping

• Dependent on the computational methods used to manipulate the data acquired by experimentation or modeling.
• The typical process involves an integration of several mapping approaches.

Page 31: Sajid Khan

Gene Mapping Processes

[Figure: Gene Mapping Processes. A variety of techniques are available for creating physical and genetic maps. Flowchart labels: Gene; Cut; Assign Genetic Markers; Sequence; Genetic Map; Frag; Create BACs; Frag BACs; Sequence; Physical Map.]

Page 32: Sajid Khan

Sequence Mapping - the process

• Break up the chromosome at random into large fragments.
• Clone these fragments in bacteria to make bacterial artificial chromosomes (BACs).
• Order the BACs to maximize the contiguous region while using the minimum number of BACs.
• Break the BACs into fragments of fewer than 500 nucleotides.
• Sequence each fragment. This defines each contiguous region.
• The result is a physical map that may have a few gaps between contiguous regions.
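The BAC-ordering step can be illustrated with a greedy interval-cover sketch. It is a simplification under a strong assumption: each clone's start and end position on a common coordinate system is already known, whereas real mapping projects have to infer overlaps experimentally. The function name `minimal_tiling_path` and the clone data are hypothetical.

```python
def minimal_tiling_path(bacs):
    """Greedily pick a small set of overlapping clones covering one contig.

    bacs: list of (name, start, end) tuples on a shared coordinate system
    (an assumption of this sketch). Returns the chosen clones in order.
    Selection stops at the first gap, so the result describes a single
    contiguous region.
    """
    remaining = sorted(bacs, key=lambda b: (b[1], b[2]))
    if not remaining:
        return []
    path = []
    covered_to = remaining[0][1]  # coverage begins at the leftmost clone
    i = 0
    while i < len(remaining):
        best = None
        # Among clones starting inside the covered region,
        # keep the one that reaches furthest to the right.
        while i < len(remaining) and remaining[i][1] <= covered_to:
            if best is None or remaining[i][2] > best[2]:
                best = remaining[i]
            i += 1
        if best is None or best[2] <= covered_to:
            break  # a gap: no clone extends the contiguous region
        path.append(best)
        covered_to = best[2]
    return path


# Hypothetical clones with positions in kilobases.
clones = [
    ("BAC1", 0, 150),
    ("BAC2", 50, 200),
    ("BAC3", 100, 300),
    ("BAC4", 180, 260),
    ("BAC5", 290, 400),
]
print([name for name, _, _ in minimal_tiling_path(clones)])
```

Here BAC2 and BAC4 are redundant because BAC1, BAC3 and BAC5 already span the whole region. When the clones do not overlap, the function returns only the first contiguous stretch, matching the slide's remark that the physical map may contain gaps.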

Page 33: Sajid Khan

Structure Visualization

• A nucleotide sequence is a relatively static structure.
• Proteins are dynamic entities. They change their shape and association with other molecules as a function of:
  - Temperature
  - Chemical interactions
  - pH
  - Other changes in the environment

Page 34: Sajid Khan

Visualization Tools

• Hundreds of visualization tools are available.
• Many tools are hardware- or process-specific.

Visualization | Tool Example
Nucleotide Location | Map Viewer
Protein Structure | SWISS-PDBViewer, WebMol, RasMol, Protein Explorer, Cn3D, VMD, MolMol, MidasPlus, PyMol, Chime, Chimera
User Interface | Third-Party Browsers, VRML, Java Applets, C++
General-Purpose Software | Microsoft Excel, Strata Vision 3D, Max3D, 3D-Studio, Ray Dream Studio, StatView, SAS/Insight, Minitab, Matlab
General-Purpose Hardware | Stereo Goggles, Data Gloves, 3D (Stereo) Displays, Haptic Devices

Page 35: Sajid Khan

Rendering Tools

• Most of the imaging work in bioinformatics involves data from the Protein Data Bank (PDB) or the Molecular Modeling Database (MMDB).
• Searching for a structure is typically through protein name or ID.
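A structure lookup by ID can be sketched as follows. The snippet only validates a classic four-character PDB ID and builds a download address; it does not contact any server. The URL pattern is an assumption based on the RCSB file download service and should be checked against its current documentation; the helper name `pdb_download_url` is hypothetical.

```python
import re

# Classic PDB identifiers are four characters: a digit followed by
# three letters or digits (for example 1A3N or 1FPY).
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def pdb_download_url(pdb_id):
    """Return a download URL for a PDB entry (assumed RCSB URL pattern)."""
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


# Deoxy human hemoglobin, the entry shown earlier in the slides.
print(pdb_download_url("1a3n"))
```

The downloaded file can then be opened in any of the rendering programs discussed in this chapter.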

Page 36: Sajid Khan

Rendering Tools

Page 37: Sajid Khan

Rendering Tools

• Representative protein structure rendering programs available as free downloads from the internet include RasMol, PyMol, SWISS-PDBViewer and Chimera.
• Following is a summary of the features of these programs:

Page 38: Sajid Khan

Comparison

Feature | RasMol | Cn3D | PyMol | SWISS-PDBViewer | Chimera
Architecture | Stand-alone | Plug-in | Web-enabled | Web-enabled | Web-enabled
Manipulation Power | Low | High | High | High | High
Hardware Requirement | Low/Moderate | High | High | Moderate | High
Ease of Use | High; command-line language | Moderate | Moderate | High | Moderate
Special Features | Small size; very easy to install and use; established user base; highly portable | Powerful; GUI | Powerful; GUI; ray-tracing option | Powerful; GUI | Powerful; GUI; built-in extensions for collaboration
Output Quality | Moderate | Very high | High | High | Very high
Documentation | Good | Good | Limited | Good | Very good
Support | Online and user groups | Online and user groups | Online and user groups | Online and user groups | Online and user groups
Speed | High | Moderate | Moderate | Moderate | Moderate/Slow
OpenGL Support | Yes | Yes | Yes | Yes | Yes
Extensibility | No | No | Yes; supports Python | No | Highly extensible; supports Python
Operating Systems | Universal | Universal | Universal | Universal | Universal

Note: the Ease of Use row also carries the entry "command-line language and GUI" for one of the programs rated Moderate; its column cannot be determined from the source layout.

Page 39: Sajid Khan

Rendering Tools

• The selection of a protein structure rendering program should be a function of:
  - Ease of use
  - Power
  - Speed
  - Special features
  - Cost
  - Hardware requirements
  - Documentation and support
  - Overall functionality

Page 40: Sajid Khan

Rendering Tools

• The more complex the rendering output, the greater the computational load, and the more time required to render each image.
• Often, time and performance limitations dictate the use of a simple, fast rendering package such as RasMol for day-to-day rendering, and one of the higher-end packages, such as Chimera, for publication-quality output.

Page 41: Sajid Khan

User Interface

• Hides the intricacies of the computer hardware and software.
• Presents users with images, sounds and graphics.
• Allows users to interact on a cognitive level.
• Focuses the attention on what is being presented.

Page 42: Sajid Khan

User Interface

• Every computer application and every workstation has a user interface defined by hardware and software.
• A computer can run anything from an OS to a web browser, but the usability, usefulness and accessibility of the associated data are defined by the user interface.

Page 43: Sajid Khan

User Interface

• The user interface determines the density of the information that can be presented to the user.
• This follows from information theory, which treats the user interface as the medium through which the data flow.

Page 44: Sajid Khan

User Interface and Information Theory

[Figure: User Interface and Information Theory. Information source (application, relevant data) → transmitter (interface hardware) → medium (user interface), where a noise source adds irrelevant data → receiver (eyes, ears and proprioceptors) → destination (user awareness), which receives both relevant and irrelevant data.]

Page 45: Sajid Khan

User Interface and Information Theory

• An application, such as a 3D protein visualization tool, is the information source.
• The data created by the application is the message.
• The computer interface hardware, including the video card and monitor, is the transmitter.
• The user interface, including buttons and other graphics rendered on the computer monitor, serves as the medium.

Page 46: Sajid Khan

User Interface and Information Theory

• The irrelevant data includes components of the system that interfere with the message generated by the application, such as:
  - Superfluous graphics
  - Distracting colours
  - Other data that serves to confuse users

Page 47: Sajid Khan

User Interface and Information Theory

• The receiver is the user’s perceptual apparatus, including:
  - eyes for visual content
  - ears for audio content
  - proprioceptors for tactile or haptic content
• Finally, the message, now containing relevant and irrelevant data, reaches the ultimate destination: the user’s awareness.

Page 48: Sajid Khan

User Interface

• Being the medium, it is the major bandwidth-limiting element in the delivery of data from the application to the user.
• Everything that affects the effectiveness of the user interface affects the delivery of data.
• Users don’t need to know anything about the complicated underlying processes of the application.

Page 49: Sajid Khan

User Interface Components

• Designing an interface involves more than simply deciding on the layout for buttons and check boxes on a display.
• Even the simplest user interface:
  - Is complex
  - Is multi-tiered
  - Supports communication

Page 50: Sajid Khan

User Interface Components

• The user interface minimally consists of a physical interface between the user and the computer.
• It may also include:
  - graphical
  - logical
  - emotional and
  - intelligent components.

Page 51: Sajid Khan

USER INTERFACE HISTORY

Page 52: Sajid Khan

Thank You

Contact : gpgcm_bc ( Yahoo Group )