Course Work Project

Project title

“Data Analysis Methods for Microarray Based Gene Expression Analysis”

Sushil Kumar Singh (batch 2002-03), IBAB, Bangalore

Done at Siri Technologies Pvt. Ltd.

Bangalore

Outline

• Introduction
• Overview of Data Analysis
• Normalization
• Clustering Algorithms
• Future Work
• Acknowledgements
• Questions

Introduction

Overview of Data Analysis

Normalization

An attempt to remove systematic variation from data.

Sources of systematic variation:
• Biological source: influenced by genetic or environmental factors, age, sex, etc.
• Technical source: induced during extraction, labelling, and hybridization of samples; print-tip problems.
• Measurement source: differences in DNA concentration; scanner problems.

Why Normalize Data

• To recognize the biological information in the data.
• To compare data from one array to another.
• Caveat: in practice we do not fully understand the data, so some biological signal will inevitably be removed too.

Normalization methods

Methods of element selection:
• Housekeeping genes
• All elements
• Spiked-in controls

Methods to calculate the normalization factor:
• Log ratio
• Lowess
• Ratio statistics
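Two of the normalization-factor methods listed above can be sketched as follows. This is a minimal sketch, assuming a two-colour array with positive, background-corrected red (R) and green (G) intensities; the function names are hypothetical. Median centering of M = log2(R/G) stands in for the global log-ratio method (the reference elements can be all spots or a chosen subset such as housekeeping genes or spiked-in controls). The Lowess shown is a simplified single-pass local linear fit of M against A = 0.5 * log2(R*G) with tricube weights and no robustness iterations, not a reference implementation. Ratio statistics are not shown.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return M = log2(R/G) and A = 0.5 * log2(R*G) for each spot."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def global_normalize(m_values, reference_idx=None):
    """Subtract the median log ratio of the reference elements.

    reference_idx=None uses all elements; otherwise pass the indices of,
    for example, housekeeping genes or spiked-in controls (hypothetical choice).
    """
    idx = range(len(m_values)) if reference_idx is None else reference_idx
    factor = median([m_values[i] for i in idx])
    return [m - factor for m in m_values], factor


def lowess_normalize(m_values, a_values, frac=0.5):
    """Subtract a local linear fit of M against A (single pass, tricube weights)."""
    n = len(m_values)
    k = max(2, math.ceil(frac * n))  # neighbourhood size
    normalized = []
    for m0, a0 in zip(m_values, a_values):
        # k nearest spots along the intensity (A) axis; a0 itself has weight 1
        nearest = sorted(range(n), key=lambda j: abs(a_values[j] - a0))[:k]
        dmax = max(abs(a_values[j] - a0) for j in nearest) or 1.0
        w = [(1 - (abs(a_values[j] - a0) / dmax) ** 3) ** 3 for j in nearest]
        xs = [a_values[j] for j in nearest]
        ys = [m_values[j] for j in nearest]
        # weighted least-squares line through the neighbourhood
        sw = sum(w)
        x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
        y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        normalized.append(m0 - (y_bar + slope * (a0 - x_bar)))
    return normalized
```

Global centering removes a single offset for the whole array, while the Lowess variant removes an intensity-dependent trend, which is why it is usually applied to the M-versus-A view of the data.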

Clustering

For a sample of n data points described in a d-dimensional feature space, clustering is a procedure that divides the n points into K disjoint groups such that the points within each group are more similar to each other than to the points in other groups.

Clustering algorithms

• Unsupervised, without a priori biological information: agglomerative (hierarchical); divisive (K-means, SOM)
• Supervised, using a priori biological knowledge: support vector machine (SVM)

Hierarchical clustering (HC)

Agglomerative technique steps:
1. The pairwise distance is calculated between all genes.
2. The two genes with the shortest distance are grouped together to form a cluster.
3. Then the two closest clusters are merged to form a new cluster.
4. The distances are calculated between this new cluster and all other clusters.
5. Steps 2 to 4 are repeated until all the objects are in one cluster.
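The agglomerative steps above can be sketched in a few lines. This is a minimal, unoptimized sketch (it recomputes cluster distances on every pass), assuming Euclidean distance between gene profiles and average linkage by default; the function name and the `linkage` parameter are hypothetical. The returned merge list is the information a dendrogram draws.

```python
import math
from itertools import combinations


def agglomerative(profiles, linkage="average"):
    """Merge genes bottom-up; return merges as (cluster_a, cluster_b, distance).

    Clusters are tuples of gene indices. linkage: "single", "complete" or
    "average" (UPGMA).
    """
    n = len(profiles)
    # Step 1: pairwise distance between all genes
    gene_d = {}
    for i, j in combinations(range(n), 2):
        gene_d[(i, j)] = gene_d[(j, i)] = math.dist(profiles[i], profiles[j])

    def cluster_d(c1, c2):
        pairs = [gene_d[(i, j)] for i in c1 for j in c2]
        if linkage == "single":
            return min(pairs)
        if linkage == "complete":
            return max(pairs)
        return sum(pairs) / len(pairs)

    clusters = [(i,) for i in range(n)]  # every gene starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Steps 2-3: merge the two closest clusters (initially two genes)
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_d(*pair))
        merges.append((a, b, cluster_d(a, b)))
        # Step 4: the merged cluster replaces its parts; its distances to
        # the remaining clusters are computed on the next pass
        clusters = [c for c in clusters if c != a and c != b] + [a + b]
    return merges
```

Because each merge is chosen only from the current closest pair and is never revisited, this loop is a direct instance of the greedy behaviour noted under the pros and cons of HC.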

HC contd.

Data table

HC contd.

• The distance matrix is calculated from the data table: each experiment is an axis, and a gene's log ratios are its coordinates.

• For n experiments, each gene is a point in an n-dimensional space.
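The mapping from data table to distance matrix can be illustrated with a toy table. The gene names and log-ratio values below are made up for illustration, and Euclidean distance is assumed; any other gene-to-gene distance could be passed in instead.

```python
import math

# Hypothetical data table: rows are genes, columns are experiments,
# values are log ratios. With 3 experiments each gene is a point in 3-D space.
data_table = {
    "geneA": [1.0, 2.0, 0.5],
    "geneB": [1.1, 1.9, 0.4],
    "geneC": [-1.0, -2.0, 0.0],
}


def distance_matrix(table, dist=math.dist):
    """Return the gene names and the symmetric gene-by-gene distance matrix."""
    genes = list(table)
    matrix = [[dist(table[g], table[h]) for h in genes] for g in genes]
    return genes, matrix


genes, matrix = distance_matrix(data_table)
for name, row in zip(genes, matrix):
    print(name, " ".join(f"{d:6.2f}" for d in row))
```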

HC contd.

Distance between genes:
• Euclidean distance
• Pearson correlation
• Semi-metric distance: vector angle
• Metric distance: Manhattan or city-block
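The four gene-to-gene measures listed above can be written out directly. This is a minimal sketch with hypothetical function names; correlation is turned into a distance as 1 - r, and the vector-angle measure is assumed to be 1 - cos θ (the uncentred-correlation form), which is why it is only semi-metric. Pearson distance needs non-constant profiles and the angle needs non-zero profiles.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_distance(x, y):
    """1 - r: 0 for perfectly correlated, 2 for perfectly anti-correlated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1 - sxy / math.sqrt(sxx * syy)


def vector_angle_distance(x, y):
    """1 - cos(theta) between the profiles taken as vectors from the origin."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1 - dot / (norm_x * norm_y)
```

With these definitions the anti-correlated profiles [1, 2, 3] and [3, 2, 1] get the largest possible Pearson distance (2), so such genes land far apart rather than being grouped by their negative association.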

HC contd.

Distance between clusters:
• Single linkage clustering
• Complete linkage clustering
• Average linkage clustering: UPGMA, weighted pair-group average, within-groups clustering
• Ward's method
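Single, complete and average (UPGMA) linkage, plus Ward's criterion, can be sketched as functions of two clusters of gene profiles. This is a minimal sketch with hypothetical names, assuming Euclidean distance between genes; weighted pair-group average and within-groups clustering are not shown. Ward's value here is the increase in within-cluster sum of squares caused by the merge, n1 * n2 / (n1 + n2) * ||c1 - c2||^2, where c1 and c2 are the cluster centroids.

```python
import math
from itertools import product


def single_linkage(c1, c2):
    """Distance between the closest pair of members."""
    return min(math.dist(a, b) for a, b in product(c1, c2))


def complete_linkage(c1, c2):
    """Distance between the farthest pair of members."""
    return max(math.dist(a, b) for a, b in product(c1, c2))


def average_linkage(c1, c2):
    """UPGMA: mean of all between-cluster pairwise distances."""
    d = [math.dist(a, b) for a, b in product(c1, c2)]
    return sum(d) / len(d)


def centroid(cluster):
    """Coordinate-wise mean of the cluster members."""
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]


def ward_increase(c1, c2):
    """Ward's method: growth in within-cluster sum of squares if c1 and c2 merge."""
    n1, n2 = len(c1), len(c2)
    return n1 * n2 / (n1 + n2) * math.dist(centroid(c1), centroid(c2)) ** 2
```

At each agglomerative step, the pair of clusters minimizing the chosen function is merged; the choice of rule changes the shape of the resulting dendrogram.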

HC contd.

The result of HC is displayed as a branching tree diagram called a dendrogram.

Pros and cons of HC:
• Easy to implement; gives a quick visualization of the data set.
• Ignores negative associations between genes.
• It is a greedy algorithm: each merge is chosen locally and is never undone.

K-means Clustering

Divisive approach. Steps:
1. Specify K initial clusters and find their centroids.
2. For each data point, the distance to each centroid is calculated.
3. Each data point is assigned to its nearest centroid.
4. Each centroid is shifted to the center of the data points assigned to it.
5. Steps 2 to 4 are iterated until the centroids no longer shift.
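The K-means loop above can be sketched as follows. This is a minimal sketch, assuming Euclidean distance and initial centroids drawn at random from the data points (one common choice among several); the function name and the `max_iter` and `seed` parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Return (labels, centroids) for k clusters of the given points."""
    rng = random.Random(seed)

    def nearest(p, centroids):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    # Step 1: k initial centroids, here k randomly chosen data points
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Steps 2-3: distance to every centroid; assign each point to the nearest
        labels = [nearest(p, centroids) for p in points]
        # Step 4: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(x) / len(members) for x in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # empty cluster: leave it in place
        # Step 5: stop once no centroid shifts any more
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]  # final assignment
    return labels, centroids
```

Because the result depends on the initial centroids, a common safeguard is to run with several seeds and keep the solution with the lowest within-cluster sum of squares.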

K-means clustering contd.

Pros and cons:
• No dendrogram is produced.
• It is a powerful method if one has a prior idea of the number of clusters, so it works well with PCA, which can help suggest that number.

Future Work

Future work includes similar analysis with:
• Self-Organizing Map (SOM)
• Support Vector Machine (SVM)
• Relevance Network
• Gene Shaving
• Self-Organizing Tree Algorithm (SOTA)
• Cluster Affinity Search Technique (CAST)

Acknowledgements

Institute of Bioinformatics and Applied Biotechnology (IBAB), Bangalore

Dr. Ashwini K Heerekar (Siri Technologies Pvt. Ltd, Bangalore)

Dr. Jonnlagada Srinivas (Siri Technologies Pvt. Ltd, Bangalore)

Mr. Kiran Kumar (Siri Technologies Pvt. Ltd, Bangalore)

Mr. Mahantha Swamy MV. (Siri Technologies Pvt. Ltd, Bangalore)


Questions

Thank You
