8/13/2019 Identifying Cancer Patients using DNA Micro -Array Data in Data Mining Environment
1/13
Online available since 2014/Jan/26 at www.oricpub.com
(2014) Copyright ORIC Publications
Journal of Science and Engineering
Vol. 3 (2), 2013, 63-75
www.oricpub.com/journal-of-sci-and-eng
All rights reserved. No part of contents of this paper may be reproduced or transmitted in any form or by any
means without the written permission of ORIC Publications,www.oricpub.com.
Identifying Cancer Patients using DNA Micro-Array Data in Data Mining Environment
Zakaria Suliman Zubi¹, Marim Aboajela Emsaed²
¹ Sirte University, Faculty of Science, Computer Science Department, Sirte, P.O. Box 727, Libya.
² Tripoli University, Faculty of Information Technology, Computer Science Department, Tripoli, P.O. Box 13210, Libya.
Abstract
The purpose of this work is to identify cancer patients using DNA micro-array data. DNA chains contain the informational code for the composition of the human body. Methods are commonly based on the idea of selecting a gene subset to distinguish all classes. To solve a multi-class problem more effectively, we propose a genetic programming (GP) based approach to the gene selection and classification tasks for biological datasets. The biological dataset is derived from multiple biological databases by a procedure called DNA-Aggregator. We design a biological aggregator that aggregates various datasets via a DNA micro-array community-developed ontology. The aggregator is composed of modules that retrieve data from various biological databases, and it also enables queries by other applications to recognize the genes. The genes are categorized into groups by a classification method that collects similar expression patterns. A clustering method such as k-means is required to discover groups of similar objects in the biological database and to characterize the underlying data distribution.
1. INTRODUCTION
Data mining techniques are used to make predictions, typically using only recent static data. Sequence mining is a special case of structured data mining concerned with finding statistically relevant patterns between data examples whose values are delivered in a sequence. These values are delivered and then stored in huge collections of data; examples of such collections include biological databases such as DNA sequence databases. Because these data are sequential in nature, they require a technique for discovering sequential patterns, such as sequence mining. The principle of sequence mining is to discover useful sequential knowledge, which takes the form of insight into the structure of the data. DNA chip (gene) data are unusual in having thousands of attributes, each representing a gene expression value [8].
Cancers are caused by gene mutations and other types of chromosomal or molecular abnormalities. The frequent sporadic cancers, i.e. cancers in individuals with a negative family history for cancer, carry somatic gene mutations acquired at mitosis. Genes involved in cancers are mainly those involved in normal homeostasis of cellular proliferation, differentiation and death.
Received: 29 June 2013
Accepted: 20 Dec 2013
Keywords: Data Mining, Sequence Mining, Biological Database, Genetic Algorithm, Clustering, Classification, K-means
Correspondence: Z. S. Zubi, Sirte University, Faculty of Science, Computer Science Department, Sirte, P.O. Box 727, Libya.
Cancer growth usually requires several different gene mutations to accumulate in a cell of origin and in its subclones during the clonal evolution of malignant growth. Gene mutations in cancers invariably lead to alterations of gene expression patterns with respect to normal cellular counterparts, including the mutated genes themselves and their downstream targets [5].
New techniques such as genetic programming (GP) may help to analyse such data. GP-based methods have been demonstrated on cancer data, both for feature selection and for generating simple models based on a few genes. GP has been widely applied to classification problems because it can discover underlying data relationships. It is a promising way to discover potentially important genes by generating comprehensible classification rules.
1.1 Early Diagnosis of Cancer
A sound body depends on the continuous interplay of thousands of proteins, acting together in just the right amounts and in just the right places, and each properly functioning protein is the product of an intact gene.
Many, if not most, diseases have their roots in our genes. More than 4,000 diseases stem from altered genes inherited from one's mother and/or father. Common disorders such as heart disease and most cancers arise from a complex interplay among multiple genes and between genes and factors in the environment [4].
Cancer is a class of diseases distinguished by out-of-control cell growth. There are over 100 different types of cancer, and each is classified by the type of cell that is initially affected [10].
The Beginning of Cancer
All cancers begin in cells, the body's basic unit of life. To understand cancer, it is helpful to know what happens when normal cells become cancer cells.
The body is made up of many types of cells. These cells grow and divide in a controlled way to produce more cells as they are needed to keep the body healthy. When cells become old or damaged, they die and are replaced with new cells.
However, occasionally this orderly process goes wrong. The genetic material (DNA) of a cell can become damaged or changed, producing mutations that affect normal cell growth and division. When this occurs, cells do not die when they should and new cells form when the body does not need them. The extra cells may form a mass of tissue called a tumor, as shown in Figure 1 [11].
Figure 1 The cancer transformation
Cancer Classifications
Five broad groups are used to classify cancer:
Carcinomas are characterized by cells that cover internal and external parts of the body, as in lung, breast, and colon cancer.
Sarcomas are characterized by cells that are located in bone, cartilage, fat, connective tissue, muscle, and other supportive tissues.
Lymphomas are cancers that begin in the lymph nodes and immune system tissues.
Leukemias are cancers that begin in the bone marrow and often accumulate in the bloodstream.
Adenomas are cancers that arise in the thyroid, the pituitary gland, the adrenal gland, and other glandular tissues [10].
The objectives of early detection are as follows:
a) To detect and remove / arrest all premalignant lesions;
b) To give patients the best treatment available;
c) To reduce the morbidity and mortality of this disease;
d) To help spread awareness among patients.
1.2 Sequence Mining Techniques
Sequences are an important type of data that occur frequently in many fields, such as medicine, business, finance, customer behavior, education, and security. In these applications, the analysis of the data needs to be carried out in different ways to satisfy different application requirements, and it needs to be implemented efficiently.
DNA sequences encode the genetic makeup of humans and all other species, and protein sequences describe the amino acid composition of proteins and encode their structure and function. Moreover, sequences can be used to capture how individual humans behave through various temporal activity histories, such as weblog histories and customer purchase histories. In general, there are various methods for extracting information and patterns from databases, such as time series analysis, association rule mining, and data mining [11].
2 BASIC DNA PRINCIPLES
The basic element of life is the cell, which is a tiny factory producing the raw materials, energy, and waste removal capabilities necessary to sustain life. Thousands of different proteins, called enzymes, are necessary to keep these cellular factories functioning. An average human being is composed of approximately 100 trillion cells, all of which originated from a single cell. Each cell contains the same genetic structure. Within the nucleus of our cells is a chemical substance known as DNA that contains the informational code for replicating the cell and constructing the needed enzymes. Because the DNA resides in the nucleus of the cell, it is often referred to as nuclear DNA [3].
DNA has two primary purposes: (1) to make copies of itself so cells can divide and carry on the same information; and (2) to carry instructions on how to make proteins so cells can build the machinery of life. Information encoded within the DNA structure itself is passed on from generation to generation, with one-half of a person's DNA information coming from their mother and one-half coming from their father.
2.1 DNA Structure and definition
Nucleic acids, including DNA, are composed of nucleotide units that are made up of three parts: a nucleobase, a sugar, and a phosphate, as shown in Figure 2. The nucleobase or 'base' imparts the variation in each nucleotide unit, while the phosphate and sugar portions form the backbone structure of the DNA molecule. The DNA alphabet is composed of only four characters representing the four nucleobases: A (adenine), T (thymine), C (cytosine), and G (guanine).
Figure 2. Basic components of nucleic acids: (a) phosphate sugar backbone with bases coming off the sugar molecules, (b) chemical structure of phosphates and sugar molecules illustrating numbering scheme on the sugar carbon atoms. DNA sequences are conventionally written from 5′ to 3′.
2.2 Base pairing and hybridization of DNA Strands
In its natural state in the cell, DNA is composed of two strands that are held together through a process known as hybridization. Individual nucleotides pair up with their complementary base through hydrogen bonds that form between the bases. The base pairing rules are such that adenine can only hybridize to thymine and cytosine can only hybridize to guanine; Figure 3 illustrates these pairing rules.
Figure 3. Base pairing of DNA strands to form double-helix structure.
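The pairing rules of Section 2.2 can be illustrated with a short sketch. It is not part of the proposed system; the function names `complement` and `reverse_complement` are hypothetical, and the code assumes an input strand that contains only the four letters A, T, C and G.

```python
# Pairing rules from the text: A pairs only with T, C pairs only with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in the conventional 5' to 3' direction.

    The two strands of a double helix run in opposite directions, so the
    complement is reversed before it is returned.
    """
    return complement(strand)[::-1]
```

For example, `reverse_complement("ATGC")` returns `"GCAT"`; applying the function twice gives back the original strand.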
2.3 Chromosomes, genes, and DNA markers
There are approximately three billion base pairs in a single copy of the human genome. Obtaining a full catalog of our genes was the focus of the Human Genome Project. The information from the Human Genome Project will benefit medical science as well as forensic human identity testing and help us better understand our genetic makeup.
Within human cells, DNA found in the nucleus of the cell (nuclear DNA) is divided into chromosomes, which are dense packets of DNA and protective proteins called histones. The human genome consists of 22 matched pairs of autosomal chromosomes and two sex-determining chromosomes, as shown in Figure 4. Thus, normal human cells contain 46 chromosomes, or 23 pairs. Males are designated XY because they contain a single copy of the X chromosome and a single copy of the Y chromosome, while females are designated XX because they contain two copies of the X chromosome.
Figure 4. Human genome contained in every cell consists of 23 pairs of chromosomes and a small circular genome known
as mitochondrial DNA.
Designating physical chromosome locations
The basic regions of a chromosome are illustrated in Figure 5. The central region of a chromosome, known as the centromere, controls the movement of the chromosome during cell division. On either side of the centromere are arms that terminate with telomeres. The shorter arm is referred to as p, while the longer arm is designated q.
Figure 5. Basic chromosome structure and nomenclature
3 TUMOUR SUPPRESSOR GENE P53
The p53 tumour suppressor gene is the most frequently altered gene in human cancer, including brain tumours.
The p53 protein is a transcription factor involved in maintaining genomic integrity by controlling cell cycle progression and cell survival. About 50% of primary human tumours carry mutations in the p53 gene. The function of p53 is critical to the efficiency of many cancer treatment procedures, because radiotherapy and chemotherapy act in part by triggering programmed cell death in response to DNA damage [6]. The p53 gene is about 20 kb long and is located on the short arm of chromosome 17, at locus 17p13.1.
3.1 Primers for PCR and DNA sequencing
The primers used were oligonucleotides complementary to the sequences flanking the exon/intron junctions of exons 5 to 9. The sequences of the primers are as follows:
exon 5, 5′-CTGACTTTCAACTCTG-3′ (forward) and 5′-AGCCCTGTCGTCTCT-3′ (reverse);
exon 6, 5′-CTCTGATTCCTCACTG-3′ (forward) and 5′-ACCCCAGTTGCAAACC-3′ (reverse);
exon 7, 5′-TGCTTGCCACAGGTCT-3′ (forward) and 5′-ACAGCAGGCCAGTGT-3′ (reverse);
exon 8, 5′-AGGACCTGATTTCCTTAC-3′ (forward) and 5′-TCTGAGGCATAACTGC-3′ (reverse);
exon 9, 5′-TATGCCTCAGATTCACT-3′ (forward) and 5′-ACTTGATAAGAGGTCC-3′ (reverse).
4 DNA MICRO-ARRAYS DATA CONCEPTS
DNA micro-arrays are produced by placing small drops of liquid containing genes on a glass microscope slide and allowing the spots to dry. Each spot contains numerous copies of a single gene; the characteristics of each spot are shown in Figure 6.
Figure 6 Cartoon of a DNA micro-array
mRNA is isolated from each of two cell populations, and each population of mRNA is converted into coloured cDNA, usually red for one and green for the other. Once the two populations of cDNA have been produced, they are mixed and incubated with the DNA micro-array, and unbound cDNA is washed off; Figure 7 shows this process.
The DNA micro-array is then scanned to detect the two colours of cDNA, and the green and red images are stored. Software merges the two images, and spots bound by both colours of cDNA appear yellow.
Figure 7 Method for producing labeled cDNA
We show some real data in Figure 8, using an application program to analyze the data [1].
Figure 8 Real micro-array data for three genes.
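The colour merging described in Section 4 can be sketched as a simple rule on the two scanned channel intensities. This is only an illustration under stated assumptions: the function name `spot_colour` and the cut-off `ratio_threshold` are hypothetical, and real analysis software also applies steps such as background correction that are not shown here.

```python
def spot_colour(red: float, green: float, ratio_threshold: float = 2.0) -> str:
    """Classify one micro-array spot from its red and green intensities.

    ratio_threshold is an assumed cut-off: a spot is called red or green only
    if one channel is at least that many times brighter than the other.
    Spots bound comparably by both colours of cDNA are reported as yellow.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    if red >= ratio_threshold * green:
        return "red"
    if green >= ratio_threshold * red:
        return "green"
    return "yellow"
```

With the default threshold, `spot_colour(800, 100)` gives `"red"`, `spot_colour(100, 800)` gives `"green"`, and `spot_colour(500, 450)` gives `"yellow"`.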
The major applications of DNA micro-arrays (chips) fall into three categories:
1- Gene expression profiling: RNA is extracted from tumour samples and hybridised to the micro-array to assess, concurrently and in a single experiment, the expression of thousands of genes within the sample.
2- Genotyping: Genomic DNA from an individual is tested for hundreds or thousands of genetic markers [notably single nucleotide polymorphisms (SNPs, or 'snips') or micro-satellite markers] in a single hybridisation. This yields a genetic fingerprint, which in turn may be linked to the risk of developing single gene disorders or particular common complex diseases.
3- DNA sequencing: Sequence variations of specific genes can be monitored in a test DNA sample, thereby greatly increasing the scope for precise molecular diagnosis in single gene disorders or complex genetic diseases [5].
DNA Sequencing Process:
1- Mapping: Identify a set of clones that span the region of the genome to sequence.
2- Library Creation: Make sets of smaller clones from mapped clones.
3- Template Preparation: Purify DNA from smaller clones; set up and perform sequencing chemistries.
4- Gel Electrophoresis: Determine sequences from smaller clones.
5- Pre-finishing and Finishing: Specialty techniques to produce high-quality sequences.
6- Data Editing/Annotation: Quality assurance; verification; biological annotation; submission to public database [12].
Applications of DNA micro-arrays or chips in oncology:
Global understanding of abnormal gene expression contributing to malignancy, i.e. snapshots of genes either up- or down-regulated in tumours.
Molecular classification of neoplasms by gene expression signatures, forecasting the tissue of origin of a tumour in the context of multiple cancer classes.
Classification of novel molecular-based subclasses in tumours with clinical relevance.
Discovery of new prognostic or predictive indicators and biomarkers of therapeutic response.
Identification and validation of new molecular targets for drug development.
Prediction of drug side effects during preclinical development and toxicology studies.
Identification of genes conferring drug resistance.
Prediction or selection of patients most likely to benefit from, or suffer particular side effects of, drugs (pharmacogenomics) [5].
5 DNA BIOLOGICAL DATABASES
When starting any research project, it is necessary to gather information on the problem to be investigated. Biological data can be organized in several ways:
1. Flat text file databases;
2. Relational databases;
3. Object-oriented databases.
Biological databases can be broadly classified into sequence databases and structure databases. Sequence databases apply to both nucleic acid sequences and protein sequences, whereas structure databases apply only to proteins.
A biological sequence database stores three kinds of biological sequences: protein, DNA and RNA. In recent years biological data has doubled in size every 15 or 16 months. Because there is so much data in biology, biological databases have developed greatly and have become part of the biologist's everyday toolbox. The number of queries has also increased, to 40,000 per day. Good database search methods are therefore needed; otherwise, the biological databases cannot be used efficiently.
The Nature of the Data Collected from Patients
To construct a database, samples of DNA must be collected, the samples analyzed, and the resulting data stored in such a way that it can be accessed efficiently. In the systems now in use, blood, saliva, or other tissue or fluid is collected.
Databases and the ability to organize data are needed to keep research efficient and to get optimal output and information from data obtained in the lab.
5.1 Biological Dataset
A biological dataset is a set of data or measurements collected from biological sources and stored or exchanged in digital form. Biological datasets are regularly stored in files or databases. Examples of biological data are DNA base-pair sequences and population data used in ecology.
There are a number of DNA datasets from published cancer gene expression studies, including a leukemia dataset, a colon cancer dataset, a lymphoma dataset, a breast cancer dataset, and an ovarian cancer dataset. Three of them are used in the proposed work.
Leukemia cancer dataset
The leukemia dataset consists of 61 samples: 25 samples of Acute Myeloid Leukemia (AML) and 36 samples of Acute Lymphoblastic Leukemia (ALL). The gene expression measurements were taken from 55 bone marrow samples and 6 peripheral blood samples. Of the 61 samples, 34 are leukemia cancer samples and the remainder are normal samples.
Colon cancer dataset
The colon dataset consists of 68 samples of colon epithelial cells taken from colon-cancer patients. Of the 68 samples, 46 are colon cancer samples and the remainder are normal samples.
Lymphoma cancer dataset
The lymphoma dataset consists of 35 samples of lymphoma cells taken from lymphoma-cancer patients. Of the 35 samples, 27 are lymphoma cancer samples and the remainder are normal samples.
6 METHODS AND MODELS
6.1 Genetic algorithm
Genetic Algorithms (GAs) are adaptive heuristic search algorithms based on the evolutionary ideas of natural selection and genetics. GAs are designed to simulate the processes in natural systems that are necessary for evolution, specifically those that follow the principle of survival of the fittest first laid down by Charles Darwin [7].
Three operators are used by genetic algorithms:
1. Selection: The selection operator refers to the method used for selecting which chromosomes will reproduce. The fitness function evaluates each of the chromosomes (candidate solutions), and the fitter the chromosome, the more likely it is to be selected to reproduce.
2. Crossover: The crossover operator performs recombination, creating two new offspring by randomly selecting a locus and exchanging the subsequences to the left and right of that locus between two chromosomes chosen during selection. For example, in binary representation, two strings 11111111 and 00000000 could be crossed over at the sixth locus in each to generate the two new offspring 11111000 and 00000111.
3. Mutation: The mutation operator randomly changes the bits or digits at a particular locus in a chromosome, usually with very small probability. For example, after crossover, the 11111000 child string could be mutated at locus two to become 10111000. Mutation introduces new information to the genetic pool and protects against converging too quickly to a local optimum.
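The crossover and mutation examples above can be reproduced with a few lines of code. This is a minimal sketch rather than the system's implementation: the function names are hypothetical, loci are counted from 1 as in the text, and the locus is passed in explicitly instead of being drawn at random so that the result can be checked by hand.

```python
from typing import Tuple


def crossover(parent_a: str, parent_b: str, locus: int) -> Tuple[str, str]:
    """Single-point crossover of two equal-length bit strings.

    Bits before the 1-based `locus` are kept from one parent; bits from
    `locus` onward are taken from the other parent.
    """
    cut = locus - 1
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2


def mutate(chromosome: str, locus: int) -> str:
    """Flip the bit at the 1-based `locus` of a bit string."""
    i = locus - 1
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]
```

Here `crossover("11111111", "00000000", 6)` yields `("11111000", "00000111")` and `mutate("11111000", 2)` yields `"10111000"`, matching the two examples in the text.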
Most genetic algorithms function by iteratively updating a collection of possible solutions called a population. Each member of the population is evaluated for fitness on each cycle. A new population then replaces the old population using the operators above, with the fittest members being chosen for reproduction or cloning.
The fitness function f(x) is a real-valued function operating on the chromosome (potential solution), not the gene, so that the x in f(x) refers to the numeric value taken by the chromosome at the time of fitness evaluation [2].
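The population update cycle can be sketched as follows. The fitness function here is a toy assumption (the number of 1-bits in the chromosome) chosen only to keep the example self-contained; it is not the fitness used for gene selection in this work, and `evolve` and its parameters are hypothetical names.

```python
import random


def fitness(chromosome):
    """Toy fitness (an assumption): the number of 1-bits in the chromosome."""
    return sum(chromosome)


def evolve(pop_size=20, length=16, generations=40, mutation_rate=0.01, seed=0):
    """Run a small generational GA and return the fittest chromosome found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        # Cloning: the fittest member is copied unchanged into the new population.
        next_population = [max(population, key=fitness)[:]]
        while len(next_population) < pop_size:
            # Selection: fitter chromosomes are more likely to be chosen.
            # Adding 1 keeps every weight positive.
            a, b = rng.choices(population, weights=[s + 1 for s in scores], k=2)
            # Crossover at a random locus, keeping one offspring.
            cut = rng.randint(1, length - 1)
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness)
```

Because the fittest member is cloned in every cycle, the best fitness in the population never decreases from one generation to the next in this sketch.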
6.2 Clustering
Clustering refers to the grouping of records, observations, or cases into classes of similar objects. A cluster is a collection of records that are similar to one another and dissimilar to records in other clusters. Clustering differs from classification in that there is no target variable for clustering. The clustering task does not try to classify, estimate, or predict the value of a target variable. Instead, clustering algorithms seek to segment the entire data set into relatively homogeneous subgroups or clusters, where the similarity of the records within a cluster is maximized and the similarity to records outside the cluster is minimized.
k-means clustering
In statistics and machine learning, k-means clustering is a method of cluster analysis that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
Algorithm: The k-means algorithm is a simple and effective algorithm for finding clusters in data. It proceeds as follows.
Step 1: Choose the number of clusters, k.
Step 2: Randomly assign k records to be the initial cluster center locations.
Step 3: For each record, find the nearest cluster center. In a sense, each cluster center "owns" a subset of the records, which represents a partition of the data set. We thus have k clusters, C1, C2, . . . , Ck.
Step 4: For each of the k clusters, find the cluster centroid, and update the location of each cluster center to the new value of the centroid.
Step 5: Repeat steps 3 and 4 until convergence or termination.
The "nearest" criterion in step 3 is usually Euclidean distance. The cluster centroid in step 4 is found as follows. Suppose that there are n data points (a1, b1, c1), (a2, b2, c2), . . . , (an, bn, cn). The centroid of these points is their center of gravity, located at
\[ \left( \frac{\sum a_i}{n},\ \frac{\sum b_i}{n},\ \frac{\sum c_i}{n} \right) \qquad (1) \]
For example, the points (1,1,1), (1,2,1), (1,3,1), and (2,1,1) would have centroid
\[ \left( \frac{1+1+1+2}{4},\ \frac{1+2+3+1}{4},\ \frac{1+1+1+1}{4} \right) = (1.25,\ 1.75,\ 1.00). \]
The algorithm terminates when the centroids no longer change. In other words, the algorithm terminates when, for all clusters C1, C2, . . . , Ck, all the records "owned" by each cluster center remain in that cluster. Alternatively, the algorithm may terminate when some convergence criterion is met, such as no significant shrinkage in the sum of squared errors given by Equation (2):
\[ SSE = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2 \qquad (2) \]
where p ∈ C_i ranges over the data points in cluster i, m_i is the centroid of cluster i, and d is the distance used in step 3.
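Steps 1 to 5 and the sum of squared errors can be sketched in plain Python. This is an illustrative sketch, not the MATLAB code used in this work: the names `kmeans`, `centroid` and `sse` are hypothetical, Euclidean distance is assumed, and a cluster that becomes empty simply keeps its previous center.

```python
import math
import random


def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Coordinate-wise mean (center of gravity) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(records, k, max_iter=100, seed=0):
    """Return (centers, clusters) after running k-means on `records`."""
    rng = random.Random(seed)
    centers = rng.sample(records, k)  # step 2: k records as initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 3: each record is "owned" by its nearest center.
        clusters = [[] for _ in range(k)]
        for r in records:
            nearest = min(range(k), key=lambda j: distance(r, centers[j]))
            clusters[nearest].append(r)
        # Step 4: move each center to the centroid of its cluster.
        new_centers = [centroid(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:  # step 5: stop when centroids no longer change
            break
        centers = new_centers
    return centers, clusters


def sse(centers, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(distance(p, m) ** 2 for m, c in zip(centers, clusters) for p in c)
```

As a check against the worked example, `centroid([(1, 1, 1), (1, 2, 1), (1, 3, 1), (2, 1, 1)])` returns `(1.25, 1.75, 1.0)`.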
The proposed system
The chart in Figure 9 shows the phases of the system and its operations in order.
[Flow chart; box labels in reading order: DNA sequence input; MATLAB; DNA-Aggregator (data set); Genetic Algorithm; Cluster; Result; Performance Analysis]
Figure 9 The proposed system
7 IMPLEMENTATION
The system applies several methods, including the genetic programming (GP) method and its initialization. The data clustering algorithms used in the system are implemented in MATLAB version 7.9.0.529 (R2009b).
The results are written to an Excel file. Figure 10 shows the results of running the matching program as they appear in the Excel file.
Figure 10 Result in Excel file
8 RESULTS
The results reported in this work were produced by the proposed processes, which aim at early and accurate diagnosis of cancer patients.
- Leukemia
Figure 11 Result of Leukemia process
- Colon
Figure 12 result of colon process
- Lymph
Figure 13 Result of Lymph process
9 CONCLUSION
In this paper, we proposed a genetic algorithm (GA) based approach to the gene selection and classification tasks for multi-class micro-array datasets. The multi-class problem was divided into multiple two-class problems, and a set of sub-ensemble systems was deployed to deal with the respective two-class problems. The procedure responsible for extracting datasets is called DNA-Aggregator. We designed a biological aggregator, which aggregates various datasets via a DNA micro-array community-developed ontology based upon the concept of the Semantic Web for integrating and exchanging biological data. Trees were constructed with different genes, and important genes were selected as references for clinical diagnosis or cancer development. For each dataset, the biological significance of the selected genes was validated against a biological database. The GA-based method, working with k-means clustering, presents a useful alternative for the analysis of complex multi-class micro-array datasets [9].
In our work we have applied GA to the sequencing of DNA molecules. The results produced by the algorithm were very good and in many cases were optimal or close to optimal. Several challenges were faced and solutions found, so that the designed system can be used for classifying, clustering and detecting cancer in DNA chip data. The system involves two major modules: the first performs the clustering, and the second detects cancer from the DNA chips.
REFERENCES
[1] Malcolm Campbell and Laurie J. Heyer. DNA Microarrays: Background, Interactive Databases, and Hands-on Data Analysis. Page 5.
[2] Daniel T. Larose. Data Mining Methods and Models. Copyright 2006 by John Wiley & Sons, Inc. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. Page 241.
[3] John M. Butler. Forensic DNA Typing. Copyright 2005, Elsevier (USA).
[4] Lydia Schindler, Donna Kerrigan, M.S., Jeanne Kelly, Brian Hollen. Understanding Cancer and Related Topics: Understanding Gene Testing.
[5] M. F. Fey. The impact of chip technology on cancer medicine. DOI: 10.1093/annonc/mdf647.
[6] Pornima Phatak, S. Kalai Selvi, T. Divya, A. S. Hegde, Sridevi Hegde and Kumaravel Somasundaram. Alterations in tumour suppressor gene p53 in human gliomas from Indian patients. December 2002, Indian Academy of Sciences.
[7] Tan Jun-shan, He Wei, Qing Yan. Application of Genetic Algorithm in Data Mining. 2009 First International Workshop on Education Technology and Computer Science. 978-0-7695-3557-9/09, 2009 IEEE. DOI 10.1109/ETCS.2009.340. Page 353.
[8] W. B. Langdon and B. F. Buxton. Genetic Programming for Mining DNA Chip data from Cancer Patients. Computer Science, University College, Gower Street, London, WC1E 6BT, UK. http://www.cs.ucl.ac.uk/sta_/W.Langdon, /sta_/B.Buxton. Page 1.
[9] Zakaria Suliman Zubi, Marim Aboajela Emsaed, 2010. "Sequence mining in DNA chips data for diagnosing cancer patients". In Proceedings of the 10th WSEAS international conference on Applied computer science (ACS'10), Hamido Fujita and Jun Sasaki (Eds.). World Scientific and Engineering Academy and Society (WSEAS), Stevens Point, Wisconsin, USA, 139-151.
[10] http://www.medicalnewstoday.com/info/cancer-oncology/whatiscancer.php. Page header: What is Cancer? Accessed 06-05-2010, 01:37 pm.
[11] http://www.cancer.gov/cancertopics/what-is-cance Cancer? Accessed 04-05-2010, 11:37 pm.
[12] http://www.ornl.gov/sci/techresources/Human_Genome/graphics/DNASeq. Process.pdf. Page header: DNA Sequencing Process. Accessed 16-2-2010, 11:03 pm.
Please cite this article as: Z. S. Zubi, M. A. Emsaed, (2013), Identifying Cancer Patients using DNA Micro-Array Data in Data Mining Environment, Journal of Science and Engineering, Vol. 3(2), 63-75.