
Biosciences, 2009, 1, 1-47

Copyright © 2009 SciRes. Biosciences

TABLE OF CONTENTS

Volume 1 Number 1 November 2009

MicroPath-A pathway-based pipeline for the comparison of multiple gene expression profiles to identify common biological signatures

M. Khan, C. B. Gorle, P. Wang, X. H. Liu, S. L. Li ... 1

Assessment of bone condition by acoustic emission technique: A review

S. Shrivastava, R. Prakash ... 12

Descriptively probabilistic relationship between mutated primary structure of von Hippel-Lindau protein and its clinical outcome

S. M. Yan, G. Wu ... 23

New blind estimation method of evoked potentials based on minimum dispersion criterion and fractional lower order statistics

D. F. Zha ... 33

Visualization of protein structure relationships using constrained twin kernel embedding

Y. Guo, J. B. Gao, P. W. H. Kwan, K. X. S. Hou ... 40


Biosciences

Journal Information

SUBSCRIPTIONS

The Biosciences (Online at Scientific Research Publishing, www.SciRP.org) is published quarterly by Scientific Research Publishing, Inc., USA.

E-mail: [email protected]

Subscription rates: Volume 2 2009 Print: $50 per copy.

Electronic: free, available on www.SciRP.org.

To subscribe, please contact Journals Subscriptions Department, E-mail: [email protected]

Sample copies: If you are interested in subscribing, you may obtain a free sample copy by contacting Scientific Research Publishing, Inc. at the above address.

SERVICES

Advertisements

Advertisement Sales Department, E-mail: [email protected]

Reprints (minimum quantity 100 copies)

Reprints Co-ordinator, Scientific Research Publishing, Inc., USA.

E-mail: [email protected]

COPYRIGHT

Copyright© 2009 Scientific Research Publishing, Inc.

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as described below, without the permission in writing of the Publisher.

Copying of articles is not permitted except for personal and internal use, to the extent permitted by national copyright law, or under the terms of a license issued by the national Reproduction Rights Organization.

Requests for permission for other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works or for resale, and other enquiries should be addressed to the Publisher.

Statements and opinions expressed in the articles and communications are those of the individual contributors and not the statements and opinions of Scientific Research Publishing, Inc. We assume no responsibility or liability for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained herein. We expressly disclaim any implied warranties of merchantability or fitness for a particular purpose. If expert assistance is required, the services of a competent professional person should be sought.

PRODUCTION INFORMATION

For manuscripts that have been accepted for publication, please contact:

E-mail: [email protected]


MicroPath-A pathway-based pipeline for the comparison of multiple gene expression profiles to identify common biological signatures

Mohsin Khan, Chandrasekhar Babu Gorle, Ping Wang, Xiao-Hui Liu, Su-Ling Li

ABSTRACT

High throughput gene expression analysis is swiftly becoming the focal point for deciphering the molecular mechanisms underlying a wide range of biological questions. Testament to this is the vast volume of expression profiles being generated by scientists worldwide and stored in publicly available data repositories such as ArrayExpress and the Gene Expression Omnibus (GEO). This wealth of biological data has motivated biologists to compare expression profiles generated from biologically related microarray experiments in order to unravel the biological mechanisms underlying various disease states. However, without appropriate software and tools, they are compelled to use manual, labour-intensive methods of comparison. A scrutiny of the current literature makes it apparent that there is a growing need for bioinformatics tools that cater for the analysis of multiple expression profiles.

In order to contribute towards this need, we have developed an efficient software pipeline for the analysis of multiple gene expression datasets, called MicroPath, which implements three principal functions: 1) it searches for common genes amongst n datasets using a number crunching method of comparison as well as the principle of permutations and combinations in the form of a search strategy, 2) it extracts gene expression patterns both graphically and statistically, and 3) it streams co-expressed genes to all molecular pathways belonging to KEGG in a live fashion. We subjected MicroPath to several expression datasets generated from our tolerance-related in-house microarray experiments as well as published data, and identified a set of 31 candidate genes that were found to be co-expressed across all datasets of interest. Pathway analysis revealed their putative roles in regulating immune tolerance. MicroPath is freely available to download from: www.1066technologies.co.uk/micropath.

Keywords: Co-Expression Analysis, Microarray, Permutations and Combinations, Multiple Gene Expression Analysis

1. INTRODUCTION

There is a general consensus amongst scientists and researchers that the fundamental asset of microarray technology lies in its ability to produce a global snapshot of the cellular state in the context of any given biological question. It is therefore not surprising that microarrays have revolutionised the field of molecular biology by offering an efficient and cost-effective medium for biologists to quantify the mRNA transcript levels of several thousand genes concurrently in order to observe specific states of the transcriptome (in response to a particular treatment or at a specific time point). Owing to this ability to decipher the transcriptome, gene expression profiles pertaining to a wide variety of biological questions are being rapidly generated by scientists worldwide and are deposited in, and subsequently made accessible through, public repositories such as ArrayExpress [1] and the Gene Expression Omnibus [2]. With such a wealth of high throughput biological data available, biologists have become motivated to utilise these datasets in an attempt to investigate common regulatory signatures that may be influencing the transcriptome state across multiple gene expression profiles sharing a similar biological theme. One of the most widely accepted methodologies for comparing expression profiles is based on the assumption that genes sharing similar expression patterns across different biological conditions are likely to be involved in the same biological processes [2] and may therefore share common regulatory signatures. By using this method of comparison, which is one of the most successful to date, coupled with the availability of public data repositories offering gene expression profiles, biologists have been granted the opportunity to answer complex biological questions pertinent to the biological phenomena underlying various disease states.

Biosciences, 2009, 1, 1-11

To this end, we have developed a novel bioinformatics software pipeline called MicroPath, which specialises in the cross comparison of multiple gene expression datasets and attempts to identify common regulatory signatures from the standpoint of molecular pathway analysis. When one scrutinises the current literature on automated solutions for gene expression analysis, it becomes apparent that there is an increasing demand for software applications that offer an efficient pipeline for the analysis of multiple gene expression profiles. Although meta-analysis studies have employed statistical techniques to compare cDNA and Affymetrix gene expression profiles [3,4,5,6], there is a mounting need for this process to be automated. Various statistical approaches and algorithms have already been implemented to identify the most relevant pathways in a given experiment [7,8,9], together with methods such as Gene Set Enrichment Analysis (GSEA), which ranks genes based on the correlations between their expression and observed phenotypes in the context of biological pathway discovery [10]. There are also tools available that functionally annotate gene expression data [11,12]. Nevertheless, it remains infeasible for biologists to cross compare several expression profiles without an automated solution, and hence they are faced with the labour-intensive task of employing manual methods to carry out their comparisons. MicroPath uses the meta-analytic standard and has been specifically developed to: compare several sets of significantly expressed genes in order to find the intersection of common genes using both number crunching methods and the classical permutation and combination principle; extract putative regulatory signatures using both statistical and graph-based approaches; and finally, map these sub-sets of co-expressed genes to molecular pathways, all in the form of a high throughput pipeline.

2. IMPLEMENTATION

The front-end of MicroPath was developed in Visual Basic .NET and Perl, and the database back-end was developed in MySQL. After the user’s input files (gene expression profiles) have been analysed, the processed data is displayed on the graphical user interface, which is equipped with various interactive objects such as charting facilities, buttons, drop-down menus and user input/output dialogues. The interface is also equipped with a function to export processed data into Microsoft Excel for further scrutiny and use.

2.1. System Architecture

MicroPath carries out meta-profiling of multiple gene expression datasets using two different approaches. In the first, the intersection of common genes is identified across n expression profiles using a simple number crunching exercise and is then plotted graphically. The second approach applies when the first fails because no gene is common to all datasets (this situation is especially likely when a large number of expression profiles are compared, since the probability of finding a gene common to all of them falls as the number of profiles grows). In that case, MicroPath applies the mathematical principle of permutations and combinations to solve the problem (refer to the implementation of the meta-analysis strategy below for details). Once the intersection of common genes has been identified and displayed on the interface (using either of the above methods), the next stage in the analysis is to extract patterns from the intersection in order to identify common genes that are expressed in accordance with the biological question. MicroPath offers a semi-automated graph-based approach to achieve this, as well as classical statistics to identify the overall correlation of gene expression. Finally, co-expressed genes (common genes that are expressed in accordance with the relevant biological question) are mapped to all molecular pathways known to date in order to reveal their molecular dependencies (refer to Figure 1 for the complete system architecture).
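The first approach can be pictured as a set intersection over accession identifiers. The following Python sketch is only an illustration under stated assumptions, not MicroPath's own implementation: the function name `common_genes`, the dictionary layout and the toy identifiers and values are all hypothetical.

```python
def common_genes(profiles):
    """Return the accession identifiers present in every profile.

    `profiles` is a list of dicts mapping accession ID -> expression value.
    (Hypothetical data layout chosen for this sketch.)
    """
    if not profiles:
        return set()
    id_sets = [set(profile) for profile in profiles]
    return set.intersection(*id_sets)


# Toy profiles; identifiers and values are invented for illustration only.
profile_a = {"ACC_1": 2.1, "ACC_2": -1.3, "ACC_3": 0.4}
profile_b = {"ACC_2": 1.8, "ACC_3": -0.2, "ACC_4": 3.0}
profile_c = {"ACC_3": 0.9, "ACC_2": -0.7}

shared = common_genes([profile_a, profile_b, profile_c])
print(sorted(shared))
```

If the returned set is empty, the direct comparison has failed and the combination-based strategy described in the next section would take over.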

2.2. Implementation of Meta-analysis Strategy

In theory, an intersection of a sub-set of common genes across multiple gene expression profiles should be easily attainable using simple number crunching methods of comparison. In practice, this is not always the case, since the likelihood of identifying genes sharing common accession identifiers decreases as the number of profiles to compare increases. This inverse relationship makes sense both mathematically and biologically. From a biological perspective, regulatory signatures tend to be diluted over entire datasets and, as a result, only a proportion of the total number of profiles to compare may actually share common genes. In such a scenario, a simple method of comparison would break down at some point and no common genes would be reported to the user, although common genes may be present within n-1 expression profiles. To prevent potentially interesting biological findings from being lost at this point in the analysis, we have applied the principle of mathematical combinations to the comparison of multiple gene expression profiles. All possible combinations of comparing n datasets with each other are first computed using the combination equation:

$$C_r = \frac{n!}{r!\,(n-r)!}$$

This generates the total number of combinations of datasets to compare (Cr) for given values of n (total number of datasets imported by the user) and r (number of datasets intended to be used to search for common genes when zero common genes are reported across n datasets) (Table 1).
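As a quick check of the combination counts, the values of Cr can be reproduced with Python's standard library. This is a small sketch rather than part of MicroPath; the helper name `combination_count` is hypothetical.

```python
from math import comb  # available in Python 3.8 and later


def combination_count(n, r):
    """Number of ways to choose r datasets out of n (the Cr of the text)."""
    return comb(n, r)


# Print n, r, n - r and Cr for n = 10, in the same column order as Table 1.
for r in range(9, 1, -1):
    print(10, r, 10 - r, combination_count(10, r))
```

For n = 10 the count peaks at r = 5, so the number of comparisons grows before it shrinks again as r is reduced.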

Figure 1. Functions of MicroPath. Users are prompted to import up to 10 gene expression profiles, which are then compared using a direct comparison method. If this method yields zero common genes, MicroPath automatically attempts to identify an intersection of common genes by reducing the search space to n-1 datasets using permutations and combinations. This process is continued until at least 1 common gene is reported. Following this, users are provided with a function to search for expression patterns graphically, and gene expression correlations are calculated statistically using Pearson’s correlation coefficient. Finally, co-expressed genes are mapped to all molecular pathways of KEGG in a high throughput fashion by automatically accessing its API via SOAP-Lite.


Table 1. Multiple gene expression profile search strategy generated from applying the principle of permutations and combinations. The first column represents the total number of expression datasets, n, that users may import (this is the search space). The second column represents r, the number of expression datasets to compare if zero common genes are reported to be matched across n datasets. The final column represents the total number of mathematical combinations possible for each given value of n and r.

| Total number of expression datasets (n) | Number of intended expression datasets to compare when comparing n datasets yields no results (r) | n - r | Total number of combinations of r (Cr) |
|---|---|---|---|
| 10 | 9 | 1 | 10 |
| 10 | 8 | 2 | 45 |
| 10 | 7 | 3 | 120 |
| 10 | 6 | 4 | 210 |
| 10 | 5 | 5 | 252 |
| 10 | 4 | 6 | 210 |
| 10 | 3 | 7 | 120 |
| 10 | 2 | 8 | 45 |
| 9 | 8 | 1 | 9 |
| 9 | 7 | 2 | 36 |
| 9 | 6 | 3 | 84 |
| 9 | 5 | 4 | 126 |
| 9 | 4 | 5 | 126 |
| 9 | 3 | 6 | 84 |
| 9 | 2 | 7 | 36 |
| 8 | 7 | 1 | 8 |
| 8 | 6 | 2 | 28 |
| 8 | 5 | 3 | 56 |
| 8 | 4 | 4 | 70 |
| 8 | 3 | 5 | 56 |
| 8 | 2 | 6 | 28 |
| 7 | 6 | 1 | 7 |
| 7 | 5 | 2 | 21 |
| 7 | 4 | 3 | 35 |
| 7 | 3 | 4 | 35 |
| 7 | 2 | 5 | 21 |
| 6 | 5 | 1 | 6 |
| 6 | 4 | 2 | 15 |
| 6 | 3 | 3 | 20 |
| 6 | 2 | 4 | 15 |
| 5 | 4 | 1 | 5 |
| 5 | 3 | 2 | 10 |
| 5 | 2 | 3 | 10 |
| 4 | 3 | 1 | 4 |
| 4 | 2 | 2 | 6 |
| 3 | 2 | 1 | 3 |

These combinations of datasets (Cr) are then used as a criterion to search for common genes across r gene expression profiles when comparing n datasets fails to yield any common genes. In this scenario, however, the n datasets are still used as the search space from which all possible combinations (Cr) of r datasets are compared with each other in order to increase the probability of finding a common gene. Once common genes have been identified using this method, MicroPath reports the results to the interface.
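The search strategy described above can be sketched as a loop that starts with all n datasets and reduces r by one each time no common gene is found. This is a minimal sketch under stated assumptions, not the authors' code: the function name `search_common_genes`, the choice to stop at the first value of r that yields any hit, and the toy dataset names are all hypothetical.

```python
from itertools import combinations


def search_common_genes(profiles):
    """Find the largest r for which some combination of r datasets shares genes.

    `profiles` maps a dataset name to a set of accession identifiers.
    Returns (r, hits), where `hits` maps each combination of r dataset names
    to the non-empty set of genes common to that combination.
    Returns (0, {}) if no two datasets share a gene.
    """
    names = sorted(profiles)
    for r in range(len(names), 1, -1):  # r = n, n-1, ..., 2
        hits = {}
        for combo in combinations(names, r):
            shared = set.intersection(*(profiles[name] for name in combo))
            if shared:
                hits[combo] = shared
        if hits:
            return r, hits
    return 0, {}


# Toy example with invented gene labels.
datasets = {
    "d1": {"g1", "g2"},
    "d2": {"g2", "g3"},
    "d3": {"g4"},
}
r, hits = search_common_genes(datasets)
print(r, hits)
```

In this toy case no gene is common to all three datasets, so the search falls back to pairs and reports the single pair that shares a gene.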

2.3. Extracting Gene Expression Patterns Graphically and Statistically

Following the identification of common genes across n datasets using either of the methods described earlier, the next stage in the analysis is to generate a graphical representation of this expression data from which biologically meaningful patterns can be extracted. Because signals pertaining to transcriptome states tend to be diluted over entire profiles, a specific criterion is required to narrow down the common genes of interest to only those genes that are consistently regulated according to the biological question. The assumption we have made is that any given common gene across n datasets can exhibit one of three specific behaviours: it can be consistently upregulated across all datasets, consistently downregulated across all datasets, or upregulated in some datasets and downregulated in others. Based on the nature of the specific biological question, users can select the appropriate pattern from the options, which results in a graphical display of those genes that satisfy the search criteria. Together with this ability to graphically extract patterns for individual gene expression data points, MicroPath also implements the Pearson’s correlation coefficient statistical test in order to extract a global gene expression pattern existing between the common genes of two individual expression profiles. The correlations are calculated in a pair-wise manner until each expression dataset has been statistically compared to all other datasets within n, according to the Pearson’s correlation coefficient equation:

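$$r = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^{2} - \dfrac{\left(\sum X\right)^{2}}{n}\right)\left(\sum Y^{2} - \dfrac{\left(\sum Y\right)^{2}}{n}\right)}}$$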


Each pair-wise score is then averaged in order to provide a global measure of the correlation existing between n expression profiles. Scores are reported from -1 (perfect negative correlation) to +1 (perfect positive correlation).
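The pair-wise calculation and averaging can be sketched in a few lines of Python. This is a minimal illustration rather than MicroPath's own code; the function names `pearson_r` and `mean_pairwise_r` are hypothetical, and n inside `pearson_r` is taken to be the number of paired expression values, as in the standard computational form of the coefficient. The sketch assumes that neither input is constant, since the denominator would otherwise be zero.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient in its computational (sum) form."""
    n = len(x)  # number of paired values, not the number of datasets
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


def mean_pairwise_r(profiles):
    """Average Pearson score over every pair of equally long profiles."""
    scores = [pearson_r(a, b) for a, b in combinations(profiles, 2)]
    return sum(scores) / len(scores)


# Toy vectors of expression values for the same three genes (invented numbers).
p1, p2, p3 = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]
print(pearson_r(p1, p2), mean_pairwise_r([p1, p2, p3]))
```

Each profile must list the common genes in the same order, so that position i in one vector and position i in another refer to the same gene.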

2.4. High Throughput Molecular Pathway Analysis

To decipher the molecular mechanisms fundamental to the researcher’s biological question, it is necessary to map the co-expressed genes to molecular pathways. This is because biological pathways reveal the molecular dependencies that exist between genes by illustrating how they collaborate with one another when they participate in specific biological functions. Furthermore, pathways reveal the various signalling cascades that play important roles in dictating these gene associations. In light of this, we have implemented MicroPath to access the Application Programming Interface (API) of the molecular pathway database belonging to KEGG [13] using SOAP-Lite in order to dynamically interact with the static pathway maps. Perl scripts were written for MicroPath to specifically 1) search for the user’s co-expressed genes in all biological pathways, 2) highlight genes on the pathways, and 3) return the results of the search to MicroPath’s interface (i.e. URLs of colour coded pathway maps) (Figure 2). Once MicroPath has searched for all of the user’s co-expressed genes in all of the molecular pathways, the URL of each pathway is displayed on the sub-interface. In order to avoid redundancy, the URL for each pathway highlights all co-expressed genes that participate in that pathway. To help users identify biologically meaningful pathways relevant to their specific biological question, MicroPath calculates the number of genes identified in a given pathway and expresses this 1) as a percentage of the total number of common genes from the intersection and 2) as a percentage of the total number of genes belonging to that pathway.

Clicking on these links will generate the specific KEGG pathway in HTML, on which the user’s co-expressed genes will be highlighted.
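The two percentages reported for each pathway can be illustrated as follows. This sketch covers only the arithmetic and makes no call to KEGG; the function name `pathway_coverage`, its parameters and the toy gene labels are hypothetical.

```python
def pathway_coverage(coexpressed, pathway_genes, intersection_size):
    """Return the genes found in a pathway and the two percentages.

    coexpressed       -- set of the user's co-expressed gene identifiers
    pathway_genes     -- set of all gene identifiers belonging to the pathway
    intersection_size -- total number of common genes in the intersection
    """
    hits = coexpressed & pathway_genes
    pct_of_intersection = 100.0 * len(hits) / intersection_size
    pct_of_pathway = 100.0 * len(hits) / len(pathway_genes)
    return hits, pct_of_intersection, pct_of_pathway


# Toy example with invented gene labels.
coexpressed = {"g1", "g2", "g3", "g4"}
pathway = {"g2", "g4", "g7", "g8", "g9"}
hits, pct_common, pct_pathway = pathway_coverage(coexpressed, pathway, 4)
print(sorted(hits), pct_common, pct_pathway)
```

Reporting both figures lets a user distinguish a pathway that contains many of the common genes from a small pathway in which a few hits already cover a large share of its members.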

Figure 2. Flow diagram of how MicroPath carries out high throughput molecular pathway analysis by connecting to the API of KEGG.


2.5. Generating and Processing Gene Expression Datasets

Gene expression datasets used for the purpose of this article were generated from our in-house microarray experiments as well as published datasets, where the fold change approach was used to select a set of differentially expressed genes from pre-processed data. The MatchMiner [14] and Synergizer [15] tools were used to convert HUGO gene identifiers and long names into GenBank accession IDs in order to ensure that the gene identifiers were of the same type across all datasets prior to comparison. Raw expression data was generated, filtered and normalised using GenePix Pro 4.1 [16] and Acuity 4.0 [17] software. Although we used cDNA microarray data for the purpose of demonstrating MicroPath’s capabilities, data generated from other platforms such as Affymetrix can also be analysed, provided that GenBank accession identifiers are used to represent the genes.

3. RESULTS AND DISCUSSION

Regardless of the biological question, a typical microarray experiment almost always results in a set of differentially expressed genes, which represents the genes of most importance to the biologist. Carrying out several biologically related microarray experiments therefore generates several sets of differentially expressed genes, which need to be compared and mined efficiently in order to help answer the biological questions asked by investigators from research laboratories around the world. Employing manual methods of comparison in this situation would be very inefficient and infeasible. In light of this, to demonstrate the benefits that can be derived from analysing multiple gene expression profiles using MicroPath, we employed datasets generated from our in-house microarray experiments as well as published data. The biological question related to these studies focussed on unravelling the underlying molecular mechanisms dictating immune tolerance by analysing the role of Egr-2 in T-cell tolerance. Although the Early Growth Response gene (Egr-2) has recently been characterised as a candidate tolerance-inducing transcription factor, which interacts with specific genes in order to induce the state of T-cell tolerance [18,19], the possibility exists of further unknown putative target genes that may be vital to the mechanism of tolerance. Hence, the biological purpose of our experiments was to attempt to identify such potentially important genes via the comparison of biologically related expression datasets using MicroPath.

Data consisting of a set of differentially expressed genes generated from the comparison of tolerant Vs activated mouse CD4+ T cells was obtained from the ArrayExpress website (accession number: e-mexp-283). The first in-house experiment aimed to generate differentially expressed genes from the comparison of an un-stimulated T cell line from which the Egr-2 gene had been knocked out and a wild type un-stimulated cell line. The second in-house experiment focussed on the comparison between an Egr-2 knock-out T cell line activated with CD3/CD28 for 6 hours and a wild type cell line also activated with CD3/CD28 for 6 hours. Results generated from these experiments were then compared with the aforementioned published tolerance data using MiNer in order to understand the molecular mechanisms controlling immune tolerance.

3.1. Comparison of Gene Expression Profiles Pertaining to Immune Tolerance

The first step in the analysis was to subject the above-mentioned expression profiles to MicroPath in order to identify genes amongst them that had the same accession identifiers. MicroPath identified 31 differentially expressed genes that were common to all three expression datasets and generated a graph to delineate their expression values (Table 2, Figure 3). A simple number crunching exercise was used to perform this task, since it generated a reasonable number of common genes and therefore did not warrant the use of permutations and combinations to perform the search. The next step was to use these 31 differentially expressed genes as a search space to determine those genes that have the potential to be co-expressed. In order to do this, we employed MicroPath’s graphical utility to extract gene expression patterns, which led to the identification of 6 of the 31 genes that were found to be upregulated in tolerance Vs activated CD4+ T-cells and downregulated in both the p-KOA0 Vs WTA0 and p-KOA6 Vs WTA6 datasets (Table 2). The remaining 25 common differentially expressed genes were found to be highly and lowly expressed in the tolerance and knock-out datasets respectively. Statistical analysis revealed an overall Pearson’s correlation score of 0.109 from the pair-wise comparison of tolerance data with p-KOA0 Vs WTA0 and a score of -0.123 from the comparison of tolerance with p-KOA6 Vs WTA6. Furthermore, Reverse Transcriptase PCR experiments confirmed that 15 genes from our tolerance Vs activated data were highly expressed in immune tolerance, and 8 of these 15 genes were found to be common amongst all three expression profiles (Table 2).
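The pattern search described above (upregulated in tolerance, downregulated in both knock-out comparisons) can be illustrated with a simple filter. This sketch uses a subset of the rows reported in Table 2 and treats a positive value as upregulation and a negative value as downregulation; that sign convention is an assumption made here for illustration, and the function name is hypothetical.

```python
# (gene, p-KOA0 Vs WTA0, p-KOA6 Vs WTA6, tolerance Vs activated)
# Values copied from a subset of the rows of Table 2.
ROWS = [
    ("Acadl", 0.371336, 0.624525, 6.373),
    ("Ptma", -1.31334, -0.46688, 5.42),
    ("Scd2", -0.18816, -0.39366, 4.552),
    ("Shd", -0.17495, -0.53582, 2.838),
    ("Vil2", -0.49824, 0.319645, 3.151),
    ("Il10", 3.083863, 1.660739, 3.521),
    ("Chka", -2.13728, -0.69458, 5.677),
]


def up_in_tolerance_down_in_knockouts(rows):
    """Keep genes that are positive in tolerance and negative in both knock-outs.

    Assumption: the sign of the tabulated value indicates the direction of
    regulation. This is an illustrative criterion, not MicroPath's documented rule.
    """
    return [
        gene
        for gene, koa0, koa6, tolerance in rows
        if tolerance > 0 and koa0 < 0 and koa6 < 0
    ]


selected = up_in_tolerance_down_in_knockouts(ROWS)
print(selected)
```

Genes such as Vil2, which is negative in only one of the two knock-out comparisons, are excluded by this criterion.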

Because Egr-2 has been previously characterised and found to be highly upregulated in immune tolerance, these results generated from MicroPath are biologically significant: as expected, those genes that were highly expressed in our tolerance Vs activated datasets were found to be insignificantly expressed in our p-KOA6 Vs WTA6 and p-KOA0 Vs WTA0 datasets (from which the Egr-2 gene was knocked out of the cell lines). Amongst these genes, Ap1s1, Shd, Surf6, Vil2, Lilrb4, Tbx21 and Pdcd1lg2 (Table 2) have been confirmed to be upregulated in the process of immune tolerance [20], all of which were found to exhibit low expression values in our knock-out expression datasets. This consistent gene expression pattern can be seen graphically in Figure 3. However, from the 31 interesting common genes, 16 were not confirmed to be involved in tolerance by RT-PCR, yet some of them also exhibited a coherent pattern of gene expression. For example, Ptma, Scd2, Hdac6, Pltp and Chka were all highly expressed in tolerance and conversely downregulated in both knock-out datasets. There is a possibility that these genes may also be insignificantly expressed due to the absence of Egr-2. However, conducting RT-PCR for these specific genes would be required in order to confirm that their over-expression results in T-cell tolerance.

Table 2. Tabulated overview of gene accession IDs, HUGO IDs and fold change values belonging to the 31 common genes identified from the comparison of tolerant Vs activated CD4+ T cells, p-KOA0 Vs WTA0 and p-KOA6 Vs WTA6 expression datasets. Entries highlighted in bold represent genes that were found to be up-regulated in tolerance Vs activated CD4+ T cells and down-regulated in both p-KOA0 Vs WTA0 and p-KOA6 Vs WTA6 datasets. Entries with * represent genes that have been confirmed to be highly expressed in tolerance by RT-PCR.

| Gene ID | HUGO ID | Fold Change (p-KOA0 Vs WTA0) | Fold Change (p-KOA6 Vs WTA6) | Fold Change (Tolerance Vs activated) |
|---|---|---|---|---|
| NM_007381 | Acadl | 0.371336 | 0.624525 | 6.373 |
| NM_007457 | Ap1s1 * | 0.542474 | 0.31525 | 4.965 |
| NM_007664 | Cdh2 | 0.243646 | -0.7999 | 1.658 |
| NM_008205 | H2-M9 | -0.08048 | 0.116434 | 2.857 |
| NM_008972 | Ptma | -1.31334 | -0.46688 | 5.42 |
| NM_009128 | Scd2 | -0.18816 | -0.39366 | 4.552 |
| NM_009168 | Shd * | -0.17495 | -0.53582 | 2.838 |
| NM_009298 | Surf6 * | 0.272072 | 0.126301 | 4.365 |
| NM_009465 | Axl | 0.149539 | 1.475806 | 3.836 |
| NM_009510 | Vil2 * | -0.49824 | 0.319645 | 3.151 |
| NM_010102 | Edg6 | 0.313489 | 0.132689 | 1.573 |
| NM_010413 | Hdac6 | -0.90335 | -0.8226 | 4.745 |
| NM_010548 | Il10 * | 3.083863 | 1.660739 | 3.521 |
| NM_010638 | Bteb1 | 0.024803 | -0.42533 | 1.613 |
| NM_011125 | Pltp | -0.5354 | -0.71558 | 4.363 |
| NM_011620 | Tnnt3 | -0.61646 | 0.035844 | 1.665 |
| NM_011696 | Vdac3 | -0.98084 | 0.191964 | 4.701 |
| NM_011705 | Vrk1 | 0.466922 | -0.34601 | 2.032 |
| NM_013488 | Cd4 | 0.584494 | 0.420277 | 4.905 |
| NM_013490 | Chka | -2.13728 | -0.69458 | 5.677 |
| NM_013532 | Lilrb4 * | 0.792335 | 1.110898 | 2.111 |
| NM_013615 | Odf2 | 2.776384 | 3.004449 | 4.809 |
| NM_013814 | Galnt1 | -0.47752 | 0.500297 | 2.246 |
| NM_013866 | Zfp385 | 0.118995 | 0.428591 | 1.664 |
| NM_016772 | Ech1 | -0.0666 | 0.053081 | 4.284 |
| NM_019507 | Tbx21 * | 0.124767 | -0.32731 | 1.595 |
| NM_019561 | Ensa | 0.778767 | -0.44703 | 1.718 |
| NM_019777 | Ikbke | 0.291602 | -0.00772 | 1.609 |
| NM_020027 | Bat2 | 0.291219 | -0.23966 | 5.091 |
| NM_021396 | Pdcd1lg2 * | 1.140087 | 0.079182 | 3.921 |
| NM_021538 | Cope | 0.154049 | 0.264541 | 2.035 |

3.2. Deciphering Gene Regulatory Networks of Co-Expressed Genes via High Throughput Molecular Pathway Analysis

The final stage of the analysis entails using MicroPath’s function to connect to the Application Programming In-terface (API) of KEGG via SOAP-Lite in order to carry out high throughput molecular pathway analysis. There-fore, for this stage in the analysis, we used MicroPath to map 31 of our co-expressed interesting genes to KEGG pathways and from these 31 genes, 14/31 were identified in a total of 31 molecular pathways (Table 3). Interest-ingly, several of these pathways were related to the study of immunology and illustrated biological networks such as MapKinase, Jak-Stat, T-cell receptor signalling and

Cytokine-cytokine interactions. More specifically, the Pdcd1lg2 gene (accession id: NM_021396) was identi-fied in the Cell Adhesion Molecules (CAM) pathway (Table 3) and studies have confirmed that the over-ex-pression of Pdcd1lg2 has resulted in consistently low levels of Interleukin-2 (IL-2) in naive CD4(+) T-cells [21]. Further studies have correlated the over-expression of this gene to the negative regulation of T-cell activation. In one particular study, PDL2 (Pdcd1lg2) deficient mice were created in order to characterise the function of this gene in T-cell activation and tolerance, and results gen-erated from this study suggested that Antigen-presenting cells from PDL2-deficient mice were found to be more potent in activating T-cells in vitro when compared to the wild-type counterparts [22]. These findings are conclu-sive and correlate well with the results generated from our in-house microarray experiments because using Mi-croPath to compare all three of our datasets followed by extracting gene expression patterns from them resulted in an important finding that Pdcd1lg2 was not only found to be over-expressed in tolerance (fold change of 3.921), but it was also under-expressed in our KOA0 Vs WTA0 and KOA6 Vs WTA6 knock-out datasets (with a fold change of 1.140 and 0.079 respectively) (Table 2). This


particular finding is in agreement with the aforementioned studies, which conclude that Pdcd1lg2 has a negative, inhibitory role in the process of T-cell activation. In addition, molecular pathway analysis of the Interleukin-10 (IL-10) gene using MicroPath identified its role in the Cytokine-cytokine interaction, Jak-STAT and T-cell receptor signalling pathways, all three of which are important immunological pathways. IL-10 is a well known cytokine, which has previously been shown to successfully induce immune tolerance in dendritic cells [23]. Results generated from MicroPath revealed that IL-10 was highly expressed in our tolerance data with a fold change of 3.521, and was expressed at a lower level in our KOA0 Vs WTA0 profile (fold change: 3.084). Interestingly, following activation with CD3/CD28 for 6 hours, its expression dropped significantly to 1.66, perhaps attributable to the absence of Egr-2. Likewise, other genes from the 31 co-expressed interesting genes show similar patterns of expression and may be candidate genes for Egr-2 mediated T-cell tolerance. However, this has yet to be confirmed in the published literature. Finally, the pathway analysis function of MicroPath was used to calculate the percentage of genes identified in each pathway in relation to 1) the intersection of common genes and 2) the total number of genes comprising each pathway. From the results, the Cell Adhesion Molecules (CAM) pathway was particularly significant, since 6.84% of the overall pathway was affected by 12.91% of the genes common to all 3 expression profiles (Table 4).
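The two percentages described above can be sketched as a small calculation. This is only an illustrative sketch, not MicroPath's own code: the function name and the pathway size of 50 genes are hypothetical, and only the counts of 4 pathway genes out of 31 common genes come from the text.

```python
def pathway_percentages(genes_in_pathway, n_common_genes, pathway_size):
    """Return (percent of common genes, percent of pathway) for one pathway.

    genes_in_pathway -- number of common genes mapped to the pathway
    n_common_genes   -- size of the intersection of common genes
    pathway_size     -- total number of genes in the pathway (hypothetical here)
    """
    if n_common_genes <= 0 or pathway_size <= 0:
        raise ValueError("gene counts must be positive")
    pct_of_common = 100.0 * genes_in_pathway / n_common_genes
    pct_of_pathway = 100.0 * genes_in_pathway / pathway_size
    return round(pct_of_common, 2), round(pct_of_pathway, 2)


# Four of the 31 common genes fall in one pathway; the pathway size of 50
# is an assumed value used only to make the example concrete.
pct_common, pct_pathway = pathway_percentages(4, 31, 50)
```

The first value depends only on the size of the intersection, which is why pathways hit by the same number of common genes share the same first percentage in Table 4.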

Figure 3. A preliminary graphical overview of common interesting genes generated from the comparison of tolerant Vs activated CD4+ T cells (green), p-KOA0 Vs WTA0 (red) and p-KOA6 Vs WTA6 (blue) expression datasets. It can be seen that genes that are highly expressed in tolerance appear to be expressed poorly in the knock-out datasets. This pattern is consistent throughout the 31 gene expression data points.

Table 3. Tabulated data generated from high throughput molecular pathway analysis of co-regulated genes. 14/31 common interesting genes were identified in a total of 31 molecular pathway maps of KEGG.

| GenBank Accession ID | HUGO ID | Pathway ID | Total No of pathways |
|---|---|---|---|
| NM_007381 | Acadl | mmu00071, mmu00280, mmu00410, mmu00640, mmu03320 | 5 |
| NM_007664 | Cdh2 | mmu04514 | 1 |
| NM_013488 | Cd4 | mmu04514, mmu04612, mmu04640, mmu04660 | 4 |
| NM_011696 | Vdac3 | mmu04020 | 1 |
| NM_011125 | Pltp | mmu03320 | 1 |
| NM_016772 | Ech1 | mmu00350, mmu00362, mmu00628 | 3 |
| NM_010548 | Il10 | mmu04060, mmu04630, mmu04660 | 3 |
| NM_009510 | Vil2 | mmu04670, mmu04810 | 2 |
| NM_008205 | H2-M9 | mmu04514, mmu04612, mmu04940 | 3 |
| NM_013814 | Galnt1 | mmu00512, mmu01030 | 2 |
| NM_019777 | Ikbke | mmu04010, mmu04620 | 2 |
| NM_010102 | Edg6 | mmu04080 | 1 |
| NM_021396 | Pdcd1lg2 | mmu04514 | 1 |
| NM_013652 | Ccl4 | mmu04060, mmu04620 | 2 |


The fundamental strength of MicroPath stems from the implementation of a novel search strategy for the comparison of multiple gene expression profiles. Although a few software packages cater for multiple gene expression comparison, there is currently no software that searches for common genes beyond simple number crunching methods of comparison (Table 5). The fact that a direct comparison of a given number of datasets yields no common genes does not mean that the analysis should end there, since common genes may still be identified across n-1 profiles. MicroPath ensures that such genes, which current software would overlook, are identified. When coupled with other important functions such as pattern extraction and pathway analysis, it becomes apparent that MicroPath offers valuable assistance to biologists wanting to decipher their high throughput data.
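The idea of falling back from n profiles to n-1 profiles can be sketched with plain set intersections over combinations of datasets. This is an illustrative sketch only and is not taken from MicroPath: the function name, the profile labels and the gene identifiers are hypothetical.

```python
from itertools import combinations


def common_genes_by_subset(profiles, min_size=None):
    """Intersect gene sets over every combination of profiles.

    profiles -- dict mapping a profile name to an iterable of gene IDs
    min_size -- smallest number of profiles to combine (default: n-1)
    Returns a dict mapping each tuple of profile names to the shared genes.
    """
    names = sorted(profiles)
    n = len(names)
    if min_size is None:
        min_size = max(n - 1, 1)
    results = {}
    # Start with all n profiles, then step down to smaller combinations.
    for k in range(n, min_size - 1, -1):
        for subset in combinations(names, k):
            gene_sets = [set(profiles[name]) for name in subset]
            results[subset] = set.intersection(*gene_sets)
    return results


# Hypothetical profiles: no gene is shared by all three, but pairs overlap.
example_profiles = {
    "A": {"g1", "g2"},
    "B": {"g2", "g3"},
    "C": {"g3", "g4"},
}
shared = common_genes_by_subset(example_profiles)
```

In this toy input the three-way intersection is empty, while the pairwise combinations still recover genes that a direct all-dataset comparison would miss.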

Table 4. Results generated from pathway analysis showing the extent to which each pathway is affected by common genes from the intersection. The percentages reflect the proportion of common genes that contribute towards controlling the proportion of each pathway.

| Pathway ID | Pathway Name | GenBank Accession ID | Result from Analysis |
|---|---|---|---|
| mmu00071 | Fatty Acid Metabolism | NM_007381 | 3.26% of genes contribute 8.45% role in pathway |
| mmu00280 | Valine, leucine and isoleucine degradation | NM_007381 | 3.26% of genes contribute 2.73% role in pathway |
| mmu00410 | Beta Alanine Metabolism | NM_007381 | 3.26% of genes contribute 7.14% role in pathway |
| mmu00640 | Propanoate Metabolism | NM_007381 | 3.26% of genes contribute 5.88% role in pathway |
| mmu03320 | PPAR Signalling Pathway | NM_007381 | 3.26% of genes contribute 1.92% role in pathway |
| mmu04514 | Cell Adhesion Molecules | NM_007664, NM_008205, NM_013488, NM_021396 | 12.91% of genes contribute 6.84% role in pathway |
| mmu04612 | Antigen Processing & Presentation | NM_013488 | 3.26% of genes contribute 2.44% role in pathway |
| mmu04640 | Hematopoietic Cell Lineage | NM_013488 | 3.26% of genes contribute 0.76% role in pathway |
| mmu04660 | T Cell Receptor Signalling Pathway | NM_013488, NM_010548 | 6.45% of genes contribute 3.33% role in pathway |
| mmu04020 | Calcium Signalling Pathway | NM_011696 | 3.26% of genes contribute 2.33% role in pathway |
| mmu00350 | Tyrosine Metabolism | NM_016772 | 3.26% of genes contribute 2.17% role in pathway |
| mmu04060 | Cytokine-cytokine receptor interaction | NM_010548, NM_013652 | 6.45% of genes contribute 0.73% role in pathway |
| mmu04630 | JAK-STAT Signalling Pathway | NM_010548 | 3.26% of genes contribute 3.85% role in pathway |
| mmu04670 | Leukocyte Transendothelial Migration | NM_009510 | 3.26% of genes contribute 1.25% role in pathway |
| mmu04810 | Regulation of Actin Cytoskeleton | NM_009510 | 3.26% of genes contribute 1.47% role in pathway |
| mmu04940 | Type I Diabetes Mellitus | NM_008205 | 3.26% of genes contribute 4.35% role in pathway |
| mmu00512 | O-Glycan Biosynthesis | NM_013814 | 3.26% of genes contribute 10% role in pathway |
| mmu04010 | MAPK Signalling Pathway | NM_019777 | 3.26% of genes contribute 0.83% role in pathway |
| mmu04620 | Toll-Like Receptor Signalling Pathway | NM_019777, NM_013652 | 6.45% of genes contribute 1.32% role in pathway |
| mmu04080 | Neuroactive Ligand-Receptor Interaction | NM_010102 | 3.26% of genes contribute 1.15% role in pathway |

Table 5. Functional comparison of MicroPath to similar software packages and applications.

| Function | MicroPath | EXPANDER [24] | INCLUSIVE [25] | Pathway Studio [26] | KEGG [13] | BioCarta [27] | MaXlab [28] |
|---|---|---|---|---|---|---|---|
| Suitable for high throughput data analysis | YES | YES | YES | YES | NO | NO | YES |
| Suitable for comparing multiple gene expression profiles | YES | YES | NO | YES | NO | NO | YES |
| Implementation of efficient algorithm to search for common genes from n-1 datasets | YES | NO | NO | NO | NO | NO | NO |
| Graphical representation of gene expression values from multiple datasets | YES | NO | NO | NO | NO | NO | YES |
| Pattern extraction from Graph data | YES | NO | NO | NO | NO | NO | NO |
| Construction of pathway maps | YES | NO | NO | YES | YES | YES | NO |
| Mapping gene expression data to pathway maps | YES | NO | NO | YES | NO | NO | NO |
| User interactive software (S) or Database (D) | S | S | S | S | D | D | S |


4. Conclusion

In this article, we have illustrated the potential benefits that can be derived from using MicroPath for the analysis of multiple gene expression profiles. Each function of the software has been developed to streamline the overall analysis pipeline, providing users with a walkthrough of how their data is biologically deciphered. Here, we have applied our software to microarray datasets generated in different laboratories and pertaining to the molecular mechanisms underlying immune tolerance. However, MicroPath is capable of analysing data for any given biological question, whether the datasets are taken from public repositories such as ArrayExpress or generated from in-house microarray experiments. We believe that its ability to use both number crunching and permutations and combinations as the search strategy for identifying the intersection of common genes, coupled with its function for extracting gene expression patterns graphically and statistically, makes it an attractive software package for biologists. Finally, its ability to carry out live mapping of genes to biological pathways makes it a useful tool for the automation of multiple gene expression analysis.

Availability and requirements

Project name: MicroPath
Project home page: www.1066technologies.co.uk/micropath
Operating system(s): MicroPath has been tested on Windows 2000, XP and Vista
Programming language: Visual Basic.Net, Perl
Other requirements: None
License: N/A
Any restrictions to use by non-academics: No

Acknowledgements

This study was supported by grants from the UK Medical Research Council (MRC) (Grant number: G0300520).

REFERENCES

[1] U. Sarkans, H. Parkinson, G. G. Lara, A. Oezcimen, A. Sharma, N. Abeygunawardena, S. Contrino, E. Holloway, P. Rocca-Serra, G. Mukherjee, M. Shojatalab, M. Kapushesky, S. A. Sansone, A. Farne, T. Rayner and A. Brazma (2005) The ArrayExpress gene expression database: a software engineering and implementation perspective. Bioinformatics 21(8), 1495-1501.

[2] T. Barrett and R. Edgar (2006) Mining Microarray Data at NCBI's Gene Expression Omnibus (GEO). Methods Mol. Biol. 338, 175-190.

[3] D. Ghosh, T. R. Barrette, D. Rhodes and A. M. Chinnaiyan (2003) Statistical issues and methods for meta-analysis of microarray data: A case study in prostate cancer. Funct. Integr. Genomics 3, 180-188.

[4] D. R. Rhodes, T. R. Barrette, M. A. Rubin, D. Ghosh and A. M. Chinnaiyan (2002) Meta-analysis of microarrays: Interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res. 62, 4427-4433.

[5] D. R. Rhodes, J. Yu, K. Shanker, N. Deshpande, R. Varambally, D. Ghosh, T. Barrette, A. Pandey and A. M. Chinnaiyan (2004) Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc. Natl. Acad. Sci. USA 101, 9309-9314.

[6] J. Wang, K. R. Coombes, W. E. Highsmith, M. J. Keating and L. V. Abruzzo (2004) Differences in gene expression between B-cell chronic lymphocytic leukemia and normal B cells: A meta-analysis of three microarray studies. Bioinformatics 20, 3166-3178.

[7] S. Draghici, P. Khatri, A. L. Tarca, K. Amin, A. Done, C. Voichita, C. Georgescu and R. Romero (2007) A systems biology approach for pathway level analysis. Genome Res. 17, 1537-1545.

[8] J. Stelling (2004) Mathematical models in microbial systems biology. Curr. Opin. Microbiol. 7, 513-518.

[9] G. Joshi-Tope, M. Gillespie, I. Vasrik, P. D'Eustachio, E. Schmidt, B. de Bone, B. Jassal, G. R. Gopinath, G. R. Wu, L. Matthews, et al. (2005) A knowledgebase of biological pathways. Nucleic Acids Res. 33, D428-D432.

[10] A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. 102, 15545-15550.

[11] S. Khalid, F. Fraser, M. Khan, P. Wang, X. Liu and S. Li (2006a) Analysing Microarray Data using the Multi-functional Immune Ontologiser. J. Integrative Bioinformatics 3, 25.

[12] S. Khalid, M. Khan, P. Wang, X. Liu and S.-L. Li (2006b) Application of bioinformatics in the design of gene expression microarrays. Second International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (isola 2006), pp. 146-160.

[13] M. Kanehisa, S. Goto, S. Kawashima, Y. Okuno and M. Hattori (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res. 32.

[14] K. J. Bussey, D. Kane, M. Sunshine, S. Narasimhan, S. Nishizuka, W. C. Reinhold, B. Zeeberg, W. Ajay and J. N. Weinstein (2003) MatchMiner: a tool for batch navigation among gene and gene product identifiers. Genome Biology 4, R27.

[15] G. F. Berriz and F. P. Roth (2008) The Synergizer service for translating gene, protein, and other biological identifiers. Bioinformatics. [Epub ahead of print].

[16] GenePix pro 4.1: http://www.axon.com

[17] Acuity 4.0: http://www.moleculardevices.com/pages/software/gn_acuity.html

[18] M. Safford, S. Collins, M. A. Lutz, A. Allen, C. Huang, J. Kowalski, A. Blackford, M. R. Horton, C. Drake, R. H. Schwartz and J. D. Powell (2005) Egr-2 and Egr-3 are negative regulators of T cell activation. Nature Immunology 6, 472-480.

[19] L. E. Warner, J. Svaren, J. Milbrandt and J. R. Lupski (1999) Functional consequences of mutations in the early growth response 2 gene (EGR2) correlate with severity of human myelinopathies. Hum. Mol. Genet. 8, 1245-1251.

[20] P. O. Anderson, B. A. Manzo, A. Sundstedt, S. Minaee, A. Symonds, S. Khalid, M. E. Rodriguez-Cabezas, K. Nicolson, S. Li, D. C. Wraith and P. Wang (2006) Persistent antigenic stimulation alters the transcription program in T cells, resulting in antigen-specific tolerance. European Journal of Immunology 36, 1374-85.

[21] H. Kuipers, F. Muskens, M. Willart, D. Hijdra, F. B. van Assema, A. J. Coyle, H. C. Hoogsteden and B. N. Lambrecht (2006) Contribution of the PD-1 ligands/PD-1 signaling pathway to dendritic cell-mediated CD4 (+) T cell activation. European Journal of Immunology 36(9), 2472-82.

[22] Y. Zhang, Y. Chung, C. Bishop, B. Daugherty, H. Chute, P. Holst, C. Kurahara, F. Lott, N. Sun, A. A. Welcher and C. Dong (2006) Regulation of T cell activation and tolerance by PDL2. Proc. Natl. Acad. Sci. USA 103(31), 11695-11700.


[23] X. Li, K. Dou, H. Liu, F. Zhang and L. Cai (2007) Immune tolerance induced by IL-10 and methylprednisolone modified dendritic cells in vitro. Chinese Journal of Cellular and Molecular Immunol. 23(5), 436-8.

[24] R. Shamir, A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, R. Sharan, Y. Shiloh and R. Elkon (2005) EXPANDER-an integrative program suite for microarray data analysis. BMC Bioinformatics 6, 232.

[25] G. Thijs, Y. Moreau, F. D. Smet, J. Mathys, M. Lescot, S. Rombauts, P. Rouze, B. D. Moor and K. Marchal (2002) INCLUSive: Integrated Clustering, Upstream sequence retrieval and motif Sampling. Bioinformatics 18, 331-332.

[26] A. Nikitin, S. Egorov, N. Daraselia and I. Mazo (2003) Pathway studio-the analysis and navigation of molecular networks. Bioinformatics 19, 2155-2157.

[27] BioCarta, Charting pathways of life. http://www.biocarta.com.

[28] S. Khalid, M. Khan, C. B. Gorle, K. Fraser, P. Wang, X. Liu and S. Li (2008) MaXlab: A novel application for the cross comparison and integration of biological signatures from microarray studies. In Silico Biology 8, 0029.


Assessment of bone condition by acoustic emission technique: A review

Sharad Shrivastava, Ravi Prakash

ABSTRACT

This paper reviews the use of the acoustic emission (AE) technique in the biomedical field. The review concentrates on the AE behavior of bone under different loading conditions, its dependence on strain rate, its behavior in osteoporosis, and its use in monitoring the fracture healing process of bone. The overall conclusion is that almost all of the studies on bone indicate that the initial AE occurs only in the plastic region and just prior to yield. This means that AE cannot be considered a safe technique for clinical application, but the early occurrence of AE events from callus makes the technique promising for monitoring the fracture healing process. The negligible effect of soft tissues on the AE response of bone also makes AE a promising non-invasive method for the assessment of bone condition.

Keywords: Acoustic Emission; Assessment; Strain Rate; Callus; Fracture Healing; Osteoporosis

1. INTRODUCTION

Bone is the primary structural element of the human body. The anatomy of human beings is quite well known, but the strength and mechanical properties of bones have not been investigated thoroughly. The 206 named bones of the skeleton constitute 18% of adult human body weight, with only skin and fat (25%) and muscles (43%) being greater [1]. In biological terms bone is described as a connective tissue, and in mechanical terms bone is a composite material with several distinct solid and fluid phases. The mechanical properties of bone have been more extensively investigated than those of any other biological tissue material. Although our understanding of the mechanical properties and fracture behavior of bone is continuously improving, it is as yet far from complete. As pointed out by Hayes, W. C. [2], while fundamental research is needed on many aspects of the mechanical response of bone, applications of the techniques of analytical and experimental mechanics in this area are complicated by the fact that bone is a highly complex living material.

The initial work in the field of bone biomechanics can be traced back to the 17th century, when "attempts to express biological findings in physical terms" [3] were made by the scholars of that time. This philosophical approach intensified in the age of determinism, which lasted until the middle of the 19th century. One of the main aims of the research work of that time was to relate the architecture of bone to its mechanical functions. In 1832, Bourgery, J. M. [4], in his work on anatomy, raised the question of the relation between the architecture and the mechanical functions of bone. In his book on osteology, Ward, F. O. [5] compared the proximal end of the human femur with a crane and mentioned the compressive and tensile stresses evoked in the bone by loading. In 1867, a more detailed analysis of the structure of cancellous bone and its mathematical significance was given by Meyer, G. H. [6] in association with the famous mathematician Culmann.

The industrial revolution took place in the second half of the 19th century and also had an impact on research work on bone. New developments were made in the field of material testing, and new methods were developed for mechanical measurements. For a while, these methods were used to determine the in vitro mechanical properties of bone. Bones were tested under various loading conditions, and the ultimate strength of bone was determined by many investigators [7-18]. McElhaney, J. H. [19], from his study on the strain rate dependence of the mechanical properties of bone, showed that both the compressive strength and the modulus of longitudinally oriented compact bone specimens were significantly increased by increasing the strain rate. A critical strain rate for bone has been

Biosciences, 2009, 1, 12-22


claimed in compression [19], torsion [20] and tension [21]. However, Wright, T. M. and Hayes, W. C. [22] found no critical strain rate in tensile tests of bovine bone over a wide strain-rate range.

In the last few decades, attempts have been made to use newly developed or improved non-destructive testing techniques to find the mechanical properties of bone in vivo and in vitro. These include finding the elastic constants using ultrasonic techniques [23,24,25,26], finding the mechanical strength of bone specimens by X-ray computed tomography, etc.

Assessment of in vivo bone condition is a research area that has attracted many biomedical engineers and clinical orthopaedicians in recent times. At present, radiological examination is widely used for the assessment of in vivo bone condition [27,28]. In some clinical problems, such as diagnosis of the point of clinical union of a fracture, manual assessment of stability is also used along with the radiological examination. However, for many applications the radiographic technique has been found to suffer from low sensitivity. For instance, the evaluation of osteoporosis requires a loss of 30% or more of bone mineral content before an unequivocal roentgenological diagnosis can be made [29].

Monitoring the fracture healing process is another area where the currently used techniques have failed to give satisfactory results. Uncertainty regarding the significance of the radiographic and clinical findings may result in unnecessarily long immobilization periods, which can produce discomfort and inconvenience for the patients, as well as possible joint stiffness and even permanent loss of motion, especially in the elderly.

In certain long bone shaft fractures the healing process is modified by the method of treatment, so that the clinical assessment of mechanical integrity is impossible and the interpretation of radiographs may be difficult. Diaphyseal fractures treated by "rigid" internal fixation always demonstrate this problem, since the fracture cannot be tested mechanically and external callus formation is not seen on radiographs. Methods are needed to assess the mechanical integrity of fracture healing in such circumstances; otherwise an unreliable and unsafe rehabilitation programme may be prescribed.

Mechanical impedance, natural frequency, vibration analysis, stress wave propagation, ultrasonic measurements, the impact response technique, electrical potential measurements and mechanical tissue response analysis are some of the techniques that have been attempted by different investigators for the assessment of in vivo bone condition in the past [30,31,32,33,34,35,36,37,38,39,40]. However, in all these studies the results were affected by the intervening soft tissues, whose quantity and quality vary from individual to individual. Furthermore, some methods were not really non-invasive in nature, and some were not practicable for widespread clinical use because of low reliability and complicated instrumentation.

The structure of bone is very similar to that of engineering composite materials, and it is therefore advantageous to use a non-destructive testing technique that has already proved its usefulness in the field of composite materials testing. The acoustic emission (AE) technique has been used very successfully for the non-destructive evaluation of composites. The relationship between AE response and mechanical behavior in composite materials has been extensively studied in the past [41]. This paper reviews the AE technique as applied to bone assessment. The review has been broadly classified as follows (Figure 1).

2. ACOUSTIC EMISSION TECHNIQUE

Acoustic emission is the sound produced by materials as they fail. A familiar example is the audible cracking

Figure 1. Broad classification of the review study. [Flow chart with the boxes: Acoustic Emission Technique; Historical Background; AE in bone; Behavior of bones in different loading conditions; Prediction of mechanical properties; Effect of strain rate on AE properties; on callus-bone.]


noise from wood. Almost all engineering materials generate acoustic emissions but, unlike wood, the sound is too faint to be heard without sensitive electronic monitors. Acoustic emission waves can be detected by means of remote piezoelectric sensors, and their source can be located by timing the wave arrival at several sensors. Thus AE provides a unique method of recognizing when and where deformation is taking place as a structure is stressed.
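Source location from arrival times can be illustrated with the simplest case, a source lying on the line between two sensors. The sketch below is not taken from the reviewed studies: it assumes a single constant wave speed and a one-dimensional geometry, and the function name and numerical values are hypothetical.

```python
def locate_source_1d(x1, x2, t1, t2, wave_speed):
    """Locate an AE source on the line between two sensors.

    x1, x2     -- sensor positions in metres (x1 < x2)
    t1, t2     -- arrival times at each sensor in seconds
    wave_speed -- assumed constant wave speed in m/s
    For a source at x between the sensors:
        t1 - t2 = ((x - x1) - (x2 - x)) / wave_speed
    which rearranges to the expression returned below.
    """
    if x1 >= x2:
        raise ValueError("x1 must be smaller than x2")
    return 0.5 * (x1 + x2 + wave_speed * (t1 - t2))


# Hypothetical check: a source 0.3 m from the first sensor on a 1 m span,
# with an assumed wave speed of 3000 m/s and an unknown emission time t0.
speed = 3000.0
t0 = 0.002
arrival_1 = t0 + 0.3 / speed
arrival_2 = t0 + 0.7 / speed
estimated_x = locate_source_1d(0.0, 1.0, arrival_1, arrival_2, speed)
```

Only the difference between the two arrival times enters the result, so the unknown emission time drops out. Locating a source in two or three dimensions needs more sensors and a fit over several such differences.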

The first systematic investigations of the AE phenomenon were made in 1950 by Kaiser, J. [42] at the Technical University of Munich. In his investigations, the noise emitted by the deformation of materials was examined by means of electronic equipment capable of detecting inaudible ultrasonic signals. Kaiser, while working with polycrystalline specimens, concluded that acoustic vibrations originate in grain boundary interfaces and believed them to be associated with the interaction induced between interfaces by applied stresses. He noted that, for a given material, characteristic spectra of frequency and amplitude existed. One of the important observations made in his study was that irreversible processes are involved in the AE phenomenon, an effect that later came to be known as the Kaiser effect. The universality of the AE phenomenon, as recognized by Kaiser, leads to a very wide range of applicability. AE has been recorded from hundreds of materials (metals, composites, ceramics, plastics, glasses, building materials, biological materials in vitro and in vivo) as well as from multi-material structures and joints between different materials [43]. Compared to other NDT techniques, which rely on extraneous energy for the illumination of a defect, AE enjoys the unique feature that the defect makes its own signal. This leads to a natural complementarity between AE and other methods.
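The Kaiser effect is commonly stated as the absence of emission on reloading until the previous maximum load is exceeded. The sketch below illustrates that idealized statement only. It is not a model from the reviewed work, the load values are hypothetical, and real materials can depart from this ideal behavior.

```python
def kaiser_active_steps(loads):
    """Flag the load steps at which an ideal Kaiser-effect material emits.

    loads -- sequence of applied load values in loading order
    A step is flagged True only when its load exceeds every earlier load.
    """
    previous_max = 0.0  # the specimen is assumed to start unloaded
    flags = []
    for load in loads:
        flags.append(load > previous_max)
        previous_max = max(previous_max, load)
    return flags


# Hypothetical load history with two partial unload and reload cycles.
history = [1.0, 2.0, 1.5, 2.0, 3.0, 2.5, 4.0]
emitting = kaiser_active_steps(history)
```

In this toy history the reload steps at 1.5, 2.0 and 2.5 are flagged silent because none of them exceeds the earlier peak.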

3. ACOUSTIC EMISSION DETECTION AND SIGNAL PROCESSING

Figure 2 [44] shows the method of detection of acoustic emission events by remote piezoelectric transducers. Here, an AE source generates an expanding spherical

Figure 2. Detection of acoustic emission event by remote piezoelectric transducers (source: 44).

wave packet whose intensity falls off as $r^{-2}$, where r is the distance from the source. When this wave reaches the body boundary, a surface wave packet is created, of either the Rayleigh or the Lamb wave type depending on the thickness. This method is mainly used for flaw monitoring in inaccessible areas. The individual signal has a short duration at the source and a correspondingly broad spectrum, which typically extends from zero frequency to many megahertz. The form of the signal at the point of detection is a damped oscillation, developed in the structure according to known principles of acoustic wave propagation. Figure 3 [44] shows the signal waveform of one acoustic emission event and the different parameters normally measured to characterize the acoustic emission source.
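The parameters typically read off such a damped oscillation can be sketched on a synthetic signal. The choice of parameters here (threshold-crossing counts, peak amplitude, duration and rise time) is a common convention and an assumption on our part, since the exact set shown in Figure 3 is not reproduced in the text. The function name, threshold, frequency and decay constant are all hypothetical.

```python
import math


def ae_hit_parameters(samples, dt, threshold):
    """Extract simple parameters from one sampled AE hit.

    samples   -- list of signal values (arbitrary units)
    dt        -- sampling interval in seconds
    threshold -- positive detection threshold, same units as samples
    Returns None if the signal never exceeds the threshold.
    """
    above = [i for i, s in enumerate(samples) if s > threshold]
    if not above:
        return None
    first, last = above[0], above[-1]
    # Counts: number of upward crossings of the threshold.
    counts = int(samples[0] > threshold) + sum(
        1 for i in range(1, len(samples))
        if samples[i - 1] <= threshold < samples[i]
    )
    peak_index = max(range(len(samples)), key=lambda i: samples[i])
    return {
        "counts": counts,
        "peak_amplitude": samples[peak_index],
        "duration": (last - first) * dt,         # first to last sample above threshold
        "rise_time": (peak_index - first) * dt,  # first crossing to peak
    }


# Synthetic hit: a 100 kHz oscillation decaying with a 50 microsecond
# time constant, sampled every 0.1 microseconds (all values assumed).
dt = 1e-7
signal = [
    math.exp(-i * dt / 50e-6) * math.sin(2 * math.pi * 100e3 * i * dt)
    for i in range(5000)
]
hit = ae_hit_parameters(signal, dt, threshold=0.2)
```

Raising the threshold lowers both the counts and the measured duration, which is one reason reported AE parameters are only comparable between studies that use similar detection settings.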

4. ACOUSTIC EMISSION IN BONE

Hanagud, S., et al. [45] first demonstrated detectable acoustic emissions from bone, using bovine femora. Their work paved the way for other investigators to use the AE technique for the characterization of bone and to explore the possibility of using it as a tool for clinical orthopaedicians to detect bone abnormalities [46,47,48,49,50]. Knets, I. V., et al. [46] showed that the character of the fracture surface depends on the orientation of the load relative to the direction of the osteons, the rate of loading, and the geometrical shape of the actual sample. They concluded that the most promising approach to testing the internal state of a bone is acoustic emission, sometimes also known as the method of stress-wave emission. This approach involves recording the deformation noise in the material due to the development and further propagation of structural defects. These defects may include dislocations or cracks appearing in the course of loading. This study was the first to visualize the degree of micro crack development in bone tissue subjected to longitudinal extension. The experimental work considered only longitudinal loading, and the deformation rate was very low (1 mm/min); the study was not conducted for higher strain rates. In another investigation, Hanagud, S., et al. [50] conducted AE tests on carefully prepared bone specimens subjected to bending loads. Their specimens included femora from cattle and cadavers. They compared the AE patterns from 60 perfect and defective specimens. The results clearly indicated that the development of an effective early diagnostic tool for osteoporosis was possible using the AE technique.

Thomas, R. A., et al. [51] studied the acoustic emissions from fresh bovine femora and their clinical applications. They employed a more sophisticated AE set-up, which included both amplitude and pulse width distributions, to investigate whole fresh bovine femora loaded in compression and bending. They found that both the amplitude distribution and the pulse width distribution of fresh bone clearly showed characteristic spectra which could be used for the early detection of bone abnormalities such as fracture and osteoporosis.


Figure 3. Signal waveform for one acoustic emission event (source: 44).

Yoon, H. S., et al. [52] developed a new AE technique for application to humans and animals both non-invasively and non-traumatically. Bones from several different species of animals, and different kinds of bone from the same species, were tested to obtain AE parameters. Their results indicated that the AE amplitude distributions of all the bones are similar, somewhat independent of the species of animal and of the kind of bone within the same species, but different from those of materials such as metals, ceramics and plastics. The technique was found useful for the diagnosis of micro fractures, such as stress fractures in the tibia of runners, which are not detectable by conventional X-ray techniques until they begin to heal. Moreover, conventional techniques would require additional stresses to be introduced into the part of the body under examination, which also introduces additional trauma to the patient. In their technique, low intensity ultrasonic pulses were injected through an AE transducer instead of applying loads to the bone under test. Loading by pulses reduced the trauma introduced to the live subject. Another, receiving AE transducer was used to collect five types of useful AE data: the per-event distributions of counts, peak amplitude, energy and pulse duration, and the cumulative counts vs time.

Netz, P. [53] monitored the AE response of canine femora in torsion at 6 degrees per second. His work demonstrated that AE events occur in the non-linear plastic portion of the load-deflection curve. Wright, T. M., et al. [54] monitored the permanent deformation of compact bone using the AE technique. Uniaxial tension tests were performed on standardized specimens of bovine haversian bone to examine the contributions of mineral and collagen to permanent deformation in bone and to monitor the damage mechanisms involved. Their results were consistent with a two-phase model for bone in which the mineral behaves as an elastic-perfectly plastic material when bound to the collagen fiber matrix. AE events occurred just prior to the yield point and continued during yielding. Significant AE counts occurred again just prior to fracture. No emissions occurred in the elastic region and few occurred in the major portion of the plastic region between yield and fracture. To monitor micro cracks in the specimens, they plotted the AE data against strain. Figures 4 and 5 [54] show stress vs. strain and cumulative acoustic emission counts vs. strain curves for one of the control specimens and one of the decalcified specimens. These graphs indicate the similarity between the acoustic emission data of the bones prior to fracture. Figure 6 [54] shows stress-strain plots based on the mean values from Table 1 [54]. The limitation of their work lies in the hypothesis that the mineral exhibits elastic-perfectly plastic behavior only in conjunction with collagen. They conducted experiments on control, decalcified and deproteinized groups of specimens under the same hypothesis, whereas the deproteinized specimens in fact behave in a brittle manner. Hence they suggested that further studies be undertaken to examine the contributions of mineral and collagen to permanent deformation in bone. The two-phase model could be used to study the quasi-static tensile behavior of compact bone, but responses at higher strain rates should have been examined before drawing any conclusion.
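As a worked illustration of how far each treatment shifts the mean properties reported in Table 1 [54], the sketch below expresses the decalcified and deproteinized means as a percentage of the control mean. The dictionary layout and the function name are illustrative assumptions. The numbers are the table means, and the comparison is simple arithmetic rather than an analysis performed in [54].

```python
# Mean values copied from Table 1 [54]; units are those of the source table.
TABLE_1_MEANS = {
    "ultimate_stress": {"control": 128.0, "decalcified": 34.0, "deproteinized": 71.0},
    "elastic_modulus": {"control": 20.6, "decalcified": 0.37, "deproteinized": 11.3},
}


def percent_of_control(prop, group):
    """Mean of `group` for property `prop`, as a percentage of the control mean."""
    values = TABLE_1_MEANS[prop]
    return 100.0 * values[group] / values["control"]


for prop in TABLE_1_MEANS:
    for group in ("decalcified", "deproteinized"):
        print(f"{prop:16s} {group:14s} {percent_of_control(prop, group):5.1f}% of control")
```

On these means, removing the mineral leaves under 2% of the control elastic modulus, while partial removal of the protein leaves roughly half of it.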

Figure 4. Stress vs strain and acoustic emission counts vs strain curves for one of the control specimens (source: 54).

Figure 5. Stress vs strain and acoustic emission counts vs strain curves for one of the decalcified specimens (source: 54).


Figure 6. Stress-strain curves for the three test groups constructed from the mean values in Table 1. Error bars are shown for ultimate stress values (source: 54).

Table 1. Mechanical properties of decalcified and partially deproteinized bovine bones (source: 54).

| Property | Control | Decalcified | Deproteinized |
|---|---|---|---|
| No. of specimens | 7 | 11 | 10 |
| Yield stress | 118 (9.8) | - | - |
| Yield strain | 0.544 (0.150) | - | - |
| Ultimate stress | 128 (15.6) | 34 (7.5) | 71 (11.5) |
| Ultimate strain | 2.02 (0.924) | 9.247 (1.524) | 0.956 (0.342) |
| Elastic modulus | 20.6 (2.76) | 0.37 (0.05) | 11.3 (3.15) |
| Plastic modulus | 0.66 (0.357) | - | - |

It is an established fact that the ultimate tensile strength of bone depends on the applied strain rate. On this basis, Fischer, R. A., et al. [55] studied the effect of using two different strain rates on the AE in bone. In their work, bovine cortical bone was milled into standard tensile specimens, which were tested at two different strain rates while being monitored with AE equipment. They found that the amplitude distribution of the AE events in bone depends on strain rate. A greater number of events occurred at the slower strain rate, but these events were of lower amplitude than those emitted at the more rapid strain rate. Here also the initial AE occurred well into the plastic region of the stress-strain curve, near the point of fracture of the tensile specimens. It was evident from the study that if acoustic emission technology is to be utilized clinically for the assessment of fracture healing, careful selection of the rate of loading would be necessary. As the emissions did not occur until failure was imminent, the study indicated that acoustic emission technology is not suitable for evaluating the integrity of bone. The limitations were that very large specimens were used to minimize the stress concentration effects of normal bone architecture, and that the work was done at only two strain rates (0.0001/s and 0.01/s); therefore no conclusion could be drawn for a broad range of strain rates.

Later on, Nicholls, P. J., et al. [56] studied the AE properties of callus. In their work, 45 degree midshaft oblique osteotomies in rabbits were strained in shear while being monitored for AE events. Each fracture remained essentially quiet until over 50% of the load to failure had been applied. They suggested that, since callus formation plays an important role in fracture healing, the AE from callus may have clinical applications. The limitation of the study lies in the test method used for evaluating bone. As bone is a non-homogeneous substance, it is very difficult to evaluate with instruments designed for homogeneous substances. A test method should be designed to eliminate background noise, such as slippage of the specimen in the grips and motion of the transducers on the bone surface, because such noise hampers the reproducibility of acoustic emission patterns. In most AE studies, bone has been tested without the surrounding soft tissues. In clinical applications of AE, however, the bone cannot be separated from the soft tissues, and hence tests should be performed with the soft tissues in place. Hanagud, S., et al. [57] studied this question. They used freshly dissected rabbit tibiae and femora with the soft tissues intact, loaded in bending, and found that soft tissue of 2 to 9 mm thickness did not affect the bone’s AE response.

A study of bone-tissue samples by Martens, M. [58] used acoustic emission to study the mechanical behavior of femoral bones in bending. Ono, K. [59] provided an insight into the fundamental theories and equations of acoustic emission. Stromsoe, K., et al. [60] worked on the bending strength of the femur in relation to non-invasive bone mineral assessment.

The work done up to this time suggested that the AE technique could not safely be used for the non-destructive testing of bone, because AE events occurred only after plastic deformation had begun.

Lentle, B. C. [61], of the University of British Columbia, used acoustic emission to monitor osteoporosis. He devised a method for in vivo diagnosis of patients using the AE technique, which could also predict the severity of osteoporosis.

Table 2. The predictive capability of acoustic emissions expressed in terms of the specimen’s fatigue life (source: 71).

| Specimen | Maximum stress (MPa) | NF, fatigue life [cycles] | NP, fracture onset via AE [cycles] | Predictive capability [% of fatigue life] |
|---|---|---|---|---|
| Specimen 1 | 55 | 26363 | 22580 | 85 % |
| Specimen 2 | 61 | 35020 | 33382 | 95 % |
| Specimen 3 | 66 | 4737 | 2989 | 63 % |
| Specimen 4 | 71 | 600 | 402 | 67 % |


Table 3. A statistically significant effect of time on these mechanical properties was detected. Within a row, values with differing letters are significantly different from each other (P < 0.05) (source: 65). Columns give the time after surgery.

| Mechanical properties | 4 weeks (n=8) | 6 weeks (n=8) | 8 weeks (n=9) | 12 weeks (n=9) |
|---|---|---|---|---|
| Tensile strength (N/mm²) | 36±21a | 130±60b | 220±32c | 510±95d |
| Tensile stiffness (N/mm) | 0.47±0.18a | 1.3±6b | 1.8±6b | 3.0±2c |
| Maximum strain (%) | 10±3a | 3.7±3b | 1.6±0.4b | 1.8±0.3b |
| AE initiation load (N) | 21±15a | 71±3b | 150±62c | 330±31d |
| Std. tensile strength (N/mm²) | 0.12±0.6a | 0.33±0.2b | 0.55±0.1c | 0.82±0.03d |
| Std. tensile stiffness (N/mm) | 0.028±0.02a | 0.57±0.3b | 0.82±0.3b,c | 1.0±0.06c |
| Std. maximum strain | 5.6±1.4a | 2.2±1b | 0.90±0.2b | 0.86±0.1b |

Table 4. Ash content was calculated as (ash density / apparent density × 100). A statistically significant effect of time on these mechanical properties was detected. Within a row, values with differing letters are significantly different from each other (P < 0.05) (source: 65). Columns give the time after surgery.

| Mechanical properties | 4 weeks (n=9) | 6 weeks (n=9) | 8 weeks (n=10) | 12 weeks (n=10) |
|---|---|---|---|---|
| Apparent density (g/cm³) | 0.42±0.07a | 0.67±0.07b | 1.2±0.1c | 1.2±0.08c |
| Ash density (g/cm³) | 0.14±0.04a | 0.40±0.06b | 0.82±0.08c | 0.88±0.06c |
| Ash content (%) | 33±4a | 59±5b | 71±1b,c | 73±1c |
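The ash content formula in the caption of Table 4 can be written out as a small sketch. The function name is an illustrative assumption. The inputs used below are the group means from Table 4, so the results only approximate the published ash content row, which was presumably computed per specimen before averaging.

```python
def ash_content_percent(ash_density, apparent_density):
    """Ash content (%) = ash density / apparent density x 100 (both in g/cm^3)."""
    if apparent_density <= 0:
        raise ValueError("apparent density must be positive")
    return 100.0 * ash_density / apparent_density


# (weeks after surgery, mean apparent density, mean ash density) from Table 4 [65]
GROUP_MEANS = [(4, 0.42, 0.14), (6, 0.67, 0.40), (8, 1.2, 0.82), (12, 1.2, 0.88)]

for weeks, apparent, ash in GROUP_MEANS:
    print(f"{weeks:2d} weeks: {ash_content_percent(ash, apparent):.1f} %")
```

The ratio of group means comes out near 33, 60, 68 and 73 %, close to but not identical with the tabulated 33, 59, 71 and 73 %, as expected when a ratio of means is compared with a mean of ratios.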

Acoustic emission was being used to predict changes in mechanical properties due to fatigue [62,63,64]. Watanabe, Y., et al. [65] used the AE technique to predict the mechanical properties of healing fractures. Experimentally produced fractures of the femur in rats were tested in tension and in torsion at 4, 6, 8 and 12 weeks after fracture. AE signals were monitored during these mechanical tests. The values of load and torque at the initiation of the AE signal were defined as new mechanical parameters. Tensile strength, tensile stiffness, and torsional stiffness were found to increase with time. The authors focused on how AE signals can be used to monitor the healing of bone and so help a surgeon decide when to remove external fixators. Table 3 [65] indicates a statistically significant effect of time on these mechanical properties. Table 4 [65] gives the calculated ash content and likewise shows a statistically significant effect of time. The values they obtained were compared with the original values and were found to be almost the same. The study was a first step towards establishing AE testing as a means of predicting callus strength. The two new parameters exhibited strong, positive linear correlations with tensile strength and torque. These linear correlations suggested that it may be possible to use AE technology to evaluate the fracture healing process following osteotomy surgery. Many issues still needed to be resolved to make it clinically viable.
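To make the reported linear correlation concrete, the sketch below computes a Pearson coefficient between the AE initiation load and the tensile strength, using only the four group means listed in Table 3 [65]. This is an illustration on summary values, not a reproduction of the specimen-level analysis in [65], and the helper name `pearson_r` is an assumption of this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Group means from Table 3 [65], one value per time point after surgery.
ae_initiation_load_n = [21.0, 71.0, 150.0, 330.0]
tensile_strength = [36.0, 130.0, 220.0, 510.0]

r = pearson_r(ae_initiation_load_n, tensile_strength)
print(f"r = {r:.3f}")
```

With only four points the coefficient says little statistically, but it shows the direction and strength of the trend that the tabulated means already suggest.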

Kevin S. C. K., et al. [66] developed an acoustical technique for measuring the structural symmetry of hip joints. Because techniques that rely on sounds emitted by the joints depend heavily on the intensity and quality of those sounds, they instead measured the relative acoustic transmission across both hips of the test subjects while an external vibratory force was applied at the sacrum. The merit of this approach was that it allowed direct comparison of the sound signals transmitted across both hips regardless of the magnitude of the input vibratory force. At the same time, other acoustic techniques such as scanning acoustic microscopy [67,68] and acoustic mapping [69,70] were being used to predict and study the mechanical properties of tissues and bone.

Ozan A. [71] worked on the hypothesis that an increase in micro damage activity during repeated loading of bone signals an approaching stress fracture. Intervening in the training regime before the fracture occurs, as signaled by acoustic emissions, would reduce the time necessary for recuperation. Acoustic emission was used for real-time monitoring of micro cracks and to predict the failure of cortical bone. Table 2 [71] indicates the predictive capability of acoustic emissions expressed in terms of the specimen’s fatigue life.
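The last column of Table 2 can be read as the cycle count at AE-detected fracture onset (NP) divided by the fatigue life (NF), expressed as a percentage. That reading is an interpretation of the column header rather than a formula stated in the text, and the function names below are illustrative. The sketch recomputes the column from the tabulated cycle counts.

```python
# (specimen, maximum stress in MPa, NF fatigue life, NP fracture onset via AE, reported %)
TABLE_2 = [
    ("Specimen 1", 55, 26363, 22580, 85),
    ("Specimen 2", 61, 35020, 33382, 95),
    ("Specimen 3", 66, 4737, 2989, 63),
    ("Specimen 4", 71, 600, 402, 67),
]


def predictive_capability(n_p, n_f):
    """Fracture onset via AE as a percentage of fatigue life."""
    if n_f <= 0 or not 0 <= n_p <= n_f:
        raise ValueError("expected 0 <= n_p <= n_f and n_f > 0")
    return 100.0 * n_p / n_f


def warning_margin(n_p, n_f):
    """Load cycles remaining between AE-detected onset and final failure."""
    return n_f - n_p


for name, _stress, n_f, n_p, reported in TABLE_2:
    print(f"{name}: {predictive_capability(n_p, n_f):.1f} % (reported {reported} %), "
          f"{warning_margin(n_p, n_f)} cycles of warning")
```

The recomputed values agree with the published column to within one percentage point. A lower percentage means earlier warning relative to the specimen's life.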

Information was collected on all acoustic events, regardless of whether they originated from micro damage or elsewhere, and the signals originating from micro damage were then isolated. The irrelevant signals were filtered out on the basis of their average frequency, duration, amplitude, and intensity. With the non-micro-damage signals removed from the data, it was possible to determine the number of acoustic events related to bone damage as well as the times at which they occurred. Specially designed software that can segregate the zones of micro damage is yet to be developed. Fracture healing and the prediction of healing time were increasingly being studied. A review by Browne, M., et al. [72] of the capability of acoustic emission to monitor bone degradation and bone fatigue summarized the latest developments in this field.
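The filtering step described above can be sketched as a set of acceptance windows applied to each recorded event. The source names the four criteria (average frequency, duration, amplitude, intensity) but gives no numerical limits, so every window value, field name, unit and sample event below is hypothetical and chosen only to make the example run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AEEvent:
    time_s: float             # time at which the event was recorded
    avg_frequency_khz: float  # average frequency of the event
    duration_us: float        # event duration
    amplitude_db: float       # peak amplitude
    intensity: float          # intensity measure (definition not given in the source)


# Hypothetical (min, max) acceptance windows for micro-damage signals.
WINDOWS = {
    "avg_frequency_khz": (100.0, 400.0),
    "duration_us": (50.0, 2000.0),
    "amplitude_db": (45.0, 100.0),
    "intensity": (1.0, float("inf")),
}


def is_micro_damage(event, windows=WINDOWS):
    """True when every screened feature falls inside its acceptance window."""
    return all(lo <= getattr(event, name) <= hi for name, (lo, hi) in windows.items())


def isolate_micro_damage(events, windows=WINDOWS):
    """Keep only the events classified as micro damage, in their original order."""
    return [e for e in events if is_micro_damage(e, windows)]


events = [
    AEEvent(1.2, 150.0, 300.0, 60.0, 5.0),   # inside all windows
    AEEvent(2.5, 20.0, 5000.0, 70.0, 8.0),   # low frequency, long duration
    AEEvent(4.0, 250.0, 120.0, 30.0, 0.5),   # too weak
    AEEvent(7.8, 300.0, 800.0, 75.0, 12.0),  # inside all windows
]
damage = isolate_micro_damage(events)
print(len(damage), [e.time_s for e in damage])
```

The length of `damage` is the number of damage-related events, and the `time_s` values are the times at which they occurred, the two quantities the paragraph mentions.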

In 2004, Franke, R. P., et al. [73] used acoustic emission for in vivo diagnosis of the knee joint. For the assessment of the tribological function of the knee and of the probability of fracture of the femur, an adapted acoustic emission measurement system named Bone Diagnostic System (BONDIAS) was developed. This system makes in vivo analysis of the medical status possible. As shown in the literature, different cracking mechanisms in human femora are accompanied by different acoustic emissions. An acoustic emission signal typical of crack initiation is shown in Figure 7 [73]. This figure shows the acoustic emission from healthy knee joint cartilage after a sudden change from a two-leg stand to a one-leg stand. It is characterized by a very short rise time and an exponential decrease of the amplitudes. From the medical point of view such mechanical loads are regarded as non-destructive, although there is already crack initiation at the interface between the compact and the trabecular systems of the bone. These micro cracks seem to be essential for physiological bone remodeling. To describe the development of bone strength over time, it is necessary to assess both the threshold of crack initiation and the conditions for crack propagation. The sudden change in amplitude indicates a thick cartilage layer. There are several advantages of the diagnostic procedure by AE when compared with established conventional methods:

1) No pain is caused by this procedure.

2) This procedure is non-destructive. Mechanical loads even beyond the crack initiation threshold are typical of day-to-day life and necessary for physiological bone remodeling to avoid the degeneration of the bone and joint system.

3) There is no health burden through ionizing radiation, as is unavoidable with X-ray examination and CT.

4) There is no danger of infection since this is a non-invasive examination.

5) The time required for the assessment of the acoustic emission behavior and the analysis of the data is of the order of seconds to minutes.

6) The expenses for the AE measurement system are small compared to X-ray systems.

7) The costs per examination, including a detailed diagnosis, are well below the costs of other diagnostic procedures, and there is no danger of infection leading to further costs, as happens with invasive methods, e.g. endoscopic examinations.

8) Diagnostic (real time) monitoring of the bone and joint training of sports professionals becomes possible.

Figure 7. Acoustic emission from healthy knee joint cartilage deformation after the sudden change from a two leg stand to a one leg stand (source: 73).

The disadvantage of the suggested measurement system was that the physician is left with a bundle of data and the task of evaluating the AE.
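One way to reduce such a record to a few numbers is to extract the two features named above for the crack initiation signal: the rise time and the exponential decay of the amplitudes. The sketch below does this for an idealized burst envelope. The envelope model (linear rise, then exponential decay), the function names and all numerical values are assumptions for illustration, not the BONDIAS signal processing.

```python
import math


def burst_envelope(t_us, peak, rise_us, tau_us):
    """Idealized burst: linear rise to `peak`, then exponential decay (assumed shape)."""
    if t_us < 0.0:
        return 0.0
    if t_us < rise_us:
        return peak * t_us / rise_us
    return peak * math.exp(-(t_us - rise_us) / tau_us)


def rise_time_and_decay(times_us, envelope, threshold):
    """Return (rise time, decay time constant), both in microseconds."""
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    peak_i = max(range(len(envelope)), key=envelope.__getitem__)
    first_i = next(i for i, a in enumerate(envelope) if a >= threshold)
    rise = times_us[peak_i] - times_us[first_i]
    # Least-squares fit of ln(amplitude) against time on the tail above threshold.
    tail = [
        (t, math.log(a))
        for t, a in zip(times_us[peak_i:], envelope[peak_i:])
        if a >= threshold
    ]
    if len(tail) < 2:
        raise ValueError("need at least two tail samples above threshold")
    mean_t = sum(t for t, _ in tail) / len(tail)
    mean_l = sum(l for _, l in tail) / len(tail)
    slope = sum((t - mean_t) * (l - mean_l) for t, l in tail) / sum(
        (t - mean_t) ** 2 for t, _ in tail
    )
    return rise, -1.0 / slope


# Synthetic example: 1 microsecond sampling, 5 us rise, 50 us decay constant.
times = [float(i) for i in range(400)]
env = [burst_envelope(t, 1.0, 5.0, 50.0) for t in times]
rise, tau = rise_time_and_decay(times, env, 0.05)
print(rise, round(tau, 3))
```

The rise time here is measured from the first threshold crossing to the peak, so it depends on the chosen threshold.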

Tatarinov, A., et al. [74] proposed a multiple acoustic wave method for assessing long bones. The method was based on measuring ultrasound velocity at different ratios of wavelength to bone thickness, taking into account both bulk and guided waves. Its advantage was that it allowed assessment of changes both in the material properties related to porosity and mineralization and in the cortical thickness as influenced by resorption from the inner layers, which are equally important in the diagnosis of osteoporosis and osteopenia. The method had potential for better detection of early-stage osteoporosis in long bones, and could also be used for diagnosis of bone condition if the contributions of soft tissue and the topographical heterogeneity of real bones are taken into account. More in vivo studies on animals and human volunteers have to be carried out before the proposed method can be made clinically usable.

Singh, V. R. [75] reviewed an acoustic imaging technique, known as the acoustic stress wave propagation technique, which was used for bone examination. The technique was developed to solve problems encountered with conventional techniques such as X-rays. As bone is a heterogeneous, complex and fibrous tissue, very small abnormalities, such as the shape and size of bone defects, usually cannot be determined by means of conventional X-ray technique.
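Returning to the multiple acoustic wave method of [74], the ratio of wavelength to bone thickness that the method varies follows from the relation wavelength = velocity / frequency. The sketch below evaluates it for two excitation frequencies. The velocity of 4000 m/s, the 4 mm cortical thickness and both frequencies are assumed round numbers chosen for illustration, not values taken from [74].

```python
def wavelength_m(velocity_m_s, frequency_hz):
    """Wavelength from propagation velocity and frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return velocity_m_s / frequency_hz


def wavelength_to_thickness(velocity_m_s, frequency_hz, thickness_m):
    """Dimensionless ratio of wavelength to cortical thickness."""
    if thickness_m <= 0:
        raise ValueError("thickness must be positive")
    return wavelength_m(velocity_m_s, frequency_hz) / thickness_m


ASSUMED_VELOCITY_M_S = 4000.0  # assumed order of magnitude for cortical bone
ASSUMED_THICKNESS_M = 0.004    # assumed 4 mm cortical wall

for frequency_hz in (100e3, 1e6):  # assumed low and high excitation frequencies
    ratio = wavelength_to_thickness(ASSUMED_VELOCITY_M_S, frequency_hz, ASSUMED_THICKNESS_M)
    print(f"{frequency_hz / 1e3:6.0f} kHz: wavelength/thickness = {ratio:.1f}")
```

Under these assumptions the low frequency gives a wavelength about ten times the wall thickness, where wall thickness would be expected to influence the measured velocity, while the high frequency gives a ratio near one, closer to the bulk-wave regime that mainly reflects material properties.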

In 2006, Azra Alizad, et al. [76] studied the change in the resonant frequencies of a bone caused by the change in its physical properties due to a fracture. Experiments were conducted on excised rat femurs, and the resonance frequencies of intact, fractured, and bonded (simulating healed) bones were measured. These experiments demonstrated that changes in the resonance frequency indicate bone fracture and healing. The fractured bone exhibits a lower resonance frequency than the intact bone, and the resonance frequency of the bonded bone approaches that of the intact bone. The graphs (Figures 8, 9) [76] illustrate this result: the resonance peaks of the cut femur lie at lower frequencies than those of the intact femur. The proposed method may be used as a remote and non-invasive tool for monitoring bone fracture and the healing process, and the use of focused ultrasound enables one to selectively evaluate individual bones. The proposed method offers several advantages over vibrational methods using external mechanical excitation. The ultrasound can be applied remotely and directly to the bone under test, thus avoiding the interference of overlying muscle or other tissues with the force distribution. Furthermore, in contrast to traditional methods, in which it is difficult to target and access small bones, the proposed method allows the excitation force to be applied directly and selectively to the intended bone. The acoustic method for measuring bone response is suitable for in vivo applications as long as one takes the frequency response of the surrounding structures into account. An advantage of the acoustic motion detection method is that it does not require a direct path to the bone: because the acoustic emission produced by the bone travels easily in every direction, the location of the hydrophone is not critical. Further investigations are needed to demonstrate the applicability of the proposed method to the evaluation of bone quality in the human body. The boundary conditions between bone and body must also be considered before applying the proposed method.

Figure 8. Frequency response of the intact femur A. The plots show the motion of the intact femur vs frequency. These plots indicate peaks at 925 Hz, 4.2 kHz, and 8.1 kHz. Peaks of motion below 700 Hz were explored and found not to be related to the femur. Top left: Driven and measured at the end of the femur at a frequency range of 100 Hz to 1000 Hz. Top right: Driven and measured at the end of the bone at 1 kHz to 10 kHz. Bottom left: Driven and measured at bone midpoint at 100 Hz to 1000 Hz. Bottom right: Driven and measured at bone midpoint at 1 kHz to 10 kHz (source: 76).

Figure 9. Frequency response of the cut femur (source: 76).
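The lower resonance frequency reported for the fractured femur [76] is what a simple mass-spring idealization predicts when stiffness is lost. The sketch below uses the single-degree-of-freedom relation f = sqrt(k/m) / (2π), under which the frequency scales with the square root of the stiffness at constant mass. This idealization and the stiffness fractions are assumptions made for illustration, not the model or data of [76]. Only the 925 Hz starting value is taken from the caption of Figure 8.

```python
import math


def natural_frequency_hz(stiffness_n_per_m, mass_kg):
    """Natural frequency of an undamped single-degree-of-freedom oscillator."""
    if stiffness_n_per_m < 0 or mass_kg <= 0:
        raise ValueError("stiffness must be >= 0 and mass must be > 0")
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


def scaled_frequency_hz(intact_frequency_hz, stiffness_fraction):
    """Resonance after stiffness drops to `stiffness_fraction` of intact, mass unchanged."""
    if stiffness_fraction < 0:
        raise ValueError("stiffness fraction must be >= 0")
    return intact_frequency_hz * math.sqrt(stiffness_fraction)


INTACT_PEAK_HZ = 925.0  # lowest intact-femur peak quoted in the Figure 8 caption

# Hypothetical stiffness fractions for a fractured and a bonded bone.
for label, fraction in (("fractured", 0.25), ("bonded", 0.81), ("intact", 1.0)):
    print(f"{label:9s}: {scaled_frequency_hz(INTACT_PEAK_HZ, fraction):.1f} Hz")
```

Because of the square root, a bone that has regained about 80% of its stiffness would already resonate at about 90% of the intact frequency in this idealization.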

In 2008, Dipan Bose, et al. [77] studied the effect of valgus bending and shear loading on the knee joint. They used acoustic sensors to determine the failure timing of the soft tissues attached to the femur and tibia. The failure timing was determined on the basis of the knee injury mechanism under valgus loading. At the estimated time of injury, the corresponding values of αvalgus_fail, dshear_fail, Mvalgus_fail, and Vshear_fail were designated as the failure parameters of the knee. Numerical methods with accurate geometric and material properties could be implemented to simulate the injury and to extend the injury threshold to alternative loading conditions. Elmar K. Tschegg, et al. [78] carried out a stiffness analysis of a tibia implant system under cyclic loading. They used a biomechanical system with acoustic emission sensors integrated at the screw heads. Three loading sequences were used to determine when the locking screws break. Data from the acoustic sensors were acquired on a data acquisition board and processed with acoustic emission software. The acquired data were used to determine which screw was bearing the load and when each screw broke.

5. CONCLUSIONS

· Many investigations carried out in the field of “Assessment of bone condition” are mainly in vitro studies. For any method that is to be used in clinical practice, a thorough experimental study with animals and/or a clinical study with human volunteers is essential.

· The AE is very much dependent on the strain rate.

· The AE technique is highly sensitive to specimen damage and cracks and detects them even before visual detection.

· Diagnostic (real time) monitoring of bone is possible.

· It is non-destructive and helps us to predict the duration of the healing process.

· It has no harmful effects, unlike X-rays, which have radiation effects on patients.

· The emissions from the callus during fracture healing give it the possibility of being used clinically.

6. FUTURE RESEARCH DIRECTIONS

· Many researchers have used this technique for in vitro as well as in vivo characterization of bone. However, the clinical application of the technique has not been fully investigated. The time of occurrence of the initial AE and the AE response of bone under different loading conditions at different strain rates are not well established. Future work can therefore focus on finding the exact time of occurrence of the initial AE with respect to the stress-strain curve and the AE behavior under a suitable loading condition at different strain rates.

· In the case of in vivo studies, the AE response of callus has not been thoroughly investigated. It is also necessary to conduct experimental studies with laboratory animals, volunteers or patients to prove the clinical usability of AE.

7. ACKNOWLEDGEMENT

The authors wish to acknowledge Mr. Sudeep Mohapatra & Mr. Kaushik. V. for their help. Financial assistance provided by the Department of Science and Technology, Government of India is also gratefully acknowledged.

REFERENCES

[1] F. G. Evans, (1982) Bones and bones, Trans. of ASME, J. Biomechanical Engg., 104/1, 1-5.

[2] W. C. Hayes, (1978) Biomechanical measurements of bone, in CRC Handbook on Engg. in Medicine and Biology, Section B. Instruments and Measurements, B. N. Feinberg and D. G. Fleming (eds.), CRC Press, Florida, 1, 333-372.

[3] C. J. Singer, (1959) A short history of scientific ideas to 1900, Oxford University Press, New York.

[4] J. M. Bourgery, (1832) Traite Complet de l’Anatomie de l’Homme. I. Osteologie, Paris.

[5] F. O. Ward, (1838) Outlines of human osteology, London, 370.

[6] G. H. Meyer, (1867) Die Architektur der Spongiosa, Arch. Anat. Physiol. Wiss. Med., 34, 615-628.

[7] K. K. Hulsen, (1896) Specific gravity, resilience and strength of bone, Bull. Biol. Lab. St. Petersburg, 1, 7-35.

[8] F. G. Evans, (1973) Mechanical properties of bone, Springfield, IL.

[9] C. O. Carothers, F. C. Smith, and P. Calabrisi, (1949) The elasticity and strength of some long bones of the human body, Nav. Med. Res. Inst. Rept. NM 001056.02.13.

[10] F. G. Evans and M. Lebow, (1951) Regional differences in some of the physical properties of the human femur, J. Appl. Physiol., 3, 563-572.

[11] F. G. Evans and M. Lebow, (1952) The strength of human bone as revealed by engineering techniques, Am. J. Surg., 83, 326.

[12] F. G. Evans, (1964) Significant differences in the tensile strength of adult human compact bone, in Proceedings of the First European Bone and Tooth Symposium, H. J. J. Blackwood (ed.), Pergamon, Oxford, 319-381.

[13] E. D. Sedlin and C. Hirsch, (1966) Factors affecting the determination of the physical properties of femoral cortical bone, Acta Orthop. Scand., 37, 29-48.

[14] E. D. Sedlin, (1965) A rheological model for cortical bone, Acta Orthop. Scand., 33, 5-77.


[15] A. H. Burstein, J. D. Currey, V. H. Frankel, and D. T. Reilly, (1972) The ultimate properties of bone tissue: The effects of yielding, J. Biomechanics, 5, 35-44.

[16] A. H. Burstein and V. H. Frankel, (1968) The viscoelastic properties of some biological materials, Ann. N. Y. Acad. Sci., 46, 158-165.

[17] A. H. Burstein, V. H. Frankel, and D. T. Reilly, (1973) Failure characteristics of bone tissue, in Perspectives in Biomedical Engg, R. M. Kenedi, (ed.), University Park Press, London, 131-134.

[18] A. H. Burstein, D. T. Reilly, and M. Marten, (1975) Aging of bone tissue: Mechanical properties, J. Bone J. Surg, 58, 82-86.

[19] J. H. Mc Elhaney, (1966) Dynamic response of bone and muscle tissue, J. Appl. Physiol., 21, 1231-1236.

[20] M. M. Panjabi, A. A. White, and W. O. Southwick, (1973) Mechanical properties of bone as a function of rate of deformation, J. Bone. J. Surg, 55(A), 322-330.

[21] R. D. Crowninshield and M. H. Pope, (1974) The response of compact bone in tension at various strain rates, Ann. Biomed. Engg., 2, 217-225.

[22] T. M. Wright and W. C. Hayes, (1976) Tensile testing of bone over a wide range of strain rates: Effects of strain rate, microstructure and density, Med. Biol. Engg., 14, 671-680.

[23] S. B. Lang, (1970) Ultrasonic method for measuring elastic coefficients of bone and results on fresh and dried bovine bones, IEEE Trans. Biomed. Engg., 17, 101-105.

[24] H. S. Yoon and J. L. Katz, (1976) Ultrasonic wave propagation in human cortical bone: I, Theoretical considerations for hexagonal symmetry, J. Biomechanics, 9, 407-412.

[25] H. S. Yoon and J. L. Katz, (1976) Ultrasonic wave propagation in human cortical bone: II, Measurements of elastic properties and microhardness, J. Biomechanics, 9, 459-464.

[26] R. B. Ashman, J. D. Corin, and C. H. Turner, (1987) Elastic properties of cancellous bone: measurement by an ultrasonic technique, J. Biomech, 20(10), 979-986.

[27] E. Lachmann and M. Whelan, (1936) The roentgen diagnosis of osteoporosis and its limitations, Radiology, 26, 165-177.

[28] E. Lachmann, (1955) Osteoporosis: The potentialities and limitations of its roentgenologic diagnosis, Amer. J. Roentgenology, 74, 712-715.

[29] I. I. H. Chen and S. Saha, (1987) Wave propagation characteristics in long bones to diagnose osteoporosis, J. Biomech., 20(5), 523-527.

[30] F. Y. Wong, S. Pal, and S. Saha, (1983) The assessment of in vivo bone condition in humans by impact response measurement, J. Biomech., 16(10), 849-856.

[31] L. L. Stern and J. Yageya, (1980) Bioelectric potentials after fracture of tibia in rats, Acta Orthop. Scand., 51, 601-608.

[32] S. Saha and R. R. Pelker, (1976) Measurement of fracture healing by the use of stress waves, 22nd Annual ORS, New Orleans, Louisiana.

[33] D. A. Sonstegard and L. S. Mathews, (1976) Sonic diagnosis of bone fracture healing-A preliminary study, J. Biomech., 9, 689-694.

[34] N. Guzelsu and S. Saha, (1981) Electromechanical wave propagation in long bones, J. Biomech., 14, 19-33.

[35] G. T. Anast, M. S. Fields, and Siegel, (1958) Ultrasonic technique for the evaluation of bone fracture, Amer. J. Physical. Med., 37, 157-159.

[36] S. Saha, V. V. Rao, V. Malakanok, B. D. Gross, and J. A. Albright, (1982) Ultrasonic evaluation of fracture healing, Transactions of the 28th Annual Meeting of the Orthopedic Research Society, ORS, Chicago, 259.

[37] O. O. A. Oni, A. Graebe, M. Parse, and P. J. Gregg, (1989) Prediction of the healing potential of closed adult tibia shaft fractures by bone scintigraphy, Clinical Orthopedics and Related Research, 239-245.

[38] S. M. Bentzen, I. Hvid, and J. Jorgensen, (1987) Mechanical strength of tibial trabecular bone evaluated by X-ray computed tomography, J. Biomech., 20, 743-752.

[39] T. Sekiguchi and T. Hirayama, (1979) Assessment of fracture healing by vibration, Acta Orthop. Scand., 50, 391-398.

[40] R. J. Donarski, (1989) Bone Fracture measurement using mechanical vibration, Ph. D. Thesis, University of Kent at Canterbury, UK.

[41] R. Prakash, (1980) Non destructive testing of composites, Composites, 10, 217-224.

[42] J. Kaiser, (1950) Untersuchungen über das Auftreten von Geräuschen beim Zugversuch, Ph. D. Thesis, Technische Hochschule, Munich.

[43] A. A. Pollock, (1979) An introduction to acoustic emission and a practical example, J. Environmental Sciences, March/April, 1-4.

[44] S. Radhakrishnan, (1992) The assessment of bone condition by acoustic emission and acousto-ultrasonic techniques, Ph. D. Thesis, Banaras Hindu University.

[45] S. Hanagud, R. G. Clinton, and J. P. Lopez, (1973) Acoustic emission in bone substance, Proceedings of Biomechanics Symposium of the American Society of Mechanical Engineers, ASME, New York, 74.

[46] I. V. Knets, U. E. Krauya, and Y. K. Vilks, (1975) Acoustic emission in human bone tissue upon lengthwise stretching, Mekh. Polim., 4, 685-690.

[47] U. E. Krauya and Y. A. Lyakh, (1978) Acoustic emission in human bone tissue, Mekh. Polim., 1, 109-112.

[48] A. Peters, (1982) Acoustic emission technique and fracture healing, Med. Biol. Engng. Comput, 20, 8.

[49] T. M. Wright and J. M. Carr, (1983) Soft tissue attenuation of acoustic emission pulses, Trans. of ASME J. Biomech. Engg, 105(1), 21-23.

[50] S. Hanagud, G. T. Hannon, and R. G. Clinton, (1974) Acoustic emission and diagnosis of osteoporosis, Proceedings of Ultrasonics Symp., 77-80.

[51] R. A. Thomas, H. S. Yoon, and J. L. Katz, (1977) Acoustic emission from fresh bovine femora, Proceedings of Ultrasonics Symp., IEEE Cat. No. 77CH1264-ISU, 237-240.

[52] H. S. Yoon, B. R. Caraco, and J. L. Katz, (1980) Further studies on the acoustic emission of fresh animal bone, IEEE Trans. Sonics, Ultrasonic, SU-27, 160.

[53] P. Netz, (1979) The diaphyseal bone under torque, Acta Orthop. Scand. Suppl., 176, 1-31.

[54] T. M. Wright, F. Vosburgh, and A. H. Burstein, (1981) Permanent deformation of compact bone monitored by acoustic emission, J. Biomech., 14, 405-409.


[55] R. A. Fischer, S. W. Arms, M. H. Pope, and D. Seligson, (1986) Analysis of the effect of using two different strain rates on the acoustic emission in bone, J. Biomech., 19(2), 119-127.

[56] P. J. Nicholls and E. Berg, (1981) Acoustic emission properties of callus, Med. Biol. Engg. Comput, 19, 416-418.

[57] S. Hanagud, R. G. Clinton, M. D. Chouinard, E. Berg, and P. J. Nicholls, (1977) Soft tissues and acoustic emission based diagnostic tools, 1977 Ultrasonics Symposium Proceedings, IEEE Cat. No. 77CH1264-ISU, 242-245.

[58] M. Martens, R. van Audekercke, P. de Meester, and J. C. Mulier, (1986) Mechanical behaviour of femoral bones in bending loading, Journal of Biomechanics, 19(6), 443-454.

[59] K. Ono, (1979) Fundamentals of acoustic emission, Los Angeles (CA), UCLA, 167-207.

[60] K. Stromsoe, A. Hoiseth, A. Alho, and W. L. Kok, (1995) Bending strength of the femur in relation to non-invasive bone mineral assessment, Journal of Biomechanics, 28(7), 857-861.

[61] B. C. Lentle, (1999) Diagnosis of osteoporosis using acoustic emission, The University of British Columbia, Patent no. A61B8/08.

[62] J. G. Wells, (1985) Acoustic emission and mechanical properties of trabecular bone, Biomaterials, 6, 218-24.

[63] I. Leguerney, K. Raum, A. Saied, H. Follet, G. Boivin, and P. Laugier, (2003) Evaluation of human trabecular bone properties by scanning acoustic microscopy, J. Bone Min. Res., 18, S187.

[64] R. M. Rajachar, D. L. Chow, C. E. Curtis, N. A. Weissman, and D. H. Kohn, (1999) Use of acoustic emission to characterize focal and diffuse micro damage in bone, in: Acoustic Emissions: Standards and Technology Update, West Conshohocken, PA: American Society for Testing and Materials, 3-19.

[65] Y. Watanabe, S. Takai, Y. Arai, N. Yoshino, and Y. Hirasawa, (2001) Prediction of mechanical properties of healing fractures using acoustic emission, Journal of Orthopedic Research, 19, 548-553.

[66] S. C. K. Kevin, H. Xiaolin, C. Y. C. Jack, and H. E. John, (2003) Acoustic transmission in normal human hips: Structural testing of joint symmetry, Medical Engineering & Physics, 25(10), 811-816.

[67] S. Bumrerraj and J. L. Katz, (2001) Scanning acoustic microscopy study of human cortical and trabecular bone, Ann. Biomed. Engg, 29(12), 1034-1042.

[68] I. Eckardt and H. J. Hein, (2001) Quantitative measure-ments of the mechanical properties of human bone tis-sues by scanning acoustic microscopy, Ann. Biomed. Engg., 29(12), 1043-1047.

[69] Y. Xia, W. Lin, E. Mittra, B. Demes, B. Gruber, C. Rubin, and Y. Qin, (2003) Performance of a confocal acoustic mapping in characterization of trabecular bone quality in human calcaneus, J. Bone Min. Res, 18, S209.

[70] L. Cardoso, F. Teboul, L. Sedel, C. Oddou, and A. Me-unier, (2003) In vitro acoustic waves propagation inhu-man and bovine cancellous bone, J. Bone Min. Res, 18(10), 1803-1812.

[71] A. Ozan, (2005) Acoustic emission based surveillance system for prediction of stress fractures, Ph. D. Nicholas Wasserman, The University of Toledo Annual report.

[72] M. Browne, A. Roques, and A. Taylor, The acoustic emission technique in orthopaedics - a review, J. Strain Analysis, 40(1).

[73] R. P. Franke, P. Dörner, H.-J. Schwalbe, and B. Ziegler, (2004) Acoustic emission measurement system for the orthopedical diagnostics of the human femur and knee joint, University of Ulm, Dept. of Biomaterials, Ulm, Germany, Bad Griessbach, Germany, University of Applied Science Giessen, Giessen, Germany.

[74] A. Tatarinov, S. Noune, and S. Armen, (2005) Use of multiple acoustic wave modes for assessment of long bones: Model study, Journal of Ultrasonics, 43, 672-680.

[75] V. R. Singh, (1989) Acoustical imaging techniques for bone studies, Applied Acoustics, 27, 119-128.

[76] A. Alizad, M. Walch, J. F. Greenleaf, and M. Fatemi (2006) Vibrational characteristics of bone fracture and fracture repair: application to excised rat femur, Journal of Bio Medical Engineering, Trans ASME, 128.

[77] D. Bose, K. Bhalla, C. D. Untaroiu, B. J. Ivarsson, and J. R. Crandall, (2008) Injury tolerance and moment response of knee joint to combined valgus bending and shear loading, Journal of Biomechanical Engineering, 130.

[78] E. K. Tschegg, S. Herndler, P. Weninger, M. Jamek, S. Stanzl-Tschegg, H. Redl, (2008) Stiffness analysis of tibia implant system cyclical loading, Material Science and Engineering C, 28, 1203-1208.

Descriptively probabilistic relationship between mutated primary structure of von Hippel-Lindau protein and its clinical outcome

Shao-Min Yan, Guang Wu

ABSTRACT

In this study, we use cross-impact analysis to build a descriptively probabilistic relationship between mutant von Hippel-Lindau protein and its clinical outcome, after quantifying the mutant von Hippel-Lindau proteins with the amino-acid distribution probability. We then use the Bayesian equation to determine the probability that von Hippel-Lindau disease occurs under a mutation, and finally we attempt to distinguish the classifications of clinical outcomes as well as the endocrine and nonendocrine neoplasia induced by mutations of von Hippel-Lindau protein. The results show that a patient has about a 9/10 chance of having von Hippel-Lindau disease when a new mutation occurs in von Hippel-Lindau protein, that the classifications of clinical outcomes can possibly be distinguished by modeling, and that the endocrine and nonendocrine neoplasia can be explained from a modeling viewpoint.

Keywords: Amino Acid; Bayes’ Law; Cross-Impact Analysis; Distribution Probability; Mutation; Von Hippel-Lindau Disease

1. INTRODUCTION

Perhaps the first step in studying the genotype-phenotype relationship is to determine a protein in relation to a disease, and the second step would be to build a quantitative relationship between the mutant protein and its clinical outcome. Then we may be in a position to predict the clinical outcome based on such a quantitative relationship, and even to predict new functions led by new mutations.

Thus, we need methods that can quantify a protein sequence as a numeric sequence in order to build a quantitative relationship. In fact, there are various ways to quantify a protein sequence, for example, using the physicochemical properties of amino acids [1].

Since 1999, we have developed three approaches to quantify each amino acid in a protein as well as a whole protein (for reviews, see [2,3,4]), and our quantifications indeed differ before and after mutation, thus it is possible to use our approaches to build a quantitative relationship between changed primary structure and changed function of a protein.

In 1911 and 1926, von Hippel and Lindau described the von Hippel-Lindau disease [5,6]; later on, Melmon and Rosen established the notion of the von Hippel-Lindau disease [7], which is an autosomal dominant disorder characterized by cerebellar, spinal cord, and retinal hemangioblastomas; cysts of the kidney, pancreas, liver, and epididymis; and an increased frequency of renal cancer (renal cell carcinoma or hypernephroma), pancreatic cancer, and pheochromocytoma [8,9,10]. The von Hippel-Lindau disease has a birth incidence of about 1 in 36000, and about 20% of cases arise as de novo mutations without a family history [11,12].

The von Hippel-Lindau disease tumor suppressor gene was identified in 1993 [13], and its mutations are the major cause of the von Hippel-Lindau disease. Pathologically relevant is the inactivation of the von Hippel-Lindau gene and the subsequent loss of function of the von Hippel-Lindau protein and Elongin B, C complex [14,15]. The dysfunction of the ubiquitination of hypoxia-inducible factors is an important step in the development of various tumors [15,16,17,18,19]. Also, a recent study elucidated the role of NGF/JunB/EglN3-related pathways in developmental apoptosis linking to tumourigenesis [20].

Clinically, the von Hippel-Lindau disease is classified into two types: type I without pheochromocytoma and type II with pheochromocytoma [10,17]. On the other hand, more than 300 different von Hippel-Lindau mutations have been described at DNA level [21,22,23,24], and more than 100 at protein level. It would be greatly helpful if we could build a quantitative relationship between von Hippel-Lindau protein mutation and von Hippel-Lindau disease status, that is, the relationship between mutant protein and its clinical outcome.

Biosciences, 2009, 1, 23-32

In this study, we build a descriptively quantitative relationship between the changed primary structure of mutated von Hippel-Lindau protein and the classification of its clinical outcome, and we attempt to distinguish the classifications of clinical outcomes as well as the endocrine and nonendocrine neoplasia induced by mutations of von Hippel-Lindau protein.

2. MATERIALS AND METHODS

2.1. Data

The human von Hippel-Lindau disease tumor suppressor with a total of 132 mutations (accession number P40337; December 4, 2007; Entry version 91) is obtained from the UniProtKB/Swiss-Prot entry [25]. Among them, 123 are missense point mutations, 7 deletions and 3 insertions.

2.2. Amino-Acid Distribution Probability

Among the three approaches developed by us, the amino-acid distribution probability is mainly related to the positions of amino acids along the protein, which makes it suitable for mutation analysis, and we have used this approach in a number of our previous studies [2,3,4,26-44]. The quantification is developed along the following line of thought: how do two amino acids distribute along a protein sequence? Our intuition may suggest that there would be one amino acid in the first half of the sequence and another one in the second half. In fact, there are only three possible distributions: 1) both amino acids are in the first half, 2) one amino acid is in each half and 3) both amino acids are in the second half. Thus, each distribution has the probability of 1/3. If we do not distinguish the first half from the second half but are simply interested in whether the two amino acids are in both halves or in one half, there will be the probability of 1/2 for each distribution.

If we are interested in the distribution probability of three amino acids in a protein, we naturally imagine grouping the protein into three partitions, and our intuition may suggest that each partition contains an amino acid. If we do not distinguish the first, second and third partition, there are actually three types of distributions in total, i.e. 1) each partition contains one amino acid, 2) two amino acids are in one partition and one amino acid is in another partition, and 3) all three amino acids are in one partition.

In this situation, the distribution probability can be calculated according to statistical mechanics, which classifies the distribution of elementary particles in energy states under three assumptions about whether each particle and energy state is distinguishable, i.e. the Maxwell-Boltzmann, Fermi-Dirac and Bose-Einstein assumptions [45]. We actually use the Maxwell-Boltzmann assumption for computing the amino-acid distribution probability, which is equal to

$$\frac{r!}{q_0!\,q_1!\cdots q_n!} \times \frac{r!}{r_1!\,r_2!\cdots r_n!} \times n^{-r}$$

[45], where $r$ is the number of amino acids, $n$ is the number of partitions, $r_n$ is the number of amino acids in the n-th partition, $q_n$ is the number of partitions with the same number of amino acids (that is, $q_i$ counts the partitions that contain exactly $i$ amino acids), and ! is the factorial function.

Thus, the distribution probabilities are different for these three types of distributions of three amino acids, say, 0.2222 for 1), 0.6667 for 2) and 0.1111 for 3). Clearly the protein can only adopt one type of distribution for these three amino acids, which is the actual distribution probability.

For four amino acids, there are five distributions, 1) each partition contains an amino acid, 2) a partition contains two amino acids and two partitions contain an amino acid each, 3) two partitions contain two amino acids each, 4) a partition contains an amino acid and a partition contains three amino acids, and 5) a partition contains four amino acids. Their distribution probabilities are 0.0938 for 1), 0.5625 for 2), 0.1406 for 3), 0.1875 for 4), and 0.0156 for 5). Furthermore, there are seven distributions for five amino acids, 11 distributions for six amino acids, 15 distributions for seven amino acids, and so on.
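The probabilities quoted above for three and four amino acids can be reproduced with a short script. The following is a minimal sketch, assuming (as in the paper's examples) that the number of partitions equals the number of amino acids; the function name `distribution_probability` and the list-based input are illustrative choices, not part of the original study.

```python
from collections import Counter
from math import factorial, prod


def distribution_probability(occupancy):
    """Maxwell-Boltzmann probability of one occupancy pattern.

    `occupancy` lists how many amino acids fall into each partition,
    including empty partitions. As in the paper's examples, the number of
    partitions n must equal the number of amino acids r.
    """
    n = len(occupancy)
    r = sum(occupancy)
    if n != r:
        raise ValueError("this sketch assumes as many partitions as amino acids")
    # q[i] = number of partitions that hold exactly i amino acids
    q = Counter(occupancy)
    first = factorial(r) // prod(factorial(c) for c in q.values())
    second = factorial(r) // prod(factorial(k) for k in occupancy)
    return first * second / n ** r


# The three patterns for three amino acids and the five for four amino acids
for pattern in ([1, 1, 1], [2, 1, 0], [3, 0, 0],
                [1, 1, 1, 1], [2, 1, 1, 0], [2, 2, 0, 0],
                [3, 1, 0, 0], [4, 0, 0, 0]):
    print(pattern, round(distribution_probability(pattern), 4))
```

Because only the multiset of occupancies matters, the order of the entries in each pattern is irrelevant.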

2.3. Quantification of Wild-Type von Hippel-Lindau Protein

Table 1. Amino acids, their composition and distribution probability in wild-type human von Hippel-Lindau protein. (A, alanine; R, arginine; N, asparagine; D, aspartic acid; C, cysteine; E, glutamic acid; Q, glutamine; G, glycine; H, histidine; I, isoleucine; L, leucine; K, lysine; M, methionine; F, phenylalanine; P, proline; S, serine; T, threonine; W, tryptophan; Y, tyrosine; V, valine.)

Amino acid   Number   Distribution probability
A            10       0.0476
R            20       0.0067
N            9        0.1770
D            11       0.1077
C            2        0.5000
E            30       0.0001
Q            8        0.0673
G            18       0.0389
H            5        0.0640
I            6        0.1543
L            20       0.0422
K            3        0.1111
M            3        0.6667
F            5        0.2880
P            19       0.0319
S            11       0.0404
T            7        0.2142
W            3        0.6667
Y            6        0.2315
V            17       0.1280

With respect to the wild-type von Hippel-Lindau protein, for example, there are eight glutamines “Q” in von Hippel-Lindau protein (Table 1). We may ask how these eight Qs distribute along the von Hippel-Lindau protein. According to the problem of the occupancy of subpopulations and partitions [45], the simple way to answer this question is to imagine dividing the von Hippel-Lindau protein into eight equal partitions, each with about 27 amino acids (213/8=26.625) because the von Hippel-Lindau protein is composed of 213 amino acids; there would then be 22 configurations for all the possible distributions of eight Qs (Table 2).

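Here, we calculate two distribution probabilities in Table 2 as examples according to the above equation. For eight Qs distributed equally, one in each partition (the second row in Table 2), we have $q_0=0$, $q_1=8$, ..., $q_8=0$; and $r_1=1$, $r_2=1$, ..., $r_8=1$. Thus, we have the distribution probability,

$$\frac{8!}{0!\,8!\,0!\,0!\,0!\,0!\,0!\,0!\,0!} \times \frac{8!}{1!\,1!\,1!\,1!\,1!\,1!\,1!\,1!} \times 8^{-8} = \frac{40320}{1\times40320\times1\times1\times1\times1\times1\times1\times1} \times \frac{40320}{1\times1\times1\times1\times1\times1\times1\times1} \times \frac{1}{16777216} = 0.0024$$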

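Clearly, the von Hippel-Lindau protein can adopt only one distribution pattern, which is that two partitions contain zero Q, five partitions contain one Q and one partition contains three Qs (the fourth row in Table 2).

So we have $q_0=2$, $q_1=5$, $q_2=0$, $q_3=1$, $q_4=0$, $q_5=0$, $q_6=0$, $q_7=0$, $q_8=0$; and $r_1=0$, $r_2=0$, $r_3=1$, $r_4=1$, $r_5=1$, $r_6=1$, $r_7=1$, $r_8=3$, that is,

$$\frac{8!}{2!\,5!\,0!\,1!\,0!\,0!\,0!\,0!\,0!} \times \frac{8!}{0!\,0!\,1!\,1!\,1!\,1!\,1!\,3!} \times 8^{-8} = \frac{40320}{2\times120\times1\times1\times1\times1\times1\times1\times1} \times \frac{40320}{1\times1\times1\times1\times1\times1\times1\times6} \times \frac{1}{16777216} = 0.0673$$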
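The count of 22 configurations and the two worked values can be cross-checked by enumerating every way of splitting eight Qs among eight partitions. This is an illustrative sketch only; the helper names `integer_partitions` and `pattern_probability` are assumptions made for the example, and the enumeration treats partitions as indistinguishable, as Table 2 does.

```python
from collections import Counter
from math import factorial, prod


def integer_partitions(total, largest=None):
    """Yield each way to write `total` as a non-increasing tuple of positive integers."""
    if largest is None:
        largest = total
    if total == 0:
        yield ()
        return
    for first in range(min(total, largest), 0, -1):
        for rest in integer_partitions(total - first, first):
            yield (first,) + rest


def pattern_probability(parts, n):
    """Maxwell-Boltzmann probability of a pattern; `parts` lists the non-empty partitions."""
    r = sum(parts)
    occupancy = list(parts) + [0] * (n - len(parts))  # pad with empty partitions
    q = Counter(occupancy)  # q[i] = partitions holding exactly i residues
    ways_partitions = factorial(n) // prod(factorial(c) for c in q.values())
    ways_residues = factorial(r) // prod(factorial(k) for k in occupancy)
    return ways_partitions * ways_residues / n ** r


table = {parts: pattern_probability(parts, 8) for parts in integer_partitions(8)}
print(len(table), "configurations; probabilities sum to", round(sum(table.values()), 6))
print("one Q per partition:", round(table[(1,) * 8], 4))
print("real pattern (3,1,1,1,1,1):", round(table[(3, 1, 1, 1, 1, 1)], 4))
```

The probabilities of all patterns add up to one because every one of the 8^8 equally likely assignments of residues to partitions belongs to exactly one pattern.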

In such a manner, we can quantify each amino acid in wild-type von Hippel-Lindau protein. Thereafter, we can assign these probabilities to each amino acid in the von Hippel-Lindau protein as shown in Figure 1, from which we get a visual sense of how these distribution probabilities vary along the von Hippel-Lindau protein, and more importantly we can sum up these distribution probabilities for all 213 amino acids in the protein.

Actually, the Maxwell-Boltzmann assumption provides us a way to estimate the position of amino acids in a protein, because there is a standard method for the computation using the Maxwell-Boltzmann assumption, which saves us from inventing new computational methods. Moreover, the primary structure is the base for higher-level structure, thus any mutation in primary structure would lead to a change in distribution probability, in higher-level structure, and finally in the biological function. This is the biological meaning of using the Maxwell-Boltzmann assumption for quantification of protein sequence. In this context, any clinical manifestations related to mutation in proteins would have different distribution probabilities determined by the Maxwell-Boltzmann assumption. This is the association between them.

Table 2. All possible distributions of eight glutamines in von Hippel-Lindau protein. (The real distribution, set in bold italic in the original, is marked with *; only non-empty partitions are listed.)

Glutamines per non-empty partition   Probability
1 1 1 1 1 1 1 1                      0.002403
1 1 1 1 1 1 2                        0.0673
1 1 1 1 1 3 *                        0.0673
1 1 1 1 4                            0.0280
1 1 1 5                              5.6076e-3
1 1 6                                5.6076e-4
1 7                                  2.6703e-5
8                                    4.7684e-7
1 1 1 1 2 2                          0.2523
1 1 1 2 3                            0.2243
1 1 2 4                              0.0421
1 2 5                                3.3646e-3
2 6                                  9.3460e-5
1 1 2 2 2                            0.1682
1 2 2 3                              0.0841
2 2 4                                4.2057e-3
2 2 2 2                              0.0105
1 1 3 3                              0.0280
2 3 3                                5.6076e-3
1 3 4                                5.6076e-3
4 4                                  1.1683e-4
3 5                                  1.8692e-4

Figure 1. Visualization of amino-acid distribution probability in wild-type human von Hippel-Lindau protein. (x-axis: VHL protein position, 0 to 213; y-axis: amino-acid distribution probability, 0.0 to 0.8.)
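The sum over all 213 residues mentioned in this subsection can be approximated directly from Table 1, since every residue of a given type carries that type's distribution probability. The sketch below uses the rounded four-decimal values printed in Table 1, so it only approximates the sum of 19.6114 that the paper reports for the wild-type protein; the dictionary name `table1` is an assumption for illustration.

```python
# (count, distribution probability) per amino acid, copied from Table 1
table1 = {
    "A": (10, 0.0476), "R": (20, 0.0067), "N": (9, 0.1770), "D": (11, 0.1077),
    "C": (2, 0.5000), "E": (30, 0.0001), "Q": (8, 0.0673), "G": (18, 0.0389),
    "H": (5, 0.0640), "I": (6, 0.1543), "L": (20, 0.0422), "K": (3, 0.1111),
    "M": (3, 0.6667), "F": (5, 0.2880), "P": (19, 0.0319), "S": (11, 0.0404),
    "T": (7, 0.2142), "W": (3, 0.6667), "Y": (6, 0.2315), "V": (17, 0.1280),
}

# Total number of residues in the protein
total_residues = sum(count for count, _ in table1.values())

# Each residue contributes the probability of its amino-acid type
total_probability = sum(count * prob for count, prob in table1.values())

print(total_residues, round(total_probability, 2))
```

The small gap between this estimate and the reported value is consistent with rounding in the tabulated probabilities.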

2.4. Quantification of Mutated von Hippel-Lindau Proteins

The calculation in the above subsection refers to the amino-acid distribution probability before mutation, that is, the amino-acid distribution probability in wild-type von Hippel-Lindau protein. Obviously any point mutation changes one amino acid to another, which certainly changes the distribution pattern of both the original and the mutated amino acids, thus the amino-acid distribution probability of both amino acids differs before and after mutation.

For example, the missense mutations at the CpG mutation hotspot at codon 167 can mutate arginine “R” to glycine “G”, glutamine “Q” or tryptophan “W” [13,46], leading to type I-II, type II and type II von Hippel-Lindau disease, respectively. In the above subsection, we have calculated the distribution probability of Qs (Table 2) before mutation, and now we show the calculation of the distribution probability after the R167Q mutation.

After this mutation, there are nine Qs in the von Hippel-Lindau mutant (Table 3), for which we have

$$\frac{9!}{5!\,1!\,1!\,2!\,0!\,0!\,0!\,0!\,0!\,0!} \times \frac{9!}{0!\,0!\,0!\,2!\,0!\,1!\,3!\,0!\,3!} \times 9^{-9} = 0.0197$$

while its distribution probability before this mutation is 0.0673, so the mutation decreases the distribution probability of Q. On the other hand, there are 20 and 19 Rs before and after this mutation. Their distribution probabilities are 0.0067 and 0.0030 before and after mutation, so this mutation decreases the distribution probability of R, too. The overall effect for this mutation is (0.0030–0.0067)+(0.0197–0.0673)=–0.0513, that is, the mutation reduces the distribution probability for von Hippel-Lindau protein.
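As a check on the value for nine Qs, the factors can be evaluated step by step. This is only a worked restatement of the calculation above, using the occupancy pattern after mutation from Table 3 (five empty partitions, one partition with one Q, one with two Qs and two with three Qs):

```latex
\begin{align*}
\frac{9!}{5!\,1!\,1!\,2!} &= \frac{362880}{240} = 1512, \\
\frac{9!}{2!\,1!\,3!\,3!} &= \frac{362880}{72} = 5040, \\
\frac{1512 \times 5040}{9^{9}} &= \frac{7620480}{387420489} \approx 0.0197 .
\end{align*}
```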

Since von Hippel-Lindau protein functions as a whole, we can calculate the change led by the mutation in the following way. The sum of all the distribution probabilities is 19.6114 in wild-type von Hippel-Lindau protein (Figure 1), while the above calculated mutation leads the sum of all the distribution probabilities to be 19.1731, thus this mutation results in a 2.23% decrease in the measure [(19.1731–19.6114)/19.6114×100%].

In this way, we have the quantitative measure for the changed primary structure of von Hippel-Lindau mutants and we also have documented clinical manifestations induced by the mutations of von Hippel-Lindau protein, thus we can build a quantitative relationship between changed structure and clinical outcome.
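The percentage change for the R167Q mutant can be recomputed from the two sums, and the difference between the sums can be cross-checked against the per-residue values for R and Q. The following sketch assumes, as described above, that each residue contributes the distribution probability of its amino-acid type; the variable names are illustrative, and the cross-check uses the rounded four-decimal probabilities, so it agrees only approximately.

```python
wild_type_sum = 19.6114   # sum over all residues, wild type (reported value)
mutant_sum = 19.1731      # sum over all residues, R167Q mutant (reported value)

# Relative change of the whole-protein measure, in percent
percent_change = (mutant_sum - wild_type_sum) / wild_type_sum * 100

# Cross-check: only R and Q change. R goes from 20 residues at 0.0067 to
# 19 residues at 0.0030; Q goes from 8 residues at 0.0673 to 9 at 0.0197.
delta_from_residues = (19 * 0.0030 - 20 * 0.0067) + (9 * 0.0197 - 8 * 0.0673)

print(round(percent_change, 2), round(delta_from_residues, 4))
```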

2.5. Descriptively Probabilistic Relationship

For building a quantitative relationship between mutation and clinical outcome, we use the descriptively probabilistic method, as our quantification is the amino-acid distribution probability and each individual mutation related to its clinical outcome is presented as a frequency. Therefore, we use the cross-impact analysis to couple them [35,47,48,49,50,51,52,53], because the amino-acid distribution probability either increases or decreases after mutation, which is a two-possibility event, and the clinical outcome either occurs or does not occur after mutation, which is a yes-or-no event. Thereafter, we can use the Bayesian equation to calculate the probability of occurrence of clinical outcome under a mutation.

Table 3. Distribution pattern of glutamines before and after mutation at position 167 in von Hippel-Lindau protein.

Partition         I   II   III   IV   V   VI   VII   VIII   IX
Before mutation   0   0    1     1    1   1    1     3      -
After mutation    0   0    0     2    0   1    3     0      3

2.6. Classification of Clinical Outcomes

It is extremely challenging to use mathematical modeling to distinguish the clinical outcomes with respect to mutant von Hippel-Lindau protein because of the variety of clinical outcomes. In an effort towards solving this problem, we employ our second quantification, amino-acid pair predictability, whose rationale and applications have been published extensively (for reviews, see [2,3,4]).

This quantification is based on permutation, and can be calculated in the following way. For example, there are 30 glutamic acids “E” and 20 Rs in von Hippel-Lindau protein, so the predicted frequency of the amino-acid pair ER would be 3 (30/213 × 20/212 × 212 = 2.817), and we do find three ERs in the protein, so the amino-acid pair ER is predictable. By contrast, the predicted frequency of EE would be 4 (30/213 × 29/212 × 212 = 4.085), but EE actually appears nine times. This is a case in which the actual frequency is larger than the predicted one. In this manner, we can quantify a protein sequence according to the percentage of amino-acid pairs that are predictable among all the amino-acid pairs in a given protein as well as its mutants. For instance, the predictable portion of amino-acid pairs is 27.54% in wild-type von Hippel-Lindau protein and 31.88% in its P25L mutant.
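The two predicted frequencies above follow from a simple permutation argument: the chance that a given position holds the first amino acid, times the chance that the next position holds the second, times the number of adjacent positions. A minimal sketch follows; the function names are hypothetical, and the rule that a pair counts as predictable when the rounded prediction equals the observed count is our reading of the ER example rather than a definition stated in the text.

```python
def predicted_pair_frequency(count_first, count_second, length, same_type=False):
    """Expected number of adjacent pairs XY in a sequence of `length` residues.

    For a pair of identical amino acids (e.g. EE), one residue is already used
    by the first position, so the second count is reduced by one.
    """
    remaining = count_second - 1 if same_type else count_second
    adjacent_positions = length - 1
    return count_first / length * remaining / (length - 1) * adjacent_positions


def is_predictable(predicted, observed):
    """Assumed rule: predictable when the rounded prediction matches the observation."""
    return round(predicted) == observed


er = predicted_pair_frequency(30, 20, 213)                  # E followed by R
ee = predicted_pair_frequency(30, 30, 213, same_type=True)  # E followed by E
print(round(er, 3), is_predictable(er, 3))
print(round(ee, 3), is_predictable(ee, 9))
```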

2.7. Statistics

The data are presented as mean±SD for normal distribution or median with interquartile range for non-normal distribution. The Kruskal-Wallis one-way ANOVA and Chi-square are used for statistical inference, and P < 0.05 is considered significant.

3. RESULTS AND DISCUSSION

After computing the amino-acid distribution probability in wild-type von Hippel-Lindau protein and in its 132 mutants, we have 132 changed amino-acid distribution probabilities. Firstly, we can use the cross-impact analysis to build a quantitative relationship between the increase/decrease of distribution probability after mutations and the clinical diagnosis, because the cross-impact analysis is particularly suited for two relevant events coupled together [35,47,48,49,50,51,52,53].

Figure 2 displays the cross-impact analysis on the relationship between changed primary structure and von Hippel-Lindau disease. At the level of amino-acid distribution probability, $P(2)$ and $P(\bar{2})$ are the decreased and increased probabilities induced by mutations, and 53 and 79 mutations result in the distribution probability decreased and increased, respectively. At the level of clinical diagnosis: 1) $P(1|\bar{2})$ is the impact probability (conditional probability) that the von Hippel-Lindau disease is diagnosed under the condition of increased distribution probability, and 70 mutations have such an effect. 2) $P(\bar{1}|\bar{2})$ is the impact probability that other disease is diagnosed under the condition of increased distribution probability, and 9 mutations work in such a manner. 3) $P(1|2)$ is the impact probability that the von Hippel-Lindau disease is diagnosed under the condition of decreased distribution probability, and 44 mutations play such a role. 4) $P(\bar{1}|2)$ is the impact probability that other disease is diagnosed under the condition of decreased distribution probability, and 9 mutations fall into this category. At the level of combined events, we can see the combined results of changed structure and von Hippel-Lindau disease.

Table 4 lists the calculated probabilities with respect to Figure 2, from which several interesting points can be drawn. 1) As $P(\bar{2})$ is larger than $P(2)$, a mutation has a larger chance of increasing the distribution probability in a von Hippel-Lindau mutant. 2) As $P(1|\bar{2})$ is much larger than $P(\bar{1}|\bar{2})$, a mutation that increases the distribution probability has about a nine-tenths chance of being associated with von Hippel-Lindau disease. 3) As $P(1|2)$ is much larger than $P(\bar{1}|2)$, a mutation that decreases the distribution probability has a much larger chance of being associated with von Hippel-Lindau disease.

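Table 4. Computed probabilities in reference to the cross-impact analysis in Figure 2.

$P(2)$ = 53/132 = 0.4015
$P(\bar{2})$ = 1 – $P(2)$ = 1 – 0.4015 = 0.5985 = 79/132
$P(1|\bar{2})$ = 70/79 = 0.8861
$P(\bar{1}|\bar{2})$ = 1 – $P(1|\bar{2})$ = 1 – 0.8861 = 0.1139 = 9/79
$P(1|2)$ = 44/53 = 0.8302
$P(\bar{1}|2)$ = 1 – $P(1|2)$ = 1 – 0.8302 = 0.1698 = 9/53
$P(1\bar{2})$ = $P(1|\bar{2})$ × $P(\bar{2})$ = 70/79 × 79/132 = 0.5303 = 70/132
$P(\bar{1}\bar{2})$ = $P(\bar{1}|\bar{2})$ × $P(\bar{2})$ = 9/79 × 79/132 = 0.0682 = 9/132
$P(12)$ = $P(1|2)$ × $P(2)$ = 44/53 × 53/132 = 0.3333 = 44/132
$P(\bar{1}2)$ = $P(\bar{1}|2)$ × $P(2)$ = 9/53 × 53/132 = 0.0682 = 9/132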
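All entries of Table 4 follow from the four mutation counts in Figure 2. The sketch below recomputes them with exact fractions; the dictionary layout and the function names `marginal`, `impact` and `joint` are illustrative assumptions, not the authors' implementation.

```python
from fractions import Fraction

# Mutation counts read from Figure 2: (change in distribution probability, diagnosis)
counts = {
    ("decrease", "vhl"): 44,
    ("decrease", "other"): 9,
    ("increase", "vhl"): 70,
    ("increase", "other"): 9,
}
total = sum(counts.values())  # 132 mutations in all


def marginal(direction):
    """P(direction): probability that a mutation changes the measure this way."""
    return Fraction(sum(v for (d, _), v in counts.items() if d == direction), total)


def impact(outcome, direction):
    """Impact (conditional) probability P(outcome | direction)."""
    in_direction = sum(v for (d, _), v in counts.items() if d == direction)
    return Fraction(counts[(direction, outcome)], in_direction)


def joint(outcome, direction):
    """Combined-event probability P(outcome and direction)."""
    return impact(outcome, direction) * marginal(direction)


for direction in ("decrease", "increase"):
    print(direction, round(float(marginal(direction)), 4))
    for outcome in ("vhl", "other"):
        print("   ", outcome,
              round(float(impact(outcome, direction)), 4),
              round(float(joint(outcome, direction)), 4))
```

Working with `Fraction` keeps the identities of Table 4 exact, for example the combined probability for a decrease with von Hippel-Lindau disease reduces to 44/132.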

Figure 2. Cross-impact relationship among von Hippel-Lindau protein mutation, changed amino-acid distribution probability, and clinical diagnosis. (Tree diagram. Distribution probability, event 2: a mutation either increases the probability, $P(\bar{2})$ = 1 – $P(2)$, n = 79, or decreases it, $P(2)$, n = 53. Clinical diagnosis, event 1: under an increase, VHL disease, $P(1|\bar{2})$, n = 70, or other disease, $P(\bar{1}|\bar{2})$ = 1 – $P(1|\bar{2})$, n = 9; under a decrease, VHL disease, $P(1|2)$, n = 44, or other disease, $P(\bar{1}|2)$ = 1 – $P(1|2)$, n = 9. Combined events: $P(1\bar{2})$ = 70/132, $P(\bar{1}\bar{2})$ = 9/132, $P(12)$ = 44/132, $P(\bar{1}2)$ = 9/132.)

Secondly, we use the Bayes’ law,

$$P(1|2) = \frac{P(2|1)\,P(1)}{P(2)},$$

which indicates the probabilities of occurrences of two events [54], to determine the probability, $P(1)$, of von Hippel-Lindau disease under a mutation, because $P(2)$ and $P(1|2)$ have already been defined in the cross-impact analysis, while $P(2|1)$ is the probability that the distribution probability decreases under the condition that the von Hippel-Lindau disease is diagnosed. As $P(1|2)$ = 44/53 = 0.8302 (Table 4), and $P(2|1)$ = 44/(44+70) = 0.3860,

$$P(1) = \frac{P(1|2)\,P(2)}{P(2|1)} = \frac{0.8302 \times 0.4015}{0.3860} = 0.8635,$$

namely, the patient has about a nine-tenths chance of having von Hippel-Lindau disease when a new mutation is found in von Hippel-Lindau protein.

Among patients with von Hippel-Lindau disease, about 40% of mutations are genomic deletions and the rest are predominantly truncating or missense mutations, which do not occur within the first 53 amino acids [55,56]. In this study, we focus on the mutations of von Hippel-Lindau protein. From a probabilistic viewpoint, our results indicate the chance of being diagnosed as the von Hippel-Lindau disease when a new von Hippel-Lindau mutant occurs.

The von Hippel-Lindau disease is characterized by marked phenotypic variability [57,58], due to mosaicism [59], modifier effects [60], and mainly allelic heterogeneity [61]. All these result in complicated clinical classifications. Thus, we use the predictable portion of amino-acid pairs to model the classifications.
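Returning to the Bayes' law step above, the calculation can also be carried out with the exact counts from Figure 2 instead of rounded decimals. This is only a worked restatement under the paper's own definitions; it shows that the result equals the overall fraction of documented mutations diagnosed as von Hippel-Lindau disease, (44+70)/132, and that the last digit of the reported 0.8635 depends on rounding of the inputs:

```latex
\begin{align*}
P(2|1) &= \frac{44}{44+70} = \frac{44}{114} \approx 0.3860, \\
P(1) &= \frac{P(1|2)\,P(2)}{P(2|1)}
      = \frac{\dfrac{44}{53} \times \dfrac{53}{132}}{\dfrac{44}{114}}
      = \frac{114}{132} \approx 0.8636 .
\end{align*}
```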

Figure 3 illustrates the classification with respect to the predictable portion of amino-acid pairs. Although there are large overlaps among classifications, our quantification already distinguishes them to some degree. For example, in comparison with von Hippel-Lindau disease, our quantification is relatively lower in pheochromocytoma and higher in other disorders (P=0.079, Kruskal-Wallis one-way ANOVA). The lack of statistical significance is certainly, in part, due to the few cases in some groups; however, the trend is visible, which paves the way for further classification using more sophisticated mathematical models.

Genotype-phenotype relationships have revealed that a certain number of missense mutations are associated with a high risk of pheochromocytoma but the mutations that totally lose their functions are associated with a low risk. Most patients with type II von Hippel-Lindau disease have missense mutations whereas the large deletions and truncating mutations predominate in type I families [11,19,62,63]. Many missense mutations causing a type I phenotype are involved in the core hydrophobic residues and were predicted to disrupt protein structure, whereas type II phenotype missense mutations are involved in substitutions at a surface amino acid that does not cause a total loss of function [64,65].

Figure 3. Predictable portion of amino-acid pairs induced by mutations of von Hippel-Lindau protein in pheochromocytoma (Pheo), von Hippel-Lindau disease and other disorders. The data are presented as median with an interquartile range (P = 0.079, Kruskal-Wallis one-way ANOVA).

Figure 4. Distribution of changed amino-acid distribution probability in endocrine and nonendocrine neoplasia induced by mutations of von Hippel-Lindau protein (P = 0.094, Chi-square). (Upper panel: endocrine neoplasia; lower panel: nonendocrine neoplasia. x-axis: mutants in von Hippel-Lindau protein; y-axis: changed amino-acid distribution probability (%).)

Figure 4 displays the distribution of changed amino-acid distribution probability in endocrine neoplasia (pheochromocytoma, type II von Hippel-Lindau disease) and nonendocrine neoplasia (type I von Hippel-Lindau disease). As can be seen, the mutations that led to the endocrine neoplasia tend to increase the amino-acid distribution probability (upper panel), whereas the mutations that led to nonendocrine neoplasia either increase or decrease the amino-acid distribution probability (lower panel). The difference between the two panels is mainly considered from the view of symmetry. As the x-axis is related to the number of von Hippel-Lindau mutations, this figure would be different when more mutations are found in future, which might provide a much clearer pattern, although we did not find a statistical difference between the two panels (P=0.094, Chi-square) now.

From a theoretical viewpoint, one could consider calculating the distribution probability of all 19 potential types of mutations at each position of von Hippel-Lindau protein, and then finding the link between mutations and clinical outcomes. However, the amount of computation is huge because it would be equal to $2.369\times10^{272}$ mutations ($19^{213}$), which is not only beyond the capacity of any computer, but also beyond the capacity for comparison. Actually, we know that each position does not have 19 types of potential mutations, because this mutation process is governed by the translation probability between RNA codon and mutated amino acids [66,67,68]. On the other hand, our study is focused on the documented data rather than the simulated data.

In this study, we use a single value, the sum of all distribution probabilities, to represent the normal von Hippel-Lindau protein and its mutated proteins, respectively, because there is no other way to use a single value dynamically to represent a protein, namely, the value is different when a protein is different. Without such a measure, we cannot model a protein dynamically with its mutations. To the best of our knowledge, currently it is only the accession number that can represent a protein uniquely; however, it has nothing to do with the protein itself, i.e. composition, length, function, etc.

In general, one would hope to verify this type of study against real-life cases, which is possible in future although it would require a large-scale collaboration because this type of disease is not frequently seen in clinical settings, for example, the von Hippel-Lindau disease has a birth incidence of about 1 in 36 000 [11,12].

It will take years to verify what a theoretical study finds with fast-speed computational technique. Moreover, we cannot verify all theoretical studies, for example, we cannot create another earth without global warming.

The implications of this study include two aspects. 1) The relationship between changed primary structure and changed function is very meaningful, because it provides a dynamic rather than static relationship between mutant protein and its function. This can furthermore provide us the basis for building a dynamic model to predict the new function in mutant proteins. Nevertheless, we need to quantify the proteins in order to build a dynamic model, and this study works in that direction. 2) From the clinical viewpoint, the classification of von Hippel-Lindau disease as well as many mutation-related diseases needs a considerable amount of clinical assays. Our approach can provide a probabilistic estimate for disease classification after determining which amino acid has mutated, because the primary structure of a protein is the base for its high-level structure and function.

4. ACKNOWLEDGEMENTS

This study was partly supported by Guangxi Science Foundation (No. 0537012-G and 0991080), and Guangxi Academy of Sciences (project 09YJ17SW07).

REFERENCES

[1] K. C. Chou, (2004) Structure bioinformatics and its impact to biomedical science, Curr. Med. Chem., 11, 2105-2134.

[2] G. Wu and S. Yan, (2002) Randomness in the primary structure of protein: Methods and implications, Mol. Biol. Today, 3, 55-69.

[3] G. Wu and S. Yan, (2006) Mutation trend of hemagglutinin of influenza A virus: A review from computational mutation viewpoint, Acta Pharmacol. Sin., 27, 513-526.

[4] G. Wu and S. Yan, (2008) Lecture notes on computational mutation, Nova Science Publishers, New York, 2008.

[5] Von Hippel, (1911) Die anatomische Grundlage der von mir beschriebenen ‘sehr seltenen Erkrankung der Netzhaut’, Graefes. Arch. Ophthalmol., 79, 350-377.

[6] A. Lindau, (1926) Studien über Kleinhirncysten, Bau, Pathogenese und Beziehungen zur Angiomatosis retinae, Acta Pathol. Microbiol. Scand., Suppl 1, 1-128.

[7] K. L. Melmon and S. W. Rosen, (1964) Lindau’s disease, Am. J. Med., 36, 595-617.

[8] V. V. Michels, (1988) Investigative studies in von Hippel-Lindau disease, Neurofibromatosis, 1, 159-163.

[9] H. P. Neumann, (1987) Basic criteria for clinical diagnosis and genetic counselling in von Hippel-Lindau syndrome, Vasa, 16, 220-226.

[10] R. R. Lonser, G. M. Glenn, M. Walther, E. Y. Chew, S. K. Libutti, W. M. Linehan, and E. H. Oldfield, (2003) von Hippel-Lindau disease, Lancet, 361, 2059-2067.

[11] E. R. Maher, A. R. Webster, F. M. Richards, J. S. Green, P. A. Crossey, S. J. Payne, and A. T. Moore, (1996) Phenotypic expression in von Hippel-Lindau disease: Correlations with germline VHL gene mutations, J. Med. Genet., 33, 328-332.

[12] F. M. Richards, S. J. Payne, B. Zbar, N. A. Affara, M. A. Ferguson-Smith, and E. R. Maher, (1995) Molecular analysis of de novo germline mutations in the von Hippel-Lindau disease gene, Hum. Mol. Genet., 4, 2139-2143.

[13] F. Latif, K. Tory, J. Gnarra, M. Yao, F. M. Duh, M. L. Orcutt, et al., (1993) Identification of the von Hippel-Lindau disease tumor suppressor gene, Science, 260, 1317-1320.

[14] P. O. Schnell, M. L. Ignacak, A. L. Bauer, J. B. Striet, W. R. Paulding, and M. F. Czyzyk-Krzeska, (2003) Regulation of tyrosine hydroxylase promoter activity by the von Hippel-Lindau tumor suppressor protein and hypoxia-inducible transcription factors, J. Neurochem., 85, 483-491.

[15] W. G. Jr. Kaelin, (2002) Molecular basis of the VHL hereditary cancer syndrome, Nat. Rev. Cancer, 2, 673-682.

[16] W. G. Jr. Kaelin, (2003) The von Hippel-Lindau gene, kidney cancer, and oxygen sensing, J. Am. Soc. Nephrol., 14, 2703-2711.

[17] T. Shuin, I. Yamasaki, K. Tamura, H. Okuda, M. Furihata, and S. Ashida, (2006) Von Hippel-Lindau disease: Molecular pathological basis, clinical criteria, genetic testing, clinical features of tumors and treatment, Jpn. J. Clin. Oncol., 36, 337-343.

[18] M. Ohh, (2006) Ubiquitin pathway in VHL cancer syndrome, Neoplasia, 8, 623-629.

[19] F. Chen, T. Kishida, M. Yao, T. Hustad, D. Glavac, M. Dean, J. R. Gnarra, M. L. Orcutt, F. M. Duh, G. Glenn, J. Green, Y. E. Hsia, J. Lamiell, H. Li, M. H. Wei, L. Schmidt, K. Tory, I. Kuzmin, T. Stackhouse, F. Latif, W. M. Linehan, M. Lerman, and B. Zbar, (1995) Germline mutations in the von Hippel-Lindau disease tumor suppressor gene: Correlations with phenotype, Hum. Mutat., 5, 66-75.

[20] S. Lee, E. Nakamura, H. Yang, W. Wei, M. S. Linggi, M. P. Sajan, R. V. Farese, R. S. Freeman, B. D. Carter, W. G. Jr. Kaelin, and S. Schlisio, (2005) Neuronal apoptosis linked to EglN3 prolyl hydroxylase and familial phaeochromocytoma genes: developmental culling and cancer, Cancer Cell, 8, 1-13.

[21] Clinical Research Group for VHL in Japan, (1995) Germline mutations in the von Hippel-Lindau disease (VHL) gene in Japanese VHL, Hum. Mol. Genet., 4, 2233-2237.

[22] H. P. Neumann, B. Bender, I. Zauner, D. P. Berger, C. Eng, H. Brauch, and B. Zbar, (1996) Monogenetic hypertension and pheochromocytoma, Am. J. Kidney Dis., 28, 329-333.

[23] S. Olschwang, S. Richard, C. Boisson, S. Giraud, P. Laurent-Puig, F. Resche, and G. Thomas, (1998) Germline mutation profile of the VHL gene in von Hippel-Lindau disease and in sporadic hemangioblastoma, Hum. Mutat., 12, 424-430.

[24] C. Stolle, G. Glenn, B. Zbar, J. S. Humphrey, P. Choyke, M. Walther, S. Pack, K. Hurley, C. Andrey, R. Klausner, and W. M. Linehan, (1998) Improved detection of germline mutations in the von Hippel-Lindau disease tumor suppressor gene, Hum. Mutat., 12, 417-423.

[25] A. Bairoch and R. Apweiler, (2000) The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 2000, Nucleic Acids Res., 28, 45-48.

[26] N. Gao, S. Yan, and G. Wu, (2006) Pattern of positions sensitive to mutations in human haemoglobin α-chain, Protein Pept. Lett., 13, 101-107.

[27] G. Wu and S. Yan, (2000) Prediction of distributions of amino acids and amino acid pairs in human haemoglobin α-chain and its seven variants causing α-thalassemia from their occurrences according to the random mechanism, Comp. Haematol. Int., 10, 80-84.

[28] G. Wu and S. Yan, (2001) Analysis of distributions of amino acids, amino acid pairs and triplets in human insulin precursor and four variants from their occurrences according to the random mechanism, J. Biochem. Mol. Biol. Biophys., 5, 293-300.

[29] G. Wu and S. Yan, (2001) Analysis of distributions of amino acids and amino acid pairs in human tumor necrosis factor precursor and its eight variants according to random mechanism, J. Mol. Model, 7, 318-323.

[30] G. Wu and S. Yan, (2002) Random analysis of presence and absence of two- and three-amino-acid sequences and distributions of amino acids, two- and three-amino-acid sequences in bovine p53 protein, Mol. Biol. Today, 3, 31-37.

[31] G. Wu and S. Yan, (2002) Analysis of distributions of amino acids in the primary structure of apoptosis regulator Bcl-2 family according to the random mechanism, J. Biochem. Mol. Biol. Biophys., 6, 407-414.

[32] G. Wu and S. Yan, (2002) Analysis of distributions of amino acids in the primary structure of tumor suppressor p53 family according to the random mechanism, J. Mol. Model, 8, 191-198.

[33] G. Wu and S. Yan, (2004) Determination of sensitive positions to mutations in human p53 protein, Biochem. Biophys. Res. Commun., 321, 313-319.

[34] G. Wu and S. Yan, (2005) Searching of main cause leading to severe influenza A virus mutations and consequently to influenza pandemics/epidemics, Am. J. Infect. Dis., 1, 116-123.

[35] G. Wu and S. Yan, (2005) Prediction of mutation trend in hemagglutinins and neuraminidases from influenza A viruses by means of cross-impact analysis, Biochem. Biophys. Res. Commun., 326, 475-482.

[36] G. Wu and S. Yan, (2006) Timing of mutation in hemagglutinins from influenza A virus by means of amino-acid distribution rank and fast Fourier transform, Protein Pept. Lett., 13, 143-148.

[37] G. Wu and S. Yan, (2006) Prediction of possible mutations in H5N1 hemagglutinins of influenza A virus by means of logistic regression, Comp. Clin. Pathol., 15, 255-261.

[38] G. Wu and S. Yan, (2006) Prediction of mutations in H5N1 hemagglutinins from influenza A virus, Protein Pept. Lett., 13, 971-976.

[39] G. Wu and S. Yan, (2007) Improvement of model for prediction of hemagglutinin mutations in H5N1 influenza viruses with distinguishing of arginine, leucine and serine, Protein Pept. Lett., 14, 191-196.

[40] G. Wu and S. Yan, (2007) Improvement of prediction of mutation positions in H5N1 hemagglutinins of influenza A virus using neural network with distinguishing of arginine, leucine and serine, Protein Pept. Lett., 14, 465-470.

[41] G. Wu and S. Yan, (2007) Prediction of mutations engineered by randomness in H5N1 neuraminidases from influenza A virus, Amino Acids, 34, 81-90.

[42] G. Wu and S. Yan, (2007) Prediction of mutations in H1 neuraminidases from North America influenza A virus engineered by internal randomness, Mol. Divers., 11, 131-140.

[43] G. Wu and S. Yan, (2008) Prediction of mutations initiated by internal power in H3N2 hemagglutinins of influenza A virus from North America, Int. J. Pept. Res. Ther., 14, 41-51.

[44] G. Wu and S. Yan, (2008) Prediction of mutation in H3N2 hemagglutinins of influenza A virus from North America based on different datasets, Protein Pept. Lett., 15, 144-152.

[45] W. Feller, (1968) An introduction to probability theory and its applications, 3rd ed, Wiley, New York, 1, 34-40.

[46] B. Zbar, T. Kishida, F. Chen, L. Schmidt, E. R. Maher, F. M. Richards, P. A. Crossey, A. R. Webster, N. A. Affara, M. A. Ferguson-Smith, et al., (1996) Germline mutations in the Von Hippel-Lindau disease (VHL) gene in families from North America, Europe, and Japan, Hum. Mutat., 8, 348-357.

[47] T. G. Gordon and H. Hayward, (1968) Initial experiments with the cross-impact matrix method of forecasting, Futures, 1, 100-116.

[48] T. G. Gordon, (1969) Cross-impact matrices - an illustration of their use for policy analysis, Futures, 2, 527-531.

[49] S. Enzer, (1970) Delphi and cross-impact techniques: an effective combination for systematic futures analysis, Futures, 3, 48-61.

[50] S. Enzer, (1970) Cross-impact techniques in technology assessment, Futures, 4, 30-51.

[51] A. P. Sage, (1977) Methodology for large-scale systems, McGraw-Hill, New York, 165-203.

[52] G. Wu, (2000) Application of cross-impact analysis to the relationship between aldehyde dehydrogenase 2 and flushing, Alcohol Alcohol., 35, 55-59.

[53] G. Wu and S. Yan, (2008) Building quantitative relationship between changed sequence and changed oxygen affinity in human hemoglobin α-chain, Protein Pept. Lett., 15, 341-345.

[54] Wikipedia, (2008) Bayes’ theorem, http://en.wikipedia.org/wiki/Bayes’_theorem.

[55] S. O. Ang, H. Chen, K. Hirota, V. R. Gordeuk, J. Jelinek, Y. Guan, E. Liu, A. I. Sergueeva, G. Y. Miasnikova, D. Mole, P. H. Maxwell, D. W. Stockton, G. L. Semenza, and J. T. Prchal, (2002) Disruption of oxygen homeostasis underlies congenital Chuvash polycythemia, Nature Genet., 32, 614-621.

[56] Y. Pastore, K. Jedlickova, Y. Guan, E. Liu, J. Fahner, H. Hasle, J. F. Prchal, and J. T. Prchal, (2003) Mutations of von Hippel-Lindau tumor-suppressor gene and congenital polycythemia, Am. J. Hum. Genet., 73, 412-419.

[57] E. R. Maher, (2004) Von Hippel-Lindau disease, Curr. Mol. Med., 4, 833-842.

[58] E. R. Woodward and E. R. Maher, (2006) Von Hippel-Lindau disease and endocrine tumour susceptibility, End. Relat. Cancer, 13, 415-425.

[59] M. T. Sgambati, C. Stolle, P. L. Choyke, M. M. Walther, B. Zbar, W. M. Linehan, and G. M. Glenn, (2000) Mosaicism in von Hippel-Lindau disease: lessons from kindreds with germline mutations identified in offspring with mosaic parents, Am. J. Hum. Genet., 66, 84-91.

[60] A. R. Webster, F. M. Richards, F. E. MacRonald, A. T. Moore, and E. R. Maher, (1998) An analysis of phenotypic variation in the familial cancer syndrome von Hippel-Lindau disease: evidence for modifier effects, Am. J. Hum. Genet., 63, 1025-1035.

[61] P. A. Crossey, C. Eng, M. Ginalska-Malinowska, T. W. J. Lennard, J. R. Sampson, B. A. J. Ponder, and E. R. Maher, (1995) Molecular genetic diagnosis of von Hippel-Lindau disease in familial phaeochromocytoma, J. Med. Genet., 32, 885-886.

[62] P. A. Crossey, F. M. Richards, K. Foster, J. S. Green, A. Prowse, F. Latif, M. I. Lerman, B. Zbar, N. A. Affara, M. A. Ferguson-Smith, and R. Maher, (1994) Buys CHCM, identification of intragenic mutations in the von Hippel-Lindau disease tumour suppressor gene and correlation with disease phenotype, Hum. Mol. Genet., 3, 1303-1308.

[63] E. R. Maher, A. R. Webster, F. M. Richards, J. S. Green, P. A. Crossey, S. J. Payne, and A. T. Moore, (2000) Phenotypic expression in von Hippel-Lindau disease: correlations with germline VHL gene mutations, J. Med. Genet., 37, 62-63.

[64] C. E. Stebbins, W. G. Jr. Kaelin, and N. P. Pavletich, (1999) Structure of the VHL-ElonginC-ElonginB complex: Implications for VHL tumor suppressor function, Science, 284, 455-461.

[65] S. J. Marx and W. F. Simonds, (2005) Hereditary hormone excess: Genes, molecular pathways, and syndromes, End. Rev., 26, 615-661.

[66] G. Wu and S. Yan, (2005) Determination of mutation trend in proteins by means of translation probability between RNA codes and mutated amino acids, Biochem. Biophys. Res. Commun., 337, 692-700.

[67] G. Wu and S. Yan, (2006) Determination of mutation trend in hemagglutinins by means of translation probability between RNA codons and mutated amino acids, Protein Pept. Lett., 13, 601-609.

[68] G. Wu and S. Yan, (2007) Translation probability between RNA codons and translated amino acids, and its applications to protein mutations, in: Leading-Edge Messenger RNA Research Communications, ed. Ostrovskiy M. H. Nova Science Publishers, New York, Chapter 3, 47-65.

New blind estimation method of evoked potentials based on minimum dispersion criterion and fractional lower order statistics

Daifeng Zha

ABSTRACT

Evoked potentials (EPs) have been widely used to quantify neurological system properties. Traditional EP analysis methods are developed under the condition that the background noises in EP are Gaussian distributed. The alpha stable distribution, a generalization of the Gaussian, is better for modeling impulsive noises than the Gaussian distribution in biomedical signal processing. Conventional blind separation and estimation methods for evoked potentials are based on second order statistics (SOS) or higher order statistics. In this paper, we propose a new algorithm based on the minimum dispersion criterion and fractional lower order statistics. The simulation experiments show that the proposed new algorithm is more robust than the conventional algorithm.

Keywords: Evoked potentials (EPs), Alpha stable distribution, Blind source separation, Minimum dispersion (MD), Fractional lower order statistics (FLOS)

1. INTRODUCTION

The brain evoked potentials (EPs) are electrical responses of the central nervous system (CNS) to sensory stimuli applied in a controlled manner. The EPs have a number of clinical applications, including critical care, operating room monitoring, and the diagnosis of a variety of neurological disorders [1, 2]. The analysis of EP characteristics is of special interest in many clinical applications, such as the diagnosis of possible brain injury and disorders in the CNS [11, 12]. Thus, the goal in the analysis of EPs is currently the estimation from several potentials, or even from a single potential. In recent years, signal processing techniques including adaptive filtering, third-order correlation, and singular value decomposition (SVD) have been used in fast estimation of EPs.

Independent component analysis (ICA) has appeared as a promising technique in signal processing. Its main applications are in feature extraction, blind source separation, and biomedical signal processing. ICA is based on the following principles. Assume that the original (or source) signals have been linearly mixed, and that these mixed signals are available. Conventional ICA is optimal in approximating the input data in the mean-square error sense, describing some second order characteristics of the data. The nonlinear ICA method [3], related to higher order statistical techniques, is a useful extension of standard ICA. The data are represented in an orthogonal basis determined merely by the second-order statistics (covariance) of the input data [4].

Recent studies [5, 6] show that alpha stable distributions are better than the Gaussian distribution for modeling impulsive noise in signal processing, including underwater acoustic noise, low-frequency atmospheric noise, and impulsive EEG and ECG noise. In general, EP signals are always accompanied by ongoing electroencephalogram (EEG) signals, which are considered noise in EP analysis. Often the EEG signals are assumed to be Gaussian distributed white noise for mathematical convenience. However, the EEG signals are found to be non-Gaussian in other studies (e.g., [9, 10]). Consequently, EP analysis algorithms developed under the Gaussian EEG assumption may fail or may not perform optimally. Developing EP analysis algorithms without the Gaussian distribution assumption for the background noise thus becomes a key to ensuring the reliability of the analysis results. There are two kinds of noise in the EP signals obtained. The first one is the background EEG noise found in all EP recordings. The second one is the noise introduced by the impact acceleration experiment. An analysis shows that the alpha stable model fits the noises found in the impact acceleration experiment under study better than the Gaussian model [8].

This kind of alpha stable process has no finite second order or higher order statistics. It has no closed-form probability density function, so we can only describe it by its characteristic function:

$$\varphi(t) = \exp\left\{ j\mu t - \gamma |t|^{\alpha} \left[ 1 + j\beta\,\mathrm{sgn}(t)\,\varpi(t,\alpha) \right] \right\} \qquad (1)$$

where $\varpi(t,\alpha) = \tan(\alpha\pi/2)$ if $\alpha \neq 1$, or $\varpi(t,\alpha) = (2/\pi)\log|t|$ if $\alpha = 1$.
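The characteristic function (1) can be evaluated numerically in a few lines. The following is a minimal sketch, not the author's code: the function name and argument defaults are illustrative assumptions, and the value at t = 0 for α = 1 is set by continuity, since the log term is multiplied by |t|.

```python
import numpy as np

def stable_characteristic_function(t, alpha, beta=0.0, gamma=1.0, mu=0.0):
    """Evaluate the alpha stable characteristic function of equation (1).

    alpha: characteristic exponent, beta: symmetry, gamma: dispersion,
    mu: location. Names and defaults are illustrative assumptions.
    """
    t = np.asarray(t, dtype=float)
    if np.isclose(alpha, 1.0):
        abs_t = np.abs(t)
        # Replace |t| = 0 by 1 so that log gives 0; the term vanishes there anyway.
        safe = np.where(abs_t > 0, abs_t, 1.0)
        omega = (2.0 / np.pi) * np.log(safe)
    else:
        omega = np.tan(alpha * np.pi / 2.0)
    return np.exp(1j * mu * t
                  - gamma * np.abs(t) ** alpha
                  * (1.0 + 1j * beta * np.sign(t) * omega))

t = np.linspace(-3.0, 3.0, 7)
phi_gauss = stable_characteristic_function(t, alpha=2.0)       # Gaussian case
phi_impulsive = stable_characteristic_function(t, alpha=1.7)   # heavier tails
```

With α = 2 and β = 0 the expression reduces to exp(jμt − γt²), the characteristic function of a Gaussian with variance 2γ, which gives a convenient sanity check.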

Biosciences, 2009, 1, 33-39


Figure 1. Evoked potentials (Left: without noises; Right: with noises).

The parameters satisfy $-\infty < \mu < \infty$, $\gamma > 0$, $0 < \alpha \le 2$, $-1 \le \beta \le 1$. The characteristic exponent $\alpha$ determines the shape of the distribution. In particular, if $\alpha = 2$, it is a Gaussian distribution. The dispersion $\gamma$ plays a role analogous to the variance of a second order process, $\beta$ is the symmetry parameter, and $\mu$ is the location parameter. The distinct characteristics of a lower order stable process are its impulsive waveform and the thick tail of its distribution function. Due to the thick tails, lower order stable processes do not have finite second or higher-order moments. This feature may lead all algorithms based on second order moments to fail or to function sub-optimally. Typical evoked potentials are shown in Figure 1.

2. DATA MODEL

In the following, we present the basic data model used in both PCA and the source separation problem plotted in Figure 2, and discuss the necessary assumptions. We assume that the P signals $s_i(n)$, $i = 1, 2, \ldots, P$, are non-coherent and statistically independent. The noiseless linear ICA model with instantaneous mixing may be described by the equation

$$x(n) = A s(n) = \sum_{i=1}^{P} a_i s_i(n) \qquad (2)$$

where $x(n) = [x_1(n), x_2(n), \ldots, x_M(n)]^T$, $n = 1, 2, \ldots, N$, $M \ge P$ (T denoting the transpose) are the observed signals, $s(n) = [s_1(n), s_2(n), \ldots, s_P(n)]^T$ are the source signals containing alpha stable distribution signals or noises, which are supposed to be stationary and independent, and A is an unknown mixing matrix. Our goal is to estimate S from X, with appropriate assumptions on the statistical properties of the source distributions. The solution is

$$Y(n) = W Z(n) \qquad (3)$$

where W is called the de-mixing matrix and $Z(n)$ is the whitened observation vector. The general ICA problem requires A to be an $M \times P$ matrix of full column rank, with $M \ge P$ (i.e., there are at least as many mixtures as the number of independent sources). In this paper, we assume an equal number of sources and sensors to keep the calculation simple. We can write the signal model in matrix form as $X = AS$. Here X is the observation data matrix, S is the source signal data matrix, and the mixing matrix A is unknown.
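To make the model $X = AS$ concrete, the sketch below mixes one periodic waveform with symmetric alpha stable noise. Everything in it is an assumption made for illustration: the damped sinusoid is only a stand-in for an EP waveform, the mixing matrix is random, and the noise is drawn with the Chambers-Mallows-Stuck method rather than any generator described in the paper.

```python
import numpy as np

def sas_noise(alpha, size, rng):
    """Symmetric alpha stable samples (beta = 0, unit dispersion, alpha != 1)
    drawn with the Chambers-Mallows-Stuck method."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform phase
    w = rng.exponential(1.0, size)                 # unit-mean exponential
    return (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(0)
N, P = 1024, 2                     # samples and number of sources (M = P here)

n = np.arange(N)
# Hypothetical EP stand-in: a damped sinusoid repeating every 128 samples.
s1 = np.sin(2 * np.pi * (n % 128) / 32.0) * np.exp(-(n % 128) / 40.0)
s2 = sas_noise(1.7, N, rng)        # impulsive noise source, alpha = 1.7

S = np.vstack([s1, s2])            # source matrix, shape (P, N)
A = rng.normal(size=(P, P))        # unknown mixing matrix (random for the demo)
X = A @ S                          # observations, one mixture per row
```

In a real recording only X is available; A and S are kept here so that a separation result can later be compared with the true sources.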

3. WHITENING BY NORMALIZED COVARIANCE MATRIX

Generally, it is impossible to separate the possible noise in the input data from the source signals. In practice, noise smears the results in all the separation algorithms. If the amount of noise is considerable, the separation results are often fairly poor. Some of the noise can usually be filtered out using standard PCA if the number of mixtures is larger than the number of sources [13].

We introduce here a two-step separation method that achieves the BSS through minimization of a dispersion criterion. The first step is a whitening procedure that orthogonalizes the mixture matrix. Here we search for a matrix B which transforms the mixing matrix A into a unitary matrix. Classically, for a finite variance signal, the whitening matrix is computed as the inverse square root of the signal covariance matrix. In our case, impulsive EEG noises have infinite variances. However, we can take advantage of the normalized covariance matrix.

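Theorem 1 [7]: Let $X = [x(1), x(2), \ldots, x(N)]$ be a data matrix of stable process vectors. Then the normalized covariance matrix of X,

$$\Gamma_x = \frac{X X^T / N}{\mathrm{Trace}(X X^T / N)} \qquad (4)$$

converges asymptotically to a finite matrix when $N \to \infty$, i.e.,

$$\lim_{N \to \infty} \Gamma_x = A D A^T, \qquad D = \mathrm{diag}(d_1, d_2, \ldots, d_M),$$

where $d_i = \lim_{N \to \infty} \Delta_i \Big/ \sum_{j=1}^{P} \|a_j\|^2 \Delta_j$, $a_j$ is the j-th column of A, and $\Delta_i = \sum_{n=1}^{N} x_i^2(n) / N$.

Theorem 2: Let the eigen-decomposition of $\Gamma_x$ be $\Gamma_x = U \Omega^2 U^T$. Then we can obtain the whitening matrix $B = \Omega^{-1} U^T$, and $Z = BX$ is orthogonal.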


Figure 2. Data and system model.

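Proof:

$$\begin{aligned}
Z Z^T = B X X^T B^T &= N \cdot \mathrm{Trace}(X X^T / N) \cdot B \Gamma_x B^T \\
&= N \cdot \mathrm{Trace}(X X^T / N) \cdot B U \Omega^2 U^T B^T \\
&= N \cdot \mathrm{Trace}(X X^T / N) \cdot \Omega^{-1} U^T U \Omega^2 U^T U \Omega^{-1} \\
&= N \cdot \mathrm{Trace}(X X^T / N) \cdot \Omega^{-1} I \Omega^2 I \Omega^{-1} \\
&= N \cdot \mathrm{Trace}(X X^T / N) \cdot I
\end{aligned}$$

So we can write

$$B \Gamma_x B^T = B A D A^T B^T = (B A D^{1/2})(B A D^{1/2})^T = I$$

and thus the whitening of x(n) is $z(n) = B x(n)$.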
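The whitening step of Theorem 2 can be checked numerically. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and heavy-tailed Student-t samples are used only as a convenient stand-in for impulsive sources. It verifies that $Z Z^T$ is a scaled identity, as the proof states.

```python
import numpy as np

def normalized_covariance_whitening(X):
    """Return (B, Z) with B = Omega^{-1} U^T obtained from the
    eigen-decomposition of the normalized covariance matrix of X."""
    M, N = X.shape
    R = X @ X.T / N
    gamma_x = R / np.trace(R)                  # normalized covariance, equation (4)
    eigvals, U = np.linalg.eigh(gamma_x)       # gamma_x = U diag(eigvals) U^T
    B = np.diag(1.0 / np.sqrt(eigvals)) @ U.T  # Omega^2 = diag(eigvals)
    return B, B @ X

rng = np.random.default_rng(1)
S = rng.standard_t(df=1.5, size=(2, 5000))     # heavy-tailed stand-in sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                     # illustrative mixing matrix
X = A @ S

B, Z = normalized_covariance_whitening(X)
scale = np.trace(X @ X.T)                      # equals N * Trace(X X^T / N)
whitened_gram = Z @ Z.T / scale                # should be the identity matrix
```

Because the covariance is divided by its trace, the eigenvalues stay bounded even when individual samples are very large, which is the point of using the normalized matrix for infinite-variance data.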

4. DEMIXING ALGORITHM

The core part and most difficult task in BSS is learning the separating matrix W. In recent years, many neural blind separation algorithms have been proposed. In the following, we discuss and propose separation algorithms which are suitable for alpha stable noise environments in PCA-type networks.

Let us consider the i-th output weight vector $W_i$, $i = 1, 2, \ldots, P$. Standard PCA is based on second order statistics and maximizes the output variances $E|y_i(n)|^2 = E|W_i^T Z(n)|^2$ subject to the orthogonality constraints $W_i^T W_i = I_P$. As lower order alpha stable distribution noise has no second order moment, we must select an appropriate optimality criterion. Fractional lower order statistics (FLOS) and related statistics are clearly defined in [5, 6], so we use FLOS; that is to say, the PCA problem corresponding to p-order moment maximization is the solution to the following optimization problem. For each $W_i$, $i = 1, 2, \ldots, P$,

$$W_i^{opt} = \arg\max_{W_i} \frac{1}{p} E\left( |Z^T(n) W_i|^p \right), \qquad W_i^T W_i = I_P \qquad (5)$$

Let the objective function be

$$J(W_i) = \frac{1}{p} E|Z^T(n) W_i|^p + \frac{1}{2} \lambda_{ii} (W_i^T W_i - 1) + \frac{1}{2} \sum_{j=1, j \neq i}^{P} \lambda_{ij} W_i^T W_j$$

Here the Lagrange multipliers $\lambda_{ij}$ impose the orthogonality constraints. For each neuron, $W_i$ is orthogonal to the weight vectors $W_j$, $j \neq i$.

The estimated gradient of $J(W_i)$ with respect to $W_i$ is

$$\hat{\nabla} J(W_i) = E\left[ Z(n) |Z^T(n) W_i|^{p-2} \mathrm{conj}(Z^T(n) W_i) \right] + \sum_{j=1}^{P} \lambda_{ij} W_j \qquad (6)$$

At the optimum, the gradients must vanish for $i = 1, 2, \ldots, P$, and $W_i^T W_j = \delta_{ij}$. These can be taken into account by multiplying (6) by $W_j^T$ from the left. We obtain $\lambda_{ij} = -W_j^T E\left[ Z(n) |Z^T(n) W_i|^{p-2} \mathrm{conj}(Z^T(n) W_i) \right]$. Inserting these into (6), we get

$$\hat{\nabla} J(W_i) = \left[ I - \sum_{j=1}^{P} W_j W_j^T \right] E\left[ Z(n) |Z^T(n) W_i|^{p-2} \mathrm{conj}(Z^T(n) W_i) \right] \qquad (7)$$

A practical gradient algorithm for the optimization problem (5) is now obtained by inserting (7) into

$$W_i(n+1) = W_i(n) - \mu(n) \hat{\nabla} J(W_i(n)),$$

where $\mu(n)$ is the gain parameter. The final algorithm is thus

$$W_i(n+1) = W_i(n) - \mu(n) \left[ I - \sum_{j=1}^{P} W_j(n) W_j^T(n) \right] Z(n) |Z^T(n) W_i(n)|^{p-2} \mathrm{conj}(Z^T(n) W_i(n)) \qquad (8)$$

As $y_i(n) = Z^T(n) W_i(n)$, (8) can be written as follows:

$$W_i(n+1) = W_i(n) - \mu(n) |y_i(n)|^{p-2} \mathrm{conj}(y_i(n)) \left[ Z(n) - \sum_{j=1}^{P} y_j(n) W_j(n) \right] \qquad (9)$$

Let $g(t) = |t|^{p-2} \mathrm{conj}(t)$; then $g(t)$ is an appropriate nonlinear transform function of the PCA network for lower order alpha stable distribution impulsive noise.
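The nonlinearity $g(t) = |t|^{p-2}\mathrm{conj}(t)$ is simple to implement, but for $p < 2$ the factor $|t|^{p-2}$ is undefined at $t = 0$. The sketch below is one possible implementation; the function name is hypothetical, and returning 0 at $t = 0$ is an assumption chosen to agree with the real-valued form $|t|^{p-1}\mathrm{sign}(t)$ for $p > 1$.

```python
import numpy as np

def flos_nonlinearity(t, p):
    """g(t) = |t|^(p-2) * conj(t), with g(0) defined as 0 (an assumption)."""
    t = np.asarray(t)
    mag = np.abs(t)
    # Avoid 0 ** negative power by substituting a harmless magnitude at t = 0.
    safe_mag = np.where(mag > 0, mag, 1.0)
    return np.where(mag > 0, safe_mag ** (p - 2) * np.conj(t), 0.0)

real_inputs = np.array([-2.0, 0.0, 3.0])
g_sign = flos_nonlinearity(real_inputs, p=1.0)          # behaves like sign(t)
g_conj = flos_nonlinearity(np.array([1 + 2j]), p=2.0)   # p = 2 gives conj(t)
g_root = flos_nonlinearity(4.0, p=1.5)                  # |4|^0.5 * sign(4)
```

For small p the function saturates quickly, so a single large impulsive sample contributes little more to the weight update than a moderate one.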

Considering that during the iteration the error term of the gradient, $I - \sum_{j=1}^{P} W_j(n) W_j^T(n)$, might be zero instantaneously, we modify (9) in order to improve the robustness of the algorithm as

$$W_i(n+1) = W_i(n) - \mu(n) g(y_i(n)) \left[ Z(n) - \sum_{j=1}^{P} g(y_j(n)) W_j(n) \right] \qquad (10)$$

Thus $W_1, W_2, \ldots, W_P$ can be obtained.

Let $Y(n) = [y_1(n), y_2(n), \ldots, y_P(n)]^T$ and $W = [W_1, W_2, \ldots, W_P]$. For the whole network, the solution W to the optimization problem is

$$W^{opt} = \arg\max_{W} \sum_{i=1}^{P} \frac{1}{p} E|Z^T(n) W_i|^p \qquad (11)$$

According to the above derivation, by using $g(t) = |t|^{p-2} \mathrm{conj}(t)$, the algorithm for learning W is thus

$$W(n+1) = W(n) - \mu(n) \left[ Z(n) - W(n) g(Y(n)) \right] g(Y^T(n)) \qquad (12)$$

5. PERFORMANCE ANALYSIS

Different nonlinear functions can be applied to different blind signal separation problems. Popular choices are g(t) = sign(t) and g(t) = tanh(t), corresponding to the double exponential distribution $\frac{1}{2}\exp(-|x|)$ and the inverse-cosine-hyperbolic distribution $\frac{1}{\pi}\frac{1}{\cosh(x)}$, respectively. For the class of symmetric normal inverse Gaussian (NIG) distributions, it is straightforward to obtain, according to [14],

$$g(t) = -\frac{\alpha t}{\sqrt{\delta^2 + t^2}} \cdot \frac{K_2\left(\alpha \sqrt{\delta^2 + t^2}\right)}{K_1\left(\alpha \sqrt{\delta^2 + t^2}\right)}$$

where $K_1(\cdot)$ and $K_2(\cdot)$ are the modified Bessel functions of the second kind with index 1 and 2.

Figure 3. The nonlinear function of alpha stable distribution.

As lower order alpha stable distribution noise has no second order or higher order moments, we must select the appropriate nonlinear function $g(t) = |t|^{p-2} \mathrm{conj}(t)$ ($p < \alpha$). If t is real data, then $g(t) = |t|^{p-1} \mathrm{sign}(t)$. If p = 1, then g(t) = sign(t). Figure 3 shows the nonlinear function of alpha stable distribution for different $\alpha$.

We start from the learning rule (12), and we assume that there exists a square separating matrix $H^T$ such that $U(n) = H^T Z(n)$. The separating matrix $H^T$ must be orthogonal. To make the analysis easier, we multiply both sides of the learning rule (12) by $H^T$. We obtain

$$H^T W(n+1) = H^T W(n) + \mu(n) \left[ H^T Z(n) - H^T W(n) g(W^T(n) Z(n)) \right] g(Z^T(n) W(n)) \qquad (13)$$

Since $H H^T = I_P$, we get

$$H^T W(n+1) = H^T W(n) + \mu(n) \left[ H^T Z(n) - H^T W(n) g(W^T(n) H H^T Z(n)) \right] g(Z^T(n) H H^T W(n)) \qquad (14)$$

Defining $Q(n) = H^T W(n)$, so that $W(n) = (H^T)^{-1} Q(n)$, (14) is written as

$$Q(n+1) = Q(n) + \mu(n) \left[ U(n) - Q(n) g(Q^T(n) U(n)) \right] g(U^T(n) Q(n)) \qquad (15)$$

Geometrically, multiplying by the orthogonal matrix $H^T$ simply means a rotation to a new set of coordinates such that the elements of the input vector expressed in these coordinates are statistically independent.

The analogous differential equation of (15) is obtained in matrix form:

$$dQ/dt = E\left[ U g(U^T Q) \right] - Q E\left[ g(Q^T U) g(U^T Q) \right] \qquad (16)$$

According to [15], we can easily prove that (16) has a stable solution. Since $Q = H^T W$, $W = (H^T)^{-1} Q$ is an asymptotically stable solution of (12). Figure 4 shows the stability and convergence of the algorithms based on SOS and FLOS. From Figure 4, we see that the algorithm based on FLOS has better stability and convergence than the algorithm based on SOS.

6. EXPERIMENTAL RESULTS

From Section 1 we know that the noise for EP could be a lower order stable process. Through computer simulations, we demonstrate the effectiveness of the proposed algorithm under alpha stable noise conditions. We use the following correlation coefficient to evaluate the performance of the proposed algorithm:

$$\tau(s_i, y_j) = \frac{\sum_{n=1}^{N} s_i(n) y_j(n)}{\sqrt{\sum_{n=1}^{N} s_i^2(n) \sum_{n=1}^{N} y_j^2(n)}} \qquad (17)$$
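The correlation coefficient (17) can be computed directly from two sample sequences. The following is a minimal sketch with a hypothetical function name. It also illustrates why a value close to -1 still indicates a good match: blind separation recovers a source only up to sign and scale, and (17) is insensitive to scale.

```python
import numpy as np

def correlation_coefficient(s, y):
    """Normalized (uncentered) correlation between a source s and an estimate y,
    as in equation (17)."""
    s = np.asarray(s, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(s * y) / np.sqrt(np.sum(s ** 2) * np.sum(y ** 2))

n = np.arange(256)
source = np.sin(2 * np.pi * n / 128)            # illustrative periodic source

tau_same = correlation_coefficient(source, source)             # perfect match
tau_flipped = correlation_coefficient(source, -2.5 * source)   # sign and scale changed
tau_unrelated = correlation_coefficient(source, np.cos(2 * np.pi * n / 128))
```

When reading a table of such coefficients, the magnitude is therefore the quantity of interest, and the sign only records the arbitrary polarity of the recovered component.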

Experiment 1

Two independent sources are linearly mixed. One is the periodic noise-free EP signal, with a period of 128 points and a sampling frequency of 1000 Hz. The other is an alpha stable non-Gaussian noise with $\alpha = 1.7$. Two algorithms are used in the experiment: (1) SOS with the nonlinear function $g(t) = \tanh(t)$; (2) FLOS with $g(t) = |t|^{p-2} \mathrm{conj}(t)$. Figure 4 shows the stability and convergence of the algorithms based on SOS and FLOS. The algorithm based on FLOS has better stability and convergence than the algorithm based on SOS.

The signal waveforms in the time domain are shown in Figure 5, where (a) and (b) are the source signals, (c) and (d) are the signals separated by SOS, and (e) and (f) are the signals separated by FLOS. For the FLOS algorithm, the correlation coefficient between the separated and source EP signals is 0.9213, and the correlation coefficient between the separated and source alpha stable non-Gaussian noises is -0.9098.

Figure 4. The stability and convergence.

Figure 5. Separating results: (a)-(b) are the source signals. (c)-(d) are the separated signals with SOS. (e)-(f) are the separated signals with FLOS.

Figure 6. Separating results: (a)-(b) are the source signals. (c)-(d) are the separated signals with SOS. (e)-(f) are the separated signals with FLOS.

Table 1. Comparison between the two algorithms.

Iteration    Correlation coefficient (FLOS)    Correlation coefficient (SOS)
times        EP          noise                 EP          noise
50           0.1244      0.1044                0.0044      0.0004
100          -0.3450     -0.3050               -0.0050     -0.0063
150          0.4378      0.4378                0.1378      0.1072
200          0.6766      0.7706                0.1716      0.1212
250          -0.9291     -0.9091               -0.1711     -0.1451
300          -0.9287     -0.9107               -0.3937     -0.2231
350          -0.9293     -0.9113               -0.4993     -0.2923
400          0.9295      0.9195                0.3945      0.3045
450          0.9299      0.9292                0.2935      0.1935
500          -0.9501     -0.9593               -0.2804     -0.1904

Figure 7. The correlation coefficients of EP and noise.

Experiment 2

We repeat the simulations with a GSNR of 20 dB. Two independent sources are linearly mixed. One is the periodic noise-free brain evoked potential (EP) signal, with a period of 128 points and a sampling frequency of 1000 Hz. The other is an alpha stable non-Gaussian noise with $\alpha = 1.7$. Two algorithms are used in the experiment: (1) SOS with the nonlinear function $g(t) = \tanh(t)$; (2) FLOS with $g(t) = |t|^{p-2} \mathrm{conj}(t)$. The signals in the time domain are shown in Figure 6, where (a) and (b) are the source signals, (c) and (d) are the signals separated by SOS, and (e) and (f) are the signals separated by FLOS. For the FLOS algorithm, the correlation coefficient between the separated and source EP signals is -0.9213, and the correlation coefficient between the separated and source alpha stable non-Gaussian noises is -0.9098.

Experiment 3

We separate the mixed signals again with the new FLOS algorithm and the conventional SOS algorithm, respectively. The results of 10 independent experiments are shown in Table 1. The correlation coefficients of the EP and of the noise are calculated at several iteration counts and plotted in Figure 7. From Table 1, we see that the performance of the new algorithm is better than that of the conventional algorithm.

7. CONCLUSION

Alpha stable distributions are better than the Gaussian distribution for modeling impulsive noise in biomedical signal processing. Conventional blind separation and estimation methods for evoked potentials are based on second order statistics. In this paper, we modify the conventional algorithms and analyze the stability and convergence performance of the new algorithm. From the above simulations, we conclude that the proposed class of algorithms for the estimation of evoked potentials based on FLOS is more robust than conventional algorithms based on SOS, so that its separation capability is greatly improved under both Gaussian and fractional lower order stable distribution noise environments.

ACKNOWLEDGEMENT

This work is supported by the National Science Foundation of China under Grant 60772037 and the Science Foundation of Department of Health of Jiangxi province under Grant 20072048.

REFERENCES


[1] R. R. Gharieb, A. Cichocki. (2001) Noise reduction in brain evoked potentials based on third-order correlations. IEEE Transactions on Biomedical Engineering, 48(5): 501–512.

[2] C. E. Davila, R. Srebro, and I. A. Ghaleb. (1998) Optimal detection of visual evoked potential. IEEE Transactions on Biomedical Engineering, 45(6): 800–803.

[3] Yanwu Zhang, Yuanliang Ma. (1997) CGHA for principal component extraction in the complex domain. IEEE Trans. on Neural Networks, vol. 8, no. 5.

[4] R. Mutihac, M. M. Van Hulle. (2003) PCA and ICA neural implementations for source separation - a comparative study. Proceedings of the International Joint Conference on Neural Networks, Volume 1, 20–24.

[5] C. L. Nikias and M. Shao. (1995) Signal Processing with Alpha-Stable Distributions and Applications. New York: John Wiley & Sons Inc.

[6] M. Shao and C. L. Nikias. (1993) Signal processing with fractional lower order moments: stable processes and their applications. Proceedings of the IEEE, vol. 81, no. 7, 986–1010.

[7] G. Samorodnitsky, M. S. Taqqu. (1994) Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. New York: Chapman and Hall.

[8] Xuan Kong, Tianshuang Qiu. (1999) Adaptive estimation of latency change in evoked potentials by direct least mean p-norm time-delay estimation. IEEE Transactions on Biomedical Engineering, vol. 46, no. 8, August.

[9] N. Hazarika, A. C. Tsoi, and A. A. Sergejew. (1997) Nonlinear considerations in EEG signal classification. IEEE Trans. Signal Processing, vol. 45, pp. 829–936.

[10] X. Ma and C. L. Nikias. (1996) Joint estimation of time delay and frequency delay in impulsive noise using fractional lower-order statistics. IEEE Trans. Signal Processing, vol. 44, pp. 2669–2687, Nov.

[11] X. Kong and N. V. Thakor. (1996) Adaptive estimation of latency changes in evoked potentials. IEEE Trans. Biomed. Eng., vol. 43, pp. 189–197, Feb.

[12] C. A. Vaz and N. V. Thakor. (1989) Adaptive Fourier estimation of time varying evoked potentials. IEEE Trans. Biomed. Eng., vol. 36, pp. 448–455, Apr.

[13] S. Winter, H. Sawada, S. Makino. (2003) Geometrical understanding of the PCA subspace method for overdetermined blind source separation. Acoustics, Speech, and Signal Processing.

[14] Preben Kidmose. (2001) Blind separation of heavy tail signals. Technical University of Denmark, IMM-PhD, Lyngby.

[15] Juha Karhunen, Erkki Oja, Liuyue Wang, Ricardo Vigario, Jyrki Joutsensalo. (1997) A class of neural networks for independent component analysis. IEEE Trans. on Neural Networks, vol. 8, no. 3.


Visualization of protein structure relationships using constrained twin kernel embedding

Yi Guo, Jun-Bin Gao, Paul Wing Hing Kwan & Kevin Xinsheng Hou

ABSTRACT

In this paper, a recently proposed dimensionality reduction method called Twin Kernel Embedding (TKE) [10] is applied in 2-dimensional visualization of protein structure relationships. By matching the similarity measures of the input and the embedding spaces expressed by their respective kernels, TKE ensures that both local and global proximity information are preserved simultaneously. Experiments conducted on a subset of the Structural Classification Of Protein (SCOP) database confirmed the effectiveness of TKE in preserving the original relationships among protein structures in the lower dimensional embedding according to their similarities. This result is expected to benefit subsequent analyses of protein structures and their functions.

1. INTRODUCTION

Recent years have seen phenomenal advances in dimensionality reduction (DR) methods that are widely applied in bioinformatics [26, 14, 6, 16], biometrics [19, 4, 15, 17], robotics [11], etc. The rapid growth of DR methods stems from the need to reduce the complexity of the problem at hand. The target of these methods is mainly to find the corresponding counterparts of the input data in a much lower dimensional space without incurring significant information loss. The low dimensional representation can be used in subsequent procedures such as classification, pattern recognition, and so on. For example, a medium sized protein structure typically has a few thousand degrees of freedom and therefore naturally resides in a very high dimensional space. This causes the so-called “curse of dimensionality” problem, which not only drastically increases the computational complexity of the learning algorithms but also requires large storage space, leading to very slow indexing and searching in a large scale database. Through DR, most of the redundant dimensions can be removed, and the remaining degrees of freedom are then input to the subsequent discriminant and classification tasks with considerable simplification.

Another advantage of using DR is that the 2- or 3-dimensional mappings of the original data can be visualized in a Euclidean space, which helps researchers interpret the relationships among the data. Normally, the relationships among a set of protein structures are represented in the form of trees derived by hierarchical clustering. However, this representation only provides some hints on the evolutionary distances between protein structures. This limitation motivates our application of DR methods to visualizing the similarity relationships among protein structures.

In this paper, we will propose a new DR algorithm called Constrained Twin Kernel Embedding (CTKE) based on Twin Kernel Embedding (TKE) [10] to achieve the above target. To provide the necessary background knowledge, we will first give a brief review of DR methods on protein structures in the next section, followed by an introduction of TKE and related topics involved in this new algorithm. Then the CTKE algorithm will be introduced, which integrates the similarity matching by TKE with the objective function used in LS-SVM [23]. Experiments on real protein structure data will be presented and finally we summarize this paper with a conclusion.

2. THE RELATED WORKS

DR methods can be categorized into linear DR (LDR) methods such as Principal Component Analysis (PCA) [13] and Linear Discriminant Analysis (LDA) [7], and nonlinear DR (NLDR) methods such as ISOMAP [25], Laplacian Eigenmaps (LE) [3], Locally Linear Embedding (LLE) [20], etc. LDR methods have been widely used in bioinformatics due to their simplicity. For example, Teodoro et al. [27] applied PCA to transform the original high dimensional protein motion data into a lower dimensional representation that captures the dominant modes of motion of the protein. However, the linearity assumption on which linear methods are constructed does not hold in most cases. In [5], Das et al. successfully projected the folding free energy onto a few relevant coordinates by using a typical nonlinear method, ISOMAP, to correctly identify the transition-state ensemble of the reaction, based on the fact that the empirical reaction coordinates routinely used in protein folding studies cannot be reduced to a linear combination of the Cartesian coordinates.

Biosciences, 2009, 1, 40-47

Because of the power of the NLDR methods, they have also been applied to visualizing the relationships between protein structures. Hanke and Reich [12] employed Kohonen self organizing maps, a special form of neural network, as a visualization tool for the analysis of protein structure similarity by converting the sequences into a characteristic signal matrix. In [1], using the pairwise similarity index between two sequences, Sammon maps projected the sequences onto a display plane in such a way that the Euclidean distances between the images approximate as closely as possible the corresponding values in the original sequence space. The metric used to measure the similarity between two protein structures was based on the individual residue similarities derived from a series of amino acid exchange matrices. Further, a modified nonlinear Sammon projection was developed in [2] to display the relationships among protein structures based on their amino acid composition.

Recently, a new method called Stochastic Proximity Embedding (SPE) was introduced into this field by Farnum et al. in [6]. SPE preserves only the local relationships among closely related sequences. This avoids a drawback of global methods such as MDS, which underestimate the proximity of sequences because all pairwise distances are included in the algorithm, leading to erroneous results. To emphasize the proximity, SPE first applies a neighborhood filtering procedure to the similarity matrix (the similarity metric used in SPE is identical to that used in [1]). Then the filtered similarity matrix is input into Sammon’s nonlinear mapping to obtain the final result.

From the discussion above, we can clearly see that there are three important components in DR: the dis/similarity metric for the input data (protein structures in this paper), the objective function (the core of the algorithm) and the dis/similarity metric for the images (the corresponding low dimensional representation of the protein structures), which is usually the Euclidean distance. These DR methods try to preserve the similarity metric of the protein structures as much as possible and reproduce it in a human interpretable space. The dis/similarity metrics used in the methods mentioned above are based on proteins and their structure-related evaluators, while the objective functions are from Sammon’s mapping.

In computational biology, similarity metrics known as kernel functions, which are both powerful and promising, are gaining much attention. An important advantage of a kernel function is that the form of the input data does not have to be vectorial. Any structured data such as protein structures can be properly processed by specially designed kernel functions. This advantage avoids the information loss incurred during vectorization. TKE is constructed on the basis of this kind of similarity metric. Its objective function is totally different from Sammon’s mapping and, furthermore, the dis/similarity metric for the images is not limited to the simple Euclidean distance but is a kernel function that captures the nonlinearity.

3. TWIN KERNEL EMBEDDING

Without loss of generality, the following notations will be adopted. The data in the input space $\mathcal{Y}$ are denoted by $y_i$ ($i = 1, \ldots, N$), while $x_i$ ($i = 1, \ldots, N$) denote their embeddings in a low dimensional space, the so-called latent space $\mathcal{X}$. Notice that here the $y_i$ will be the protein structures. The term “embeddings” is from the manifold learning literature and means the images of the input data or, equivalently, the embedded data. In addition, $Y$ and $X$ will be used to denote respectively the set of input objects and the set of embedded objects. If the objects were vectorial, $Y$ (and $X$) would denote a matrix consisting of rows of vectors. Furthermore, $a \cdot b$ denotes the inner product of two vectors $a$ and $b$.

The Twin Kernel Embedding (TKE) preserves the similarity structure of the input data in the latent space by matching the similarity relations represented by two kernel Gram matrices, one for the input data and the other for their embeddings, by simply minimizing the objective function

$$-\mathrm{Vec}K_y \cdot \mathrm{Vec}K_x, \tag{1}$$

where Vec is the vec operator on a matrix (it stacks all the columns of the matrix into one long vector) and $K_y$ and $K_x$ are the kernel Gram matrices derived from valid Mercer kernel functions $k_y(\cdot,\cdot)$ and $k_x(\cdot,\cdot)$ [21] defined on the input data and the embeddings respectively. The idea is to preserve the similarities among the input data and reproduce them in the lower dimensional latent space, expressed again as similarities among the embeddings. To make this point clearer, we can simply regard $\mathrm{Vec}K_y \cdot \mathrm{Vec}K_x$ as a linear kernel (a linear kernel is defined as $k(a, b) = a \cdot b$), which is a measure of similarity of the variables involved in the kernel function. The larger the value of the kernel, the more similar these two variables are. As a result, we minimize (1) to make $K_x$ and $K_y$ as similar as possible.

To avoid any trivial solutions to (1), two regularization terms on the kernel and the embeddings are introduced, and the objective function in (1) becomes

$$L = -\mathrm{tr}(K_x K_y) + \lambda_k\,\mathrm{tr}(K_x K_x) + \lambda_x\,\mathrm{tr}(XX^T), \tag{2}$$

where we use the fact that $\mathrm{Vec}K_y \cdot \mathrm{Vec}K_x = \mathrm{tr}(K_y K_x)$. The second term is a ridge regularizer on the kernel that makes sure the norm of the kernel is controlled. This avoids solutions that simply let the elements of $K_x$ go to infinity. The third term imposes a heavy penalty on too large a norm for the embeddings, which ensures that their coordinates stay relatively small. $\lambda_k$ and $\lambda_x$ are tunable parameters that control the strength of the regularization and are assumed to be positive.
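As a concrete reading of the regularized objective (2), the following sketch evaluates it for given Gram matrices and embeddings, and checks the Vec/trace identity numerically. The function names are hypothetical and the regularization weights are left as required arguments; this is an illustration under those assumptions rather than the authors' code.

```python
import numpy as np

def vec_inner(K_a, K_b):
    """Vec(K_a) . Vec(K_b): the elementwise product summed over all entries."""
    return float(np.sum(K_a * K_b))

def tke_objective(K_x, K_y, X, lam_k, lam_x):
    """Objective (2): -tr(K_x K_y) + lam_k tr(K_x K_x) + lam_x tr(X X^T).

    K_x, K_y are symmetric N x N Gram matrices; X is the N x d embedding matrix.
    """
    similarity = -np.trace(K_x @ K_y)           # matching term from (1)
    kernel_ridge = lam_k * np.trace(K_x @ K_x)  # keeps entries of K_x bounded
    coord_ridge = lam_x * np.trace(X @ X.T)     # keeps coordinates small
    return float(similarity + kernel_ridge + coord_ridge)
```

For symmetric Gram matrices, `vec_inner(K_y, K_x)` equals `np.trace(K_y @ K_x)`, which is the identity used to pass from (1) to (2).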

In order to capture the nonlinear structure, $k_x(\cdot,\cdot)$ should be chosen to be nonlinear. Normally, we use the RBF kernel

$$k_x(x_i, x_j) = \gamma \exp\left(-\sigma\,\frac{\|x_i - x_j\|^2}{2}\right), \tag{3}$$

where $\gamma$ and $\sigma$ are positive hyperparameters. Nonetheless, other kernels can also be applied here. The RBF kernel is selected for its connection to the Euclidean distance, which gives the embeddings a geometric interpretation. There is no closed form solution for $X$, and hence a gradient-based optimization procedure should be employed, provided that $k_x(\cdot,\cdot)$ is differentiable. An initialization of $X$ is also required to start the optimization process. KLE [9], KPCA [22] and other methods that work with kernels could be utilized here for initializing $X$. The dimension of the embeddings, which is normally 2, is assigned according to the application. A by-product of this optimization process is that we also obtain the optimal hyperparameters of the kernel function $k_x(\cdot,\cdot)$ (such as $\gamma$ and $\sigma$ if the RBF kernel is used). This ensures that the kernel we pick is well adjusted.

TKE is designed to preserve locality and non-locality at the same time. This is done by filtering the entries of $K_y$. Not all the entries are kept in the optimization process, only those that convey most of the similarity information of the input data. This filtering is carried out by performing a k-nearest neighbor selection on $K_y$. Given an object $y_i$ in the input space, only those objects whose similarities to $y_i$ (in the sense of kernel values) place them among its k nearest neighbors retain their original values, while all others are set to 0. The variable k (> 1) controls the neighborhood that the algorithm will preserve. This can be interpreted as constructing a weighted adjacency graph in feature space whose edge weights are evaluated by a kernel, as in KLE. A similar procedure is involved in SPE as well, which uses a neighborhood radius to ensure the proximity preserving property. Because TKE tries to match $K_x$ to $K_y$, and the RBF kernel (3) cannot give a value of 0 unless two points in the latent space are very far apart, TKE seeks a solution that keeps points in the same neighborhood close while pushing points not in the same neighborhood very far apart. However, TKE also works without filtering, in which case it preserves all pairwise similarities and simply becomes a global approach.
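The RBF kernel (3) and the k-nearest neighbor filtering of the input Gram matrix can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and keeping the diagonal and symmetrizing the neighbor mask are choices made here for illustration, since the text does not specify them.

```python
import numpy as np

def rbf_gram(X, gamma=1.0, sigma=1.0):
    """Gram matrix of kernel (3): gamma * exp(-sigma * ||x_i - x_j||^2 / 2)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances; clip tiny negatives caused by rounding.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return gamma * np.exp(-sigma * d2 / 2.0)

def knn_filter(K_y, k=10):
    """Keep the k largest similarities in each row of K_y and zero the rest."""
    K = np.asarray(K_y, dtype=float)
    n = K.shape[0]
    off = K.copy()
    np.fill_diagonal(off, -np.inf)            # an object is not its own neighbor
    idx = np.argsort(-off, axis=1)[:, :k]     # k most similar objects per row
    keep = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), idx.shape[1])
    keep[rows, idx.ravel()] = True
    keep |= keep.T                            # assumption: symmetrize the mask
    np.fill_diagonal(keep, True)              # assumption: keep self-similarity
    return np.where(keep, K, 0.0)
```

Larger kernel values mean greater similarity, so the neighbors of an object are the entries with the largest values in its row, not the smallest.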

In addition to the fact that TKE outperforms other methods such as KPCA, KLE, etc., an elegant feature of TKE is that it uses only the pairwise similarities, since its objective function requires only the kernel Gram matrix of the input data. Through TKE, any kind of data can be visualized in a lower dimensional space as long as an appropriate kernel is defined for them. As a result, a kernel Gram matrix on the input data $K_y$ and an initialization for $X$ will be adequate for TKE to find the optimal embeddings.

4. CONSTRAINED TWIN KERNEL EMBEDDING

It is clear from observing (2) that there is no explicit connection between the input data and their corresponding embeddings in TKE except for the similarity preserving. As such, the information hidden in the input data is neglected. For example, if the input data actually lie on or near a smooth manifold embedded in the ambient space, the location of the input data can be exploited to predict the coordinates of the embeddings on the manifold. Furthermore, TKE can only find the optimal embeddings for the currently presented data, as we can see from its objective function. To address these problems, we introduce constraints reflecting the relationship between the input data and the embeddings into TKE by the following steps. We first define a mapping function $f: \mathcal{Y} \to \mathcal{X}$ and then incorporate it into the objective functional of TKE as regularization terms. Finally, the optimal embeddings $X$ and the mapping $f$ are searched for via the conjugate gradient algorithm.

We can start from the kernel feature mapping directly and incorporate it as the core part of the LS-SVM (the dual form of the objective function of LS-SVM) into TKE, as has been done in [24]. We minimize the following objective function, similar to that of LS-SVM, with equality constraints

$$J = \frac{\upsilon}{2}\sum_{j=1}^{d}\omega_j^T\omega_j + \frac{\eta}{2}\sum_{ij} e_{ij}^2 \tag{4}$$

$$\text{s.t.}\quad x_{ij} = \omega_j^T\varphi_j(y_i) + e_{ij} \tag{5}$$

corresponding to the maximum margin (the first term) and the least square errors (the second term) in (4), where $\upsilon$ and $\eta$ are adjustable parameters. The $\varphi_j$’s are the feature mappings which map the input $y_i$ into a Hilbert space where the inner product is defined. $\omega_j$ is a column vector having the same dimension as the Hilbert space. We can regard the $x_i$ in (5) as the projection of $\varphi(y_i)$ onto a subspace parameterized by the $\omega_j$’s. Because of the difference in the constraints, this does not have the same geometrical interpretation as LS-SVM: the original constraints are inequalities reflecting the correct classification hyperplanes, while here they are just feature mappings which build the relation between the input data $y_i$ and its counterpart $x_i$ in the low dimensional space. In (4), we are essentially minimizing the error $e_{ij}$ in the reconstruction of $x_i$; however, this should be done with $\omega_j$ properly constrained. This happens to have the form of the LS-SVM objective function. To solve (4) with the equality constraints (5), we can use Lagrange multipliers as

$$L = \frac{\upsilon}{2}\sum_{j=1}^{d}\omega_j^T\omega_j + \frac{\eta}{2}\sum_{ij} e_{ij}^2 + \sum_{ij}\alpha_{ij}\left(x_{ij} - \omega_j^T\varphi_j(y_i) - e_{ij}\right) \tag{6}$$

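From the saddle point conditions $\partial L/\partial\omega_j = 0$ and $\partial L/\partial e_{ij} = 0$ we have

$$\frac{\partial L}{\partial\omega_j} = \upsilon\omega_j - \sum_i \alpha_{ij}\varphi_j(y_i) = 0 \;\Rightarrow\; \omega_j = \frac{1}{\upsilon}\sum_i \alpha_{ij}\varphi_j(y_i),$$

$$\frac{\partial L}{\partial e_{ij}} = \eta e_{ij} - \alpha_{ij} = 0 \;\Rightarrow\; e_{ij} = \frac{1}{\eta}\alpha_{ij}.$$

Substituting them back into (6) and eliminating $\omega_j$ and $e_{ij}$, we have the dual problem, to be maximized according to the min-max duality,

$$\begin{aligned}
L &= \frac{\upsilon}{2}\sum_{j=1}^{d}\Big(\frac{1}{\upsilon}\sum_i \alpha_{ij}\varphi_j(y_i)\Big)^T\Big(\frac{1}{\upsilon}\sum_i \alpha_{ij}\varphi_j(y_i)\Big) + \frac{\eta}{2}\sum_{ij}\Big(\frac{1}{\eta}\alpha_{ij}\Big)^2 \\
&\quad + \sum_{ij}\alpha_{ij}\Big[x_{ij} - \Big(\frac{1}{\upsilon}\sum_m \alpha_{mj}\varphi_j(y_m)\Big)^T\varphi_j(y_i) - \frac{1}{\eta}\alpha_{ij}\Big] \\
&= -\frac{1}{2\upsilon}\sum_j \alpha_j^T K_j\alpha_j - \frac{1}{2\eta}\sum_{ij}\alpha_{ij}^2 + \sum_{ij}\alpha_{ij}x_{ij},
\end{aligned} \tag{7}$$

where $\alpha_j = (\alpha_{1j}, \alpha_{2j}, \ldots, \alpha_{Nj})^T$ and $\varphi_j(y_m)^T\varphi_j(y_i) = k_j(y_m, y_i)$, to which the “kernel trick” applies. $k_j(\cdot,\cdot)$ is the kernel associated with the kernel mapping and $K_j$ is the Gram matrix derived from $k_j(\cdot,\cdot)$ accordingly. We then maximize the above dual problem with respect to $\alpha_{ij}$ instead of $\omega_j$ and $e_{ij}$. From the discussion above we see that the $x_{ij}$’s are free variables. To limit the choice of them, we combine (7) with the objective functional of TKE to incorporate the similarity preserving as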

$$\begin{aligned}
L &= -\sum_{i,j} k_y(y_i, y_j)\,k_x(x_i, x_j) + \lambda_k\sum_{ij} k_x(x_i, x_j)^2 + \lambda_x\sum_i x_i^T x_i \\
&\quad + \frac{1}{2\upsilon}\sum_j \alpha_j^T K_j\alpha_j + \frac{1}{2\eta}\sum_{ij}\alpha_{ij}^2 - \sum_{ij}\alpha_{ij}x_{ij}
\end{aligned} \tag{8}$$

where we turn the maximization of (7) into a minimization aligned with the TKE objective functional. Here we see that the terms related to $y_i$ are expressed through the kernel $k_y(\cdot,\cdot)$. Therefore, this revised objective function is still applicable to non-vectorial data. Again we express (8) in matrix form to facilitate the differentiation, and let $k_j(\cdot,\cdot) = k_y(\cdot,\cdot)$ for simplicity:

$$\begin{aligned}
L &= -\mathrm{tr}[K_x K_y] + \lambda_k\,\mathrm{tr}[K_x^2] + \lambda_x\,\mathrm{tr}[XX^T] \\
&\quad + \frac{1}{2\upsilon}\mathrm{tr}[A^T K_y A] + \frac{1}{2\eta}\mathrm{tr}[A^T A] - \mathrm{tr}[A^T X]
\end{aligned} \tag{9}$$

where

$$A = \begin{bmatrix} \alpha_{11} & \cdots & \alpha_{1d} \\ \vdots & & \vdots \\ \alpha_{N1} & \cdots & \alpha_{Nd} \end{bmatrix}.$$

Hence we minimize $L$ in (8) with respect to $A$, $X$ and the kernel hyperparameters of $k_x(\cdot,\cdot)$. If we substitute the saddle point solution back into the equality constraints (5), the following mapping function is handy for predicting new input samples

$$x_{ij} = \frac{1}{\upsilon}\sum_m \alpha_{mj}\,k_y(y_m, y_i) + \frac{1}{\eta}\alpha_{ij} \tag{10}$$

The errors $\frac{1}{\eta}\alpha_{ij}$ can be neglected since their values are very small compared with $X$ after optimization.
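With the error term dropped, the mapping (10) reduces to a single matrix product, which is how a novel sample would be placed in the latent space. The sketch below assumes a hypothetical function name and that the kernel values between the new and the training objects have already been computed; the default for `upsilon` follows the value 0.5 reported in the parameter settings.

```python
import numpy as np

def ctke_predict(K_new, A, upsilon=0.5):
    """Embed new samples with mapping (10), neglecting the error term.

    K_new[i, m] = k_y(y_new_i, y_m) holds kernel values between each new
    object and the N training objects, shape (n_new, N); A has shape (N, d).
    """
    return (np.asarray(K_new, dtype=float) @ np.asarray(A, dtype=float)) / upsilon
```

Only kernel evaluations against the training objects are needed, so the prediction step remains applicable to non-vectorial data.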

To apply the conjugate gradient algorithm, the derivatives of $L$ with respect to $X$ and $A$ are given by

$$\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K_x}\frac{\partial K_x}{\partial X} + 2\lambda_x X - A, \quad\text{and}\quad \frac{\partial L}{\partial K_x} = 2\lambda_k K_x - K_y, \tag{11}$$

$$\frac{\partial L}{\partial A} = \frac{1}{\upsilon}K_y A + \frac{1}{\eta}A - X. \tag{12}$$

$X$ is still initialized by KPCA or KLE so that the method remains applicable to purely non-vectorial data. The initial $A$ is obtained from the solution of $\partial L/\partial A = 0$ once $X$ is known, which gives $A = \left(\frac{1}{\upsilon}K_y + \frac{1}{\eta}I\right)^{-1}X$. This implies that we could update $X$ and $A$ alternately in the optimization; however, we still use the conjugate gradient algorithm to update them at the same time. Because the mapping function defined between the input space and the latent space acts as a set of constraints, we call this algorithm Constrained TKE (CTKE). Note that in CTKE we use $K_y$ to construct the mapping function, so we do not have to filter $K_y$ before commencing the algorithm.
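The gradient (12) and the closed-form initialization of $A$ can be checked against each other numerically. The following is a minimal sketch with hypothetical function names; it solves the linear system directly instead of forming the inverse, which is a numerical choice made here and not something the text prescribes.

```python
import numpy as np

def grad_A(K_y, A, X, upsilon=0.5, eta=0.5):
    """Gradient (12): dL/dA = (1/upsilon) K_y A + (1/eta) A - X."""
    return K_y @ A / upsilon + A / eta - X

def init_A(K_y, X, upsilon=0.5, eta=0.5):
    """Initial A solving dL/dA = 0: A = (K_y/upsilon + I/eta)^{-1} X."""
    n = K_y.shape[0]
    system = K_y / upsilon + np.eye(n) / eta
    return np.linalg.solve(system, X)
```

Because a Mercer kernel Gram matrix is positive semidefinite and $\eta$ is positive, the system matrix is positive definite, so the solve is well posed. The gradient (12) evaluated at the returned $A$ should vanish up to rounding.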

5. EXPERIMENTAL RESULTS

Experiments were conducted on the SCOP (Structural Classification Of Protein) database, which is available at http://scop.mrc-lmb.cam.ac.uk/scop/. The database comes with a detailed and comprehensive description of the structural and evolutionary relationships of the proteins of known structure. 292 protein structures from different protein superfamilies and families were extracted for the test. The kernels we used are from the family of so-called alignment kernels, a thorough analysis of which can be found in [18]. The corresponding kernel Gram matrices are available on the website of that paper as supplements and were used directly in our experiments.

5.1 Parameters Setting

Both CTKE and TKE have parameters to be determined beforehand. Through empirical analyses (a set of experiments on the same data set varying only the parameters), we found that these algorithms are not sensitive to the choice of the parameters, so long as the conjugate gradient optimization can be carried out without stopping prematurely. So we use the following parameters in the experiments. For TKE, $\lambda_k = 0.005$, $\lambda_x = 0.001$ and $k = 10$ in k nearest neighboring; for CTKE, $\lambda_k = 0.05$, $\lambda_x = 0.01$, $\upsilon = \eta = 0.5$. The minimization will stop after 1000 iterations or when the change of the objective function between consecutive updates is less than $10^{-7}$. $k_x(\cdot,\cdot)$ is the RBF kernel and initialization is done by KPCA for both CTKE and TKE.

Figure 1. Visualization results of protein structures: (a) CTKE with MAMMOTH kernel; (b) TKE with MAMMOTH kernel.

5.2 Proteins on 2-D Plane

Proteins from the same family are expected to be close in the 2-dimensional space, with overlaps indicating proteins that are similar but actually from different families. The results are plotted in Figure 1. The results of other methods (kernel applicable methods) are also presented for comparison in Figure 2. Each point (denoted as a shape in the figures) represents a protein. The same shapes with the same colors are proteins from the same family, while the same shapes with different colors represent proteins from different families but of the same superfamily. All the figures in this paper share the same legend as that in Figure 1 (b).

Both TKE and CTKE reveal the fact that proteins from the same family congregate together as clusters, while KPCA and KLE fail to reveal it. For example, in Figure 1 (a) and (b), almost all of the proteins from the globin family gather at the bottom left corner, indicating similar structures. Interestingly, CTKE and TKE also reveal the fact that proteins from the same superfamily but different families are similar in structure: in the 2-dimensional plane, the corresponding groups (families) are close if they are in the same superfamily. For instance, the proteins from the ferritin and ribonucleotide reductase-like families (blue diamonds and violet diamonds respectively in the figure) are close at the group level. SPE has comparable performance visually, but the overlapping in the middle shows that it cannot distinguish some families clearly.

Figure 2. The results of other methods with the MAMMOTH kernel: (a) KPCA; (b) KLE (k = 50); (c) SPE (global); (d) SPE (local, k = 100). Like TKE, SPE can work globally and locally depending on whether the kernel Gram matrix is filtered first by k-nearest neighboring. SPE global worked with the whole kernel Gram matrix and SPE local filtered $K_y$ with 100-nearest neighboring in this experiment.

Figure 3. Comparison of different methods using purity.

In order to further quantify the results, we use the “purity” [8] to evaluate some of these methods. For a given point, it takes the fraction of samples in a neighborhood of size n that belong to the same class as that point. The purity is the average of this fraction over all points. The higher the purity, the better the quality of the clusters. Intuitively, this standard provides an objective judgement from the classification point of view, and a method with better purity has the potential to achieve a better classification rate. Note that for SPE and KLE, which have the parameter k, we ran multiple experiments to choose the optimal value of k, corresponding to the largest purity when the size of the neighborhood is 1. We found that for SPE the larger the k, the better the result. Specifically, when k equals the number of data points, i.e. no filtering at all, SPE turns out to be a nonlinear MDS. As we can see from Figure 3, CTKE and TKE have higher purity than the others. The average purity of CTKE is 0.5946 and that of TKE is 0.6135, which shows that CTKE performs very close to TKE. However, the advantage of CTKE is its ability to predict novel samples because of the mapping function defined explicitly between the input space and the latent space. This mechanism greatly broadens the applicability of the algorithm to classification, identification, etc. It also provides a tool to explore the manifold formed by the data.
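The purity measure described above can be sketched as follows. This is one plausible reading rather than the exact procedure of [8]: the function name is hypothetical, neighborhoods are taken as the n nearest points in Euclidean distance in the embedding, and a point is not counted as its own neighbor.

```python
import numpy as np

def purity(X, labels, n=1):
    """Average fraction of same-class points among each point's n nearest neighbors.

    X is the (N, d) array of embeddings and labels the N class labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    np.fill_diagonal(d2, np.inf)                    # exclude the point itself
    nbrs = np.argsort(d2, axis=1)[:, :n]            # indices of n nearest points
    same = labels[nbrs] == labels[:, None]
    return float(same.mean())
```

Every point has the same neighborhood size, so averaging over all neighbor slots equals averaging the per-point fractions. A value of 1.0 means every neighborhood contains only points of the same class as its center.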

6. CONCLUSIONS

In this paper, we visualized the similarity relationships among protein structures using Constrained Twin Kernel Embedding (CTKE), which is built on the original TKE [10]. It has comparable performance to that of TKE but possesses the ability to predict the embeddings for novel samples. Because CTKE implements a mapping function from the input space to the latent space, the information among the input data is further exploited. CTKE also has the similarity preserving property of TKE and is applicable to purely non-vectorial data, since the mapping function comes from the feature mapping and is eventually expressed as a kernel function. Moreover, there is no k-nearest neighbor filtering in CTKE, so it avoids choosing another parameter, a step that is common across other DR algorithms. From the experiments on proteins, we have seen that CTKE preserves the similarity relationships among the protein structures and reproduces them in a much lower dimensional space, allowing easy interpretation by researchers. This algorithm is promising as it can be further applied to the study of the evolution of protein structure and the prediction of protein functions.

ACKNOWLEDGEMENTS

This work is supported by the National Science Foundation of China (NSFC 60373090), the ARC DP Development Grant from Charles Sturt University and the University Research Grant from the University of New England.

REFERENCES

[1] Dimitris K. Agrafiotis. (1997) A new method for analyzing protein sequence relationships based on sammon maps. Protein Science, 6(2):287–293.

[2] Izydor Apostol and Wojciech Szpankowski. (1999) Indexing and mapping of proteins using a modified nonlinear sammon projection. Journal of Computational Chemistry, 20 (10):1049– 1059.

[3] Mikhail Belkin and Partha Niyogi. (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computa-tion, 15(6):1373–1396.

[4] Samarasena Buchala, Neil Davey, Ray J. Frank, and Tim M. Gale. (2004) Dimensionality reduction of face images for gender classifi-cation. In Proceedings of 2nd International IEEE Conference on In-telligent Systems, volume 1, pages 88–93.

[5] Payel Das, Mark Moll, Hern´an Stamati, Lydia E. Kavraki, and Cecilia Clementi. (2006) Lowdimensional free energy landscapes of protein folding reactions by nonlinear dimensionality reduction. In Proceedings of the National Academy of Sciences, volume 103, pages 9885– 9890, USA .

[6] Michael A. Farnum, Huafeng Xu, and Dimitris K. Agrafiotis. (2003) Exploring the nonlinear geometry of protein homology. Protein Science, 12(1604-1612) .

[7] Ronald A. Fisher. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179–188 .

[8] Amir Globerson, Gal Chechik, Fernando Pereira, and Naftali Tishby.(2005) Euclidean embedding of co-occurrence data. In Law-

46

Y. Guo ET AL.

Page 49: TABLE OF CONTENTSas well as the classical permutation and combination principle, extract putative regulatory signatures using both statistical and graph-based approaches and finally,

rence K. Saul, YairWeiss, and L´eon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 497–504. MIT Press, Cambridge, MA .

[9] Yi Guo, Junbin Gao, and Paul W. Kwan. (2006) Kernel Laplacian eigenmaps for visualization of non-vectorial data. In Lecture Notes on Artificial Intelligence, volume 4304, pages 1179– 1183.

[10] Yi Guo, Junbin Gao, and Paul W. Kwan. (2008) Twin kernel em-bedding. IEEE Transaction ofPattern Analysis and Machine Intel-ligence, submitted.

[11] Jihun Ham, Yuanqing Lin, and Daniel. D. Lee. (2005) Learning nonlinear appearance manifolds for robot localization. In IEEE/RSJ International Conference on Intelligent Robots and Sys-tems, pages 2971– 2976.

[12] Jens Hanke and Jens G. Reich. (1996) Kohonen map as a visuali-zation tool for the analysis of protein sequences: multiple align-ments, domains and segments of secondary structures. Bioinfor-matics, 12(6):447–454.

[13] I. T. Jolliffe. (1986) Principal Component Analysis. Springer-Verlag, New York.

[14] Philip M. Kim and Bruce Tidor. (2003) Subsystem identification through dimensionality reduction of large-scale gene expression data. Genome Research, 13:1706–1718.

[15] Nathan Mekuz, Christian Bauckhage, and John K. Tsotsos. (2005) Face recognition with weighted locally linear embedding. In Proceedings of the 2nd Canadian Conference on Computer and Robot Vision, pages 290–296.

[16] Oleg Okun, Helen Priisalu, and Alexessander Alves. (2005) Fast non-negative dimensionality reduction for protein fold recognition. In ECML, pages 665–672.

[17] Zhengjun Pan, Rod Adams, and Hamid Bolouri. (2000) Dimensionality reduction of face images using discrete cosine transforms for recognition. In IEEE Conference on Computer Vision and Pattern Recognition.

[18] Jian Qiu, Martial Hue, Asa Ben-Hur, Jean-Philippe Vert, and William Stafford Noble. (2007) An alignment kernel for protein structures. In Bioinformatics.

[18] Bisser Raytchev, Ikushi Yoda, and Katsuhiko Sakaue. (2006) Multi-view face recognition by nonlinear dimensionality reduction and generalized linear models. In FGR ’06: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR06), pages 625–630, Washington, DC, USA.

[19] Sam T. Roweis and Lawrence K. Saul. (2000) Nonlinear dimensionality reduction by locally linear embedding. Science, 290(22):2323–2326.

[20] B. Schölkopf and A.J. Smola. (2002) Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press, Cambridge, MA.

[21] Bernhard Schölkopf, Alexander J. Smola, and Klaus-Robert Müller. (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319.

[22] J.A.K. Suykens and J. Vandewalle. (1999) Least squares support vector machine classifiers. Neural Processing Letters, 9:293–300.

[23] Johan A.K. Suykens. (2007) Data visualization and dimensionality reduction using kernel maps with a reference point. Technical Report 07-22, K.U. Leuven, ESAT-SCD/SISTA.

[24] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. (2000) A global geometric framework for nonlinear dimensionality reduction. Science, 290(22):2319–2323.

[25] Miguel L. Teodoro, George N. Phillips Jr, and Lydia E. Kavraki. (2002) A dimensionality reduction approach to modeling protein flexibility. In International Conference on Computational Molecular Biology (RECOMB), pages 299–308.

[26] Miguel L. Teodoro, George N. Phillips Jr, and Lydia E. Kavraki. (2003) Understanding protein flexibility through dimensionality reduction. Journal of Computational Biology, 10(3-4):617–634.
