
The 1st

JAPAN-TAIWAN YOUNG RESEARCHERS

CONFERENCE ON

COMPUTATIONAL AND SYSTEMS BIOLOGY

Symposium Venue

College of Life Sciences, National Tsing Hua University, Taiwan

Organized by

Institute of Bioinformatics and Structural Biology,

National Tsing Hua University, Taiwan

Department of Computational Biology,

University of Tokyo, Japan

Sponsored by

The 21st Century COE program “Elucidation of language structure and semantics behind genome and life system”, JSPS, Japan

National Health Research Institutes, Taiwan

National Research Program for Genomic Medicine, Taiwan

National Center for High-Performance Computing, Taiwan

The Taiwanese Society for Bioinformatics and Systems Biology

Ministry of Education

Hsinchu Taiwan March 9-13, 2008

Program and Abstracts


Sponsors

National Tsing Hua University

http://www.nthu.edu.tw/

Japan Society for the Promotion of Science (JSPS)

21st Century COE Program “Elucidation of language structure and semantics behind genome and life system”

http://www.jsps.go.jp/english/e-21coe/index.html

National Health Research Institutes

http://www.nhri.org.tw/

National Research Program for Genomic Medicine (NRPGM)

http://genmed.sinica.edu.tw/

National Center for High-Performance Computing

http://www.nchc.org.tw/

The Taiwanese Society for Bioinformatics and Systems Biology

http://bst.nchc.org.tw/

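Ministry of Education

Aim for the Top University Project (Ministry of Education); Pilot Program for Training Talent in Biological and Medical Technology (Advisory Office, Ministry of Education)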

http://www.edu.tw/


Conference Organizations

Advisory Committee

• Ping‐Chiang Lyu Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Taiwan

• Masanori Arita Department of Computational Biology, University of Tokyo

Organizing Committee

• Allan Lo Taiwan International Graduate Program, Institute of Information Science, Academia Sinica

• Aya Misawa Graduate School of Frontier Sciences, University of Tokyo

• Shao‐Wei Huang Institute of Bioinformatics, National Chiao Tung University

• Shu‐Hsi Lin University of Tokyo and National Tsing Hua University

• Wataru Iwasaki Graduate School of Frontier Sciences, University of Tokyo

• Wei‐Cheng Lo Institute of Bioinformatics and Structural Biology, National Tsing Hua University


General Information

Scientific Program

The scientific program lasts two and a half days. It consists of 3 plenary lectures (40 minutes each), 10 invited lectures (25 minutes each), 10 oral presentations (15 minutes each) selected from the poster abstracts, and a poster session.

Speakers

All speakers should leave appropriate time (at least 3‐5 min) for discussion. Presenters should check their slides at PC registration before their presentation starts.

Posters

Posters should be mounted before 16:00 on March 10th. Poster presenters are requested to explain their posters during the duty time, which starts at 16:30 on March 10th.

Website

Information about the conference is available on these web pages: http://jtws.life.nthu.edu.tw/ and http://zp.cb.k.u-tokyo.ac.jp/research/jt-ws/


Conference Schedule

3/9 Sunday

11:00 – 14:00  Registration
19:00 –        Reception

3/10 Monday

08:20 – 08:30  Opening remark
08:30 – 10:00  Plenary lecture + 2 IL
10:00 – 10:25  Coffee break
10:25 – 12:00  2 IL + 3 OP
12:00 – 13:30  Lunch
13:30 – 15:00  Plenary lecture + 2 IL
15:00 – 15:20  Coffee break
15:20 – 16:20  4 OP
16:30 – 18:30  Poster session
19:00 – 21:00  Banquet

3/11 Tuesday

08:30 – 10:00  Plenary lecture + 2 IL
10:00 – 10:25  Coffee break
10:25 – 12:00  2 IL + 3 OP
12:00 – 13:30  Lunch & Closing ceremony
13:30 –        Visiting NHRI

Plenary lecture (PL): 40 min, Invited lecture (IL): 25 min, Oral presentation (OP): 15 min


3/12 Wednesday

09:00 – 10:30  Visiting Institute of Information Science, Academia Sinica
10:30 – 12:00  Visiting Genomic Research Center, Academia Sinica
12:00 – 14:30  Lunch and a meeting for the 2nd Japan‐Taiwan Young Researchers’ Conference
14:30 –        Introduction to TIGP and local tour of Academia Sinica

3/13 Thursday

10:00 – 12:00  Dr. Cox’s Lecture at GRC, Academia Sinica
12:00 – 13:30  Lunch
13:30 –        Transportation to the airport


March 10th, Monday

08:20 – 08:30 Opening Remark: Masanori Arita, University of Tokyo, Japan

Session A Chair: Shu‐Hsi Lin

08:30 – 09:10 PL: Dr. Wen‐Hsiung Li, University of Chicago, USA How to be a Computational Biologist? A Personal Experience

09:10 – 09:35 IL: Dr. Hidemi Watanabe, Hokkaido University, Japan An Application of Evolutionary Methods to Experimental Biology

09:35 – 10:00 IL: Dr. Chao A. Hsiung, National Health Research Institutes, Taiwan Viral Bioinformatics ‐ Sequence Analysis and Time Course Gene Expression Analysis

10:00 – 10:25

Coffee Break

Session B Chair: Shao‐Wei Huang

10:25 – 10:50 IL: Dr. Yutaka Suzuki, University of Tokyo, Japan Transcriptome Analyses of Human Genes

10:50 – 11:15 IL: Dr. Hsuan‐Cheng Huang, National Yang Ming University, Taiwan MicroRNA‐Regulated Protein Interaction Network

11:15 – 11:30 OP: Kinya Okada Effect of Two Rounds of Whole‐Genome Duplication on the Adenohypophysis‐Mediated Endocrine System in Early Vertebrates


11:30 – 11:45 OP: Wataru Iwasaki Reconstruction of Highly Heterogeneous Gene‐Content Evolution across the Three Domains of Life

11:45 – 12:00 OP: Emily Chia‐Yu Su Protein Subcellular Localization Prediction Based on Gapped‐dipeptide Signatures and Document Classification

12:00 – 13:30

Lunch

Session C Chair: Allan Lo

13:30 – 14:10 PL: Dr. Russell Cox, University of Bristol, UK Annotation, Molecular Genetics, Biochemistry and Chemistry of Fungal Polyketide Synthases

14:10 – 14:35 IL: Dr. Takashi Takahashi, Nagoya University, Japan Expression profiling‐based approaches for a better understanding of human lung cancers

14:35 – 15:00 IL: Dr. Tsung‐Lin Li, Genomics Center, Academia Sinica, Taiwan Glycopeptide Antibiotic Biosynthesis: The Synthesis of N‐Acyl Aminoglucuronic Acid in Antibiotic A40926

15:00 – 15:20

Coffee Break


Session D

15:20 – 15:35 OP: Eric Perrier Perm: an Efficient Topological Orderings‐Based Algorithm for Bayesian Network Structure Learning

15:35 – 15:50 OP: Kaname Kojima Fast SCB‐Grid Layout Algorithm using Sweep Calculation

15:50 – 16:05 OP: Wei‐Cheng Lo CPSARST – Circular Permutation Search Aided by Ramachandran Sequential Transformation

16:05 – 16:20 OP: Shih‐Yen Ku A Multi‐Strategy Approach to Protein Structural Alphabet Design

16:30 – 18:30

Poster Session

19:00 – 21:00 Banquet

March 11th, Tuesday

Session E Chair: Wataru Iwasaki

08:30 – 09:10 PL: Dr. Limsoon Wong, National University of Singapore, Singapore Guilt by Association

09:10 – 09:35 IL: Dr. Jenn‐Kang Hwang, National Chiao Tung University, Taiwan On the Structural Characteristics of Protein Active Sites

09:35 – 10:00 IL: Dr. Shinya Kuroda, University of Tokyo, Japan Systems Biology of ERK Signaling Networks

10:00 – 10:25

Coffee Break


Session F Chair: Wei‐Cheng Lo

10:25 – 10:50 IL: Dr. Kiyoshi Asai, University of Tokyo and Computational Biology Research Center, Japan Bioinformatics of Non‐coding RNA

10:50 – 11:15 IL: Dr. Ping‐Chiang Lyu, National Tsing Hua University, Taiwan Construction of Whole Genomic and Proteomic Tree Based on DNA and Protein Probes

11:15 – 11:30 OP: Yuka Watanabe Computational and Experimental Analysis of Negative Feedback Regulation within MicroRNA Processing Pathway

11:30 – 11:45 OP: Shao‐Wei Huang On the Relationship between Protein Structure and Dynamics

11:45 – 12:00 OP: Allan Lo The Ins and Outs of Membrane Protein Topology Prediction

12:00 – 13:30

Lunch & Closing Ceremony


Transportation

Taiwan High Speed Rail (THSR)

The newly completed Taiwan High Speed Rail (THSR) provides a fast, safe, and comfortable way to travel along the west coast of Taiwan. It is a bullet train based on Japanese Shinkansen technology that covers the 345 km route from Taipei to Kaohsiung (Zuoying) in 90‐120 minutes. Participants arriving at Taiwan Taoyuan International Airport (TIA) are strongly advised to take the THSR to Hsinchu Station in order to reach the venue. A one‐way ticket costs NT$130 (approximately US$ 4).

For information on Taiwan High Speed Rail, please visit the THSR website.

• Route of THSR


From Taoyuan International Airport (TIA) to National Tsing Hua University (NTHU)

TIA – THSR Taoyuan Station: by taxi or bus, 10 mins

THSR Taoyuan Station – THSR Hsinchu Station: by THSR, 12 mins

THSR Hsinchu Station – NTHU: by taxi, 30 mins

Location of the National Tsing Hua University, HsinChu

Transportation to Hsinchu

How to reach the National Tsing Hua University

• By Train

From Taipei: Take a train from Taipei Train Station to Hsin Chu. Then take Hsin Chu City Bus #1 or #1 甲 (which means #1A), and get off at the “National Tsing Hua University” stop. Alternatively, you may take a taxi.

From Taichung: Take a train from Taichung Train Station to Hsin Chu. Then take Hsin Chu City Bus #1 or #1 甲 (which means #1A), and get off at the “National Tsing Hua University” stop. Alternatively, you may take a taxi.

• By Bus

From Taipei: Take a bus run by the Guo Guan Transportation Services (國光客運公司) at the Taipei Northern Station to Hsin Chu and get off at the “Tsing Hua University” station.

From Tao Yuen Chiang Kai‐Shek Memorial Airport: Take a bus run by the Guo Guan Transportation Services (國光客運公司) from the Airport to Hsin Chu and get off at the “Tsing Hua University” station.

From Taichung: Take a bus run by the Guo Guan Transportation Services (國光客運公司) from the Taichung Kan Cheng Station to Hsin Chu and get off at the “Tsing Hua University” station.

• By car

From Tao‐Yuen CKS Airport: Take Freeway 2 from the airport toward Tao‐Yuen (east), and connect to the Chung‐Shan Freeway 1 at the CKS airport interchange, heading south. Leave Freeway 1 at the Hsin‐Chu interchange and go towards Hsin‐Chu City. The entrance of NTHU is at the third traffic light, on the left‐hand side.

From other cities: Take Freeway 1 or Freeway 3 toward Hsin‐Chu. If you take Freeway 3, connect to Freeway 1 at the Chu‐Nan system interchange, heading north. Then leave Freeway 1 at the Hsin‐Chu interchange towards Hsin‐Chu City. The entrance of NTHU is at the third traffic light, on the left‐hand side.


Taxi Cab

The fare consists of an initial fare, a fare for every extra distance step, and a waiting fare:

Initial fare (first 1250 m): 90 NTD
Every extra 250 m: 5 NTD
Waiting, every 3 mins: 5 NTD

There is no extra charge for radio booking or for using the trunk. Fares may be bargained during Lunar New Year but are not allowed to exceed 50% of the meter price.

Remark: The waiting (timing) fare is charged when the speed is lower than 5 km/h, at 5 NTD for every three minutes.
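
The fare rules above can be turned into a small calculation. The following is an illustrative sketch only: the function name and the rounding rules (a started 250 m step is charged in full, and only completed three-minute blocks of slow travel are charged) are assumptions that the brochure does not state.

```python
import math

INITIAL_FARE_NTD = 90       # covers the first 1250 m
INITIAL_DISTANCE_M = 1250
STEP_FARE_NTD = 5           # charged for every extra 250 m
STEP_DISTANCE_M = 250
WAIT_FARE_NTD = 5           # charged for every 3 minutes below 5 km/h
WAIT_STEP_MIN = 3


def taxi_fare(distance_m, slow_minutes=0):
    """Estimate a metered fare in NTD from distance and slow-travel time."""
    extra_m = max(0, distance_m - INITIAL_DISTANCE_M)
    # Assumption: a started 250 m step is charged in full.
    distance_steps = math.ceil(extra_m / STEP_DISTANCE_M)
    # Assumption: only completed 3-minute blocks are charged.
    wait_steps = int(slow_minutes // WAIT_STEP_MIN)
    return (INITIAL_FARE_NTD
            + STEP_FARE_NTD * distance_steps
            + WAIT_FARE_NTD * wait_steps)


if __name__ == "__main__":
    # 2250 m with 6 minutes of slow traffic: 90 + 4 * 5 + 2 * 5
    print(taxi_fare(2250, slow_minutes=6))
```

Under these assumptions, a ride that stays within the first 1250 m costs the initial 90 NTD regardless of the exact distance.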


Session A Plenary Lecture: Dr. Wen‐Hsiung Li Invited Lecture: Dr. Hidemi Watanabe Invited Lecture: Dr. Chao A. Hsiung


Plenary Lecture: Dr. Wen‐Hsiung Li

HOW TO BE A COMPUTATIONAL BIOLOGIST: A PERSONAL EXPERIENCE

Wen-Hsiung Li 1

1 University of Chicago, Chicago, USA

I have been pursuing computational biology since my Ph.D. student years and have published more than 270 original papers. A reflection on my personal experience might be of use to young scientists who are entering or are relatively new to computational biology. I came into computational biology from the mathematical side and I shall explain how I struggled in my early years. I shall reflect on what I seem to have done right and what I would now do differently. I shall also explain how I learned biology and eventually became a bona fide biologist. Finally, I shall explain why it is important to keep up with new developments and how to do that.


Invited Lecture: Dr. Hidemi Watanabe

AN APPLICATION OF EVOLUTIONARY METHODS TO EXPERIMENTAL BIOLOGY

Hidemi Watanabe 1

1 Graduate School of Information Science and Technology, Hokkaido University, Japan

In my lecture, I will present an evolutionary study of bioinformatics tools for experimental biologists, carried out by one of my students. The polymerase chain reaction, or PCR, is a technique widely used in molecular biology for amplifying a small amount of DNA across several orders of magnitude, generating millions or more copies of the original DNA piece. The power and selectivity of PCR are primarily due to selecting DNA primers/oligonucleotides that are highly complementary to the DNA region targeted for amplification. Therefore, we have to design appropriate primers prior to the amplification of the target DNA region. If genome sequence data of the target are available, we can design primers simply by applying the genomic data to some software, e.g., primer3. However, practically no tool exists that allows the design of primers for unsequenced targets. In such a case, we make a multiple alignment of sequences obtained from species very closely related to that of the target sequence and try to design primers using highly conserved regions in the alignment, but this approach does not always work. One of the main reasons for this is high diversification between the available sequences, which causes high degeneracy of the primers. The use of highly degenerate primers, e.g., degeneracy = >512, may cause problems such as unspecific amplification by mis-annealing of primers and no amplification due to the low concentration of the perfectly matched primer in the degenerate primer set. Our tool selects the most likely primer regions and primer sets simultaneously among all possible degenerate primers, based on a codon/nucleotide substitution model and the molecular phylogeny between the aligned sequences. Some results of PCR using primer sets designed with our tool for sequence-unknown targets will be presented.
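
To make the notion of primer degeneracy concrete, the sketch below derives a degenerate consensus from an ungapped alignment using the standard IUPAC nucleotide codes and counts how many distinct primers it represents. It is not the tool described in the abstract, which additionally uses a substitution model and a phylogeny; the function names and the toy sequences are hypothetical.

```python
from math import prod

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def consensus(aligned):
    """Degenerate consensus of equal-length, upper-case, ungapped sequences."""
    return "".join(CODE_FOR[frozenset(column)] for column in zip(*aligned))


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


if __name__ == "__main__":
    toy_alignment = ["ATGCA", "ATGTA", "ACGCA"]   # hypothetical sequences
    primer = consensus(toy_alignment)
    print(primer, degeneracy(primer))
```

Each ambiguous position multiplies the degeneracy by the number of bases it covers, so five fully ambiguous positions (N) already give 4^5 = 1024 primer variants.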


Invited Lecture: Dr. Chao A. Hsiung

VIRAL BIOINFORMATICS : SEQUENCE ANALYSIS AND TIME COURSE GENE EXPRESSION ANALYSIS

Chao A. Hsiung 1

1 Division of Biostatistics and Bioinformatics, National Health Research Institutes

I will talk about two problems arising in viral bioinformatics: one is predicting antigenic variants of influenza A/H3N2 virus using sequence data; the other is tool development for microarray analyses of virus gene expression time course data.

(I) Sequence analysis

Continual and accumulated mutations in hemagglutinin (HA) protein of influenza A virus generate novel antigenic strains that cause annual epidemics. We propose a model by incorporating scoring and regression methods to predict antigenic variants. Based on collected sequences of influenza A/H3N2 viruses isolated between 1971 and 2002, our model can be used to accurately predict the antigenic variants in 1999–2004 (agreement rate = 91.67%). Twenty amino acid positions identified in our model contribute significantly to antigenic difference and are potential immunodominant positions. (joint with Liao Y.C., Lee M.S. et al.)

(II) Time course gene expression analysis

There have been several studies of the genome-wide temporal transcriptional program of viruses, based on microarray experiments, which are generally useful in the construction of gene regulation networks. It seems that all biological interpretations in these studies are directly based on the normalized data and some crude statistics, which provide rough estimates of certain limited features of the profile and may incur biases. We illustrate a hierarchical Bayesian shape-restricted regression method for making inference on the time course expression of virus genes. The prior, introduced by Bernstein polynomials, takes into consideration the geometric constraints on the profile and has its parameters determined by data; the hierarchical modeling takes advantage of the correlation between the genes so as to enjoy the shrinkage effects. This method offers the possibility of comparing genome-wide expression studies with different designs. One specific advantage of this method is that estimates of many salient features of the expression profile, such as onset time, inflection point, maximum value, and time to maximum value, can be obtained immediately. (joint with Chang I.S., Chien L.C. et al.)
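
As a rough illustration of why a Bernstein polynomial representation makes profile features easy to read off, the sketch below evaluates a curve with given Bernstein coefficients and locates its maximum on a grid. It covers only the curve representation, not the hierarchical Bayesian inference of the abstract; time is assumed to be rescaled to [0, 1], and the function names and coefficients are hypothetical.

```python
from math import comb


def bernstein_curve(coeffs, t):
    """Evaluate sum_k c_k * C(n, k) * t^k * (1 - t)^(n - k) for t in [0, 1]."""
    n = len(coeffs) - 1
    return sum(c * comb(n, k) * t**k * (1 - t)**(n - k)
               for k, c in enumerate(coeffs))


def peak(coeffs, grid_size=101):
    """Return (time to maximum, maximum value) on an evenly spaced grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    values = [bernstein_curve(coeffs, t) for t in grid]
    i_max = max(range(grid_size), key=values.__getitem__)
    return grid[i_max], values[i_max]


if __name__ == "__main__":
    # Hypothetical profile that rises and then falls.
    print(peak([0, 1, 0]))
    # Non-decreasing coefficients give a non-decreasing curve, which is one
    # way a shape restriction can be expressed through the coefficients.
    print([round(bernstein_curve([0, 0.2, 1], i / 10), 3) for i in range(11)])
```

Because shape properties follow from the coefficients, a constraint on the profile can be imposed by constraining the coefficients rather than the curve itself.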


Session B Invited Lecture: Dr. Yutaka Suzuki Invited Lecture: Dr. Hsuan‐Cheng Huang Oral Presentation: Kinya Okada Oral Presentation: Wataru Iwasaki Oral Presentation: Emily Chia‐Yu Su


Invited Lecture: Dr. Yutaka Suzuki

TRANSCRIPTOME ANALYSES OF HUMAN GENES

Yutaka Suzuki 1

1 Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo

Although recent studies have revealed that the majority of human genes are subjected to regulation of alternative promoters, the biological relevance of this phenomenon remains unclear. Based on the 5’-end sequences of the FLJ human full-length cDNAs, we have developed a database of transcriptional start sites and putative promoters, DBTSS (http://dbtss.hgc.jp). Starting from the registered approximately 1.8 million human cDNAs, collectively covering 14,312 RefSeq genes, we have also demonstrated that 52 % of the human RefSeq genes examined contain putative alternative promoters (PAPs). In the present study, we report large-scale comparative studies of PAPs between human and mouse counterpart genes. Detailed sequence comparison of the 17,245 putative promoter regions (PPRs) in 5,463 PAP-containing human genes revealed that PPRs in only a minor fraction of genes (807 genes) showed clear evolutionary conservation as one or more pairs. Also, we found that there were substantial qualitative differences between conserved and non-conserved PPRs, with the latter class being AT-rich PPRs of relative minor usage, enriched in repetitive elements and sometimes producing transcripts that encode small or no proteins. Systematic luciferase assays of these PPRs revealed that both classes of PPRs did have promoter activity, but that their strength ranges were significantly different. Furthermore, we demonstrate that these characteristic features of the non-conserved PPRs are shared with the PPRs of previously discovered putative non-protein coding transcripts. Taken together, our data suggest that there are two distinct classes of promoters in humans, with the latter class of promoters emerging frequently during evolution. 
Enriched cDNA data, which we are producing using the second-generation sequencer Solexa (20 million TSS data from MCF7 and HEK293 cells are available from DBTSS), will enable further detailed analyses of the dynamic regulation and evolutionary turnover of the human transcriptome.


Invited Lecture: Dr. Hsuan‐Cheng Huang

MICRORNA-REGULATED PROTEIN INTERACTION NETWORK

Hsuan-Cheng Huang 1

1 National Yang Ming University, Taiwan

Protein-protein interactions are critical to most biological processes. Available high-throughput experiments on protein-protein interactions allow us to build the interaction network, giving more insight. MicroRNAs regulate protein-encoding genes at the post-transcriptional level. However, the relationship between the protein-protein interaction network and microRNA regulation is still not clear. We have performed topological analysis to elucidate the global correlation between microRNA regulation and the protein-protein interaction network in humans. The analysis showed that target genes of individual microRNAs tend to be hubs and bottlenecks in the network. While proteins directly regulated by a microRNA might not form a network module themselves, the microRNA-target genes and their interacting neighbors jointly showed significantly higher density and modularity. Our findings shed light on how microRNAs may regulate the protein interaction network.
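
One simple way to examine whether a set of target genes tends to be hubs is to compare their mean degree with that of the whole network. The sketch below does this on a toy edge list; the protein names, the target set, and the function names are hypothetical, and the abstract's own analysis (including bottlenecks, density, and modularity) is not reproduced here.

```python
from collections import Counter


def degrees(edges):
    """Count interaction partners per protein from an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def mean_degree(nodes, deg):
    """Average degree over a collection of proteins."""
    nodes = list(nodes)
    return sum(deg[n] for n in nodes) / len(nodes)


# Hypothetical toy network and hypothetical microRNA targets.
TOY_EDGES = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
TOY_TARGETS = {"P1", "P4"}

if __name__ == "__main__":
    deg = degrees(TOY_EDGES)
    print("targets:", mean_degree(TOY_TARGETS, deg))
    print("all proteins:", mean_degree(deg.keys(), deg))
```

A real analysis would also need a significance test, for example against randomly drawn gene sets of the same size.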


Oral Presentation: Kinya Okada

EFFECT OF TWO ROUNDS OF WHOLE-GENOME DUPLICATION ON THE ADENOHYPOPHYSIS-MEDIATED ENDOCRINE SYSTEM IN EARLY VERTEBRATES

Kinya Okada 1 Kiyoshi Asai 1

1 Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan

The adenohypophysis of vertebrates receives peptide hormones from the hypothalamus and secretes hormones that regulate diverse physiologic processes in peripheral organs. Phylogenetic analysis indicates that the emergence of the adenohypophysis-mediated endocrine system coincided with two rounds of whole-genome duplication (2R-WGD) in early vertebrates, but direct evidence linking these events has been unavailable. We detect all human paralogons (series of paralogous regions) formed in early vertebrates as traces of 2R-WGD, and examine the relationship between 2R-WGD and the evolution of genes essential to the adenohypophysis-mediated endocrine system. Regarding genes encoding transcription factors (TFs) involved in the terminal differentiation into hormone-secreting cells in adenohypophyseal development, we show that most pairs of these genes and their paralogs were part of paralogons. In addition, our analysis also indicates that most of the paralog pairs in families of adenohypophyseal hormones and their receptors were part of paralogons. These results suggest that 2R-WGD played an important role in generating genes encoding adenohypophyseal TFs, hormones, and their receptors for increasing the diversification of hormone repertoire in the adenohypophysis-mediated endocrine system of vertebrates.

Keywords: whole-genome duplication, adenohypophysis, endocrine system, vertebrate, paralogon





Oral Presentation: Wataru Iwasaki

RECONSTRUCTION OF HIGHLY HETEROGENEOUS GENE-CONTENT EVOLUTION ACROSS THE THREE DOMAINS OF LIFE

Wataru Iwasaki Toshihisa Takagi


1 Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan

Reconstruction of gene-content evolutionary history is fundamental in studying the evolution of genomes and biological systems. To reconstruct plausible evolutionary history, rates of gene gain/loss should be estimated by considering the high level of heterogeneity: for example, genome duplication and parasitization, respectively, result in high rates of gene gain and loss. Recently, we have developed a novel effective and efficient method for reconstructing heterogeneous gene-content evolution. This method comprises analytically integrable modeling of gene-content evolution, analytical formulation of expectation-maximization, and efficient calculation of marginal likelihood using an inside-outside-like algorithm. Simulation tests on the scale of hundreds of genomes showed that both the gene gain/loss rates and evolutionary history were effectively estimated within a few days of computational time. Subsequently, this algorithm was applied to an actual data set of nearly 200 genomes to reconstruct the heterogeneous gene-content evolution across the three domains of life. The reconstructed history, which contained several features consistent with biological observations, showed that the trends of gene-content evolution were not only drastically different between prokaryotes and eukaryotes, but were highly variable within each form of life. The results suggest that heterogeneity should be considered in studies of the evolution of gene content, genomes, and biological systems.

Keywords: expectation-maximization; gene content; genome evolution; heterogeneous evolution; reconstruction





Oral Presentation: Emily Chia‐Yu Su

PROTEIN SUBCELLULAR LOCALIZATION PREDICTION BASED ON GAPPED-DIPEPTIDE SIGNATURES AND DOCUMENT CLASSIFICATION

Jia-Ming Chang 1 Emily Chia-Yu Su 2,3 Allan Lo 2,4 Hua-Sheng Chiu 1 Ting-Yi Sung 1 Wen-Lian Hsu 1

1 Bioinformatics Lab., Institute of Information Science, Academia Sinica, Taipei, Taiwan

2 Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan

3 Institute of Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan

4 Department of Life Sciences, National Tsing Hua University, Hsinchu, Taiwan

The study of protein subcellular localization (PSL) is important for elucidating protein functions involved in various cellular processes. However, determination of subcellular localization using experimental approaches is time-consuming. Thus, using computational approaches to predict localization efficiently has become highly desirable. We present a prediction method, PSLDoc (Protein Subcellular Localization prediction based on Document classification), which incorporates a probabilistic latent semantic analysis (PLSA) with a support vector machine (SVM) model based on document classification techniques for both prokaryotes and eukaryotes. Our method extracts biological features from gapped-dipeptides of various distances, where evolutionary information from the position specific scoring matrix is utilized to determine the weighting of each gapped-dipeptide. The features are further reduced by PLSA and incorporated as input vectors for SVM classifiers. PSLDoc achieves 93.0%, 90.6%, 78.7%, and 83.1% in overall accuracy for Gram-negative bacteria, Gram-positive bacteria, human, and plant proteins, respectively. Experimental results show that feature selection and reduction by document classification techniques can lead to a significant improvement in the prediction performance. Moreover, we demonstrate that PLSA automatically selects discriminating sequence motifs and greatly reduces the feature dimension without sacrificing the prediction accuracy. The web server of PSLDoc is publicly available at http://bio-cluster.iis.sinica.edu.tw/~bioapp/PSLDoc.
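
The gapped-dipeptide features mentioned above can be pictured as counts of residue pairs separated by a fixed number of positions. The sketch below is a simplified illustration that counts raw occurrences only; it assumes that the gap is the number of residues between the two amino acids and it omits the position specific scoring matrix weighting, PLSA reduction, and SVM classification used by PSLDoc. The function name and the example sequence are hypothetical.

```python
from collections import Counter


def gapped_dipeptide_counts(sequence, max_gap):
    """Count (first residue, gap, second residue) triples in a sequence.

    A gap of 0 is an ordinary dipeptide; a gap of d means that d residues
    lie between the two amino acids (an assumption of this sketch).
    """
    counts = Counter()
    for gap in range(max_gap + 1):
        for i in range(len(sequence) - gap - 1):
            counts[(sequence[i], gap, sequence[i + gap + 1])] += 1
    return counts


if __name__ == "__main__":
    features = gapped_dipeptide_counts("MKVLA", max_gap=1)  # toy sequence
    for key, value in sorted(features.items()):
        print(key, value)
```

With 20 amino acids, each gap distance contributes 400 possible features, which is why a dimension reduction step becomes attractive as more distances are included.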


Session C Plenary Lecture: Dr. Russell Cox Invited Lecture: Dr. Takashi Takahashi Invited Lecture: Tsung‐Lin Li


Plenary Lecture: Dr. Russell Cox

ANNOTATION, MOLECULAR GENETICS, BIOCHEMISTRY AND CHEMISTRY OF FUNGAL POLYKETIDE SYNTHASES

Russell Cox 1

1 School of Chemistry, University of Bristol, UK

Filamentous fungi produce an extremely diverse array of secondary metabolites, many of them derived from polyketide and non-ribosomal peptide pathways. Fungal genome projects are revealing that many fungi have multiple polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) genes, often many more than the known metabolites of the organisms. Significant challenges exist, therefore, in linking gene sequence to the chemical structure of the secondary metabolites. Our work in Bristol focuses on understanding the enzymology of fungal polyketide synthases and their associated tailoring enzymes. Our approach focuses on the rapid cloning of fungal PKS and fungal PKS-NRPS genes. The heterologous expression of these genes leads to the production of new compounds, and the structure of the compounds leads to important conclusions about the programming of the PKS and NRPS proteins. The lecture will focus on the compounds tenellin, fusarin-C and xenovulene from Beauveria bassiana, Fusarium moniliforme and Acremonium strictum, respectively.

[Figure: chemical structures of Fusarin-C, Tenellin and Xenovulene-A]


Invited Lecture: Dr. Takashi Takahashi

EXPRESSION PROFILING-BASED APPROACHES FOR A BETTER UNDERSTANDING OF HUMAN LUNG CANCERS

Takashi Takahashi 1

1 Division of Molecular Carcinogenesis, Center for Neurological Diseases and Cancer, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya 466-8550, Japan.

The long-term survival rate for lung cancer patients remains unsatisfactory even among those who successfully undergo a potentially curative resection, with a 5-year postoperative survival of less than 50%. The current classification of lung cancer is largely based on histologic features, which divide this heterogeneous neoplasm into four major types: small cell carcinoma, adenocarcinoma, squamous cell carcinoma and large cell carcinoma. It is well recognized that marked variations are present even within a particular histologic type and that adenocarcinomas exhibit the highest degree of morphologic and clinical diversity. Thus, more detailed, accurate, and objective methods for classification, especially for adenocarcinomas, are greatly anticipated, not only for a better understanding of the pathogenesis, but also for improving the currently inadequate diagnostic capabilities in order to develop more effective treatment strategies. Presently, there is no accurate means of predicting individual patient prognosis, though it is indispensable for maximizing the effects of therapy without over-treating those without residual disease. Microarray technology has made it possible to analyze gene expression profiles on a genome-wide basis in order to better understand the molecular pathogenesis of human cancers as well as to search for molecular markers for classification and prediction of outcome. The expression profile in a given tumor can be regarded as the outcome of complex influences resulting from accumulated genetic and epigenetic changes important for pathogenesis, as well as from the differentiation-commitment status of the progenitor cells. Our recent comprehensive study of lung cancer transcriptome signatures allowed us to establish an expression profile-defined, highly robust classification of adenocarcinomas, divided into terminal respiratory unit (TRU) and non-TRU types, which also made it possible to reconcile the results of previous expression profiling studies. Expression profiling analysis aimed at individualized patient outcome prediction is another important and much needed application, since survival or death is a matter of all or nothing, and currently available information regarding what percentage of those at a certain disease stage are likely to survive after a certain period of time is insufficient in many respects. Evidence for the existence of an expression profile-defined group of patients with very poor prognosis, based on both transcriptome and proteome signatures, will also be reported from our recent findings. Detailed expression profiling analyses of transcriptomes and proteomes thus appear to have the potential to provide increased understanding of the molecular biology of NSCLC, and their clinical applications in the very near future are anticipated. In addition, such elucidation may ultimately lead to improved outcomes for patients afflicted by this devastating disease.


Invited Lecture: Dr. Tsung‐Lin Li

GLYCOPEPTIDE ANTIBIOTIC BIOSYNTHESIS: THE SYNTHESIS OF N-ACYL AMINOGLUCURONIC ACID IN ANTIBIOTIC A40926

Tsung-Lin Li 1

1 Genomics Center, Academia Sinica, Taiwan

The unique pharmacokinetic and pharmacodynamic activities of glycopeptide antibiotics are conferred by tailoring steps occurring on the aglycone. An N-acyl aminoglucuronic acid moiety in glycopeptide A40926 has been shown to be critical for its biological effectiveness and is thus an ideal position for further chemoenzymatic modifications. The biosynthesis of this moiety was elusive. Now, gene products Dbv 9, 21, 8, and 29 in the glycopeptide A40926 gene cluster have been characterized as glycosyltransferase, deacetylase, acyltransferase and hexose oxidase, respectively. They act in sequence to complete the synthesis of the N-acyl aminoglucuronic acid substituent for the potent drug lead. The characterized enzymes may provide new ways to further enhance the efficacy of currently used glycopeptide drugs. In addition, detailed function-mechanism analyses of the enzymes increase our knowledge of new classes of enzymes.


Session D Oral Presentation: Eric Perrier Oral Presentation: Kaname Kojima Oral Presentation: Wei‐Cheng Lo Oral Presentation: Shih‐Yen Ku


Oral Presentation: Eric Perrier

PERM: AN EFFICIENT TOPOLOGICAL ORDERINGS-BASED ALGORITHM FOR BAYESIAN NETWORK STRUCTURE LEARNING

Eric Perrier Seiya Imoto Satoru Miyano

1 Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan

Inferring the Directed Acyclic Graph (DAG) of a Bayesian network from data is one of the most promising ongoing research directions for understanding the causal relations underlying a set of random variables, such as genes. In fact, the recent profusion of microarray data makes it theoretically possible to infer accurate gene networks, even for thousands of genes. Unfortunately, the two approaches currently in use, scoring functions and independency tests (IT), have disadvantages in terms of complexity and the accuracy of their results. Therefore, following a recent empirical study that showed the superiority of a hybrid method, we propose a new search direction through a scoring-based heuristic algorithm, PERM, that improves both the speed of the search and the accuracy of the resulting graph. By taking advantage of a new structural constraint, the super-structure [3], this algorithm can precisely approximate, in linear time, the best score of the graphs that follow a given topological ordering. It can thereby efficiently apply a greedy hill-climbing search over the space of orderings. To further improve the resulting graph, we extend this algorithm with a classical post-processing hill-climbing search over DAGs. This final method, PERM+, is shown experimentally to outperform other algorithms over a wide range of networks while running faster. Thus, it is a promising tool for understanding large gene networks.

Keywords: Bayesian networks; Structure learning; Super-structure; Topological ordering
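The idea of searching over topological orderings under a super-structure constraint can be illustrated with a small Python sketch. This is not the PERM algorithm itself: the function names, the toy scoring function and the exhaustive enumeration of parent sets (which is not linear time) are all assumptions made for illustration. Each node may only choose parents that are both its neighbours in the super-structure and its predecessors in the ordering, and a greedy hill climb swaps adjacent nodes in the ordering while the total score improves.

```python
from itertools import combinations

def best_parents(node, allowed, local_score, max_parents=2):
    """Best-scoring parent set for `node`, chosen among subsets of `allowed`."""
    best_set = frozenset()
    best_val = local_score(node, best_set)
    for k in range(1, max_parents + 1):
        for subset in combinations(sorted(allowed), k):
            val = local_score(node, frozenset(subset))
            if val > best_val:
                best_set, best_val = frozenset(subset), val
    return best_set, best_val

def score_ordering(order, candidates, local_score, max_parents=2):
    """Score of the best graph whose edges respect `order` and the super-structure."""
    parents, total, seen = {}, 0.0, set()
    for node in order:
        allowed = candidates[node] & seen  # super-structure neighbours that precede node
        parents[node], val = best_parents(node, allowed, local_score, max_parents)
        total += val
        seen.add(node)
    return total, parents

def hill_climb_orderings(order, candidates, local_score, max_parents=2):
    """Greedy hill climbing over orderings using adjacent swaps."""
    order = list(order)
    best_total, best_graph = score_ordering(order, candidates, local_score, max_parents)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            trial = order[:]
            trial[i], trial[i + 1] = trial[i + 1], trial[i]
            total, graph = score_ordering(trial, candidates, local_score, max_parents)
            if total > best_total:
                order, best_total, best_graph = trial, total, graph
                improved = True
                break
    return order, best_total, best_graph

# Toy data (hypothetical): the "true" network is A -> B -> C.
TRUE_PARENTS = {"A": frozenset(), "B": frozenset({"A"}), "C": frozenset({"B"})}
# Super-structure: an undirected skeleton listing the allowed neighbours of each node.
CANDIDATES = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}

def toy_score(node, parent_set):
    """Stand-in for a real score such as BIC: penalise wrong or missing parents."""
    return -len(parent_set ^ TRUE_PARENTS[node])

final_order, final_score, final_graph = hill_climb_orderings(
    ["B", "A", "C"], CANDIDATES, toy_score
)
```

With adjacent swaps only, the climb can stall on a plateau for other starting orderings, which is one reason a post-processing search over DAGs, as described for PERM+, can still improve the result.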






Oral Presentation: Kaname Kojima

FAST SCB-GRID LAYOUT ALGORITHM USING SWEEP CALCULATION

Kaname Kojima Masao Nagasaki Satoru Miyano

1 Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan

Properly drawn biological networks are of great help in comprehending their characteristics. Since it is unrealistic to draw biological networks manually for every retrieval, automatic drawing algorithms are essential. Grid layout algorithms handle various biological properties, such as complicated positional constraints that follow from subcellular localization; thus, they succeed in providing biologically comprehensible layouts. Since existing grid layout algorithms are not suitable for real-time drawing due to their high computational cost, we present a new grid layout algorithm that uses sweep calculation to achieve a lower time complexity. We conduct practical experiments using 95 pathway models of various sizes from TRANSPATH and demonstrate its efficiency.

Keywords: grid embedding, graph drawing, visualization





Oral Presentation: Wei‐Cheng Lo

CPSARST – CIRCULAR PERMUTATION SEARCH AIDED BY RAMACHANDRAN SEQUENTIAL TRANSFORMATION

Wei-Cheng Lo2 Ping-Chiang Lyu1,2


1 Department of Life Sciences, National Tsing Hua University, Hsinchu 30013, Taiwan

2 Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan

Circular permutation (CP) of a protein can be visualized as if the original amino- and carboxyl-termini were linked and new ones created elsewhere. Circular permutants usually retain native structures and biological functions. This interesting property has made CP useful in many protein engineering fields. Although many circular permutants have been found in well-known protein families, efficient search tools are still very rare because of the complicated rearrangements involved in CP. Here we report CPSARST (Circular Permutation Search Aided by Ramachandran Sequential Transformation), a novel and efficient circular permutant search method. Its main features are (1) describing three-dimensional structures as one-dimensional text strings, (2) duplicating the query structure and (3) working through a “double filter-and-refine” strategy. When tested with engineered circular permutants, CPSARST successfully retrieved all the natural proteins with accurate permutation site predictions. Its ability to identify natural circular permutations is also comparable to that of other structure-based CP-detecting methods. CPSARST is thousands of times faster than related algorithms, and this high efficiency makes routine database searches achievable. In this post-genomics era, when the amount of protein structural data increases exponentially, CPSARST provides a new way to rapidly detect novel relationships among proteins. CPSARST is available at http://sarst.life.nthu.edu.tw/cpsarst.
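The query-duplication step in feature (2) rests on a general string property: if B is a circular permutation of A, then B occurs as a substring of A concatenated with itself. The Python sketch below illustrates only this property. It assumes the structures have already been encoded as one-dimensional strings (the letters used here are hypothetical) and uses exact matching, whereas a practical search would need alignment to tolerate mismatches; it is not the CPSARST implementation.

```python
def find_circular_permutation(query: str, target: str):
    """Return the position in `query` where `target` starts when `query`
    is read circularly, or None if `target` is not a circular permutation.

    A result of 0 means the two strings are identical (same termini).
    """
    if len(query) != len(target) or not query:
        return None
    doubled = query + query  # duplicating the query exposes every rotation
    pos = doubled.find(target)
    return pos if 0 <= pos < len(query) else None
```

For example, `find_circular_permutation("ABCDEFG", "DEFGABC")` reports position 3, meaning the new N-terminus of the target corresponds to the fourth letter of the query.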


Oral Presentation: Shih‐Yen Ku

A MULTI-STRATEGY APPROACH TO PROTEIN STRUCTURAL ALPHABET DESIGN

Shih-Yen Ku 1,2 Yuh-Jyh Hu 2,3

1 Institute of Statistics, Academia Sinica, Taipei, Taiwan

2 College of Computer Science, National Chiao Tung University, Hsinchu, Taiwan

3 Institute of Biomedical Engineering, National Chiao Tung University, 1001 University Rd. Hsinchu, Taiwan

The search for structural similarity among proteins can provide valuable insights into their functional mechanisms and their functional relationships. Though the 1D sequence of a protein contains the information needed for its folding, the performance of predicting the 3D structure directly from the sequence is still limited. With the increasing number of available protein structures, we can now conduct more precise and thorough studies of protein structures.

Among these is the design of a protein structural alphabet that can characterize protein local structures. We use the self-organizing map combined with the minimum spanning tree algorithm for visualization to determine the alphabet size, and then apply the k-means algorithm to group protein fragments into clusters corresponding to the structural alphabet. The intra-cluster and inter-cluster analyses show significant structural cohesiveness. A comparative study of our alphabet with one of the recently developed structural alphabets also demonstrated a competitive result.


Session E
Plenary Lecture: Dr. Limsoon Wong
Invited Lecture: Dr. Jenn‐Kang Hwang
Invited Lecture: Dr. Shinya Kuroda


Plenary Lecture: Dr. Limsoon Wong

GUILT BY ASSOCIATION

Limsoon Wong1

http://www.comp.nus.edu.sg/~wongls

1 National University of Singapore

A central problem of computational biology is the inference of the function of a protein. The traditional computational approach to this problem is based on the principle of “guilt by association” of sequence similarity. This approach works for about 40-60% of the proteins in a typical proteome. In this talk, we discuss the inference of function for the other proteins, which lack informative sequence similarity to proteins with known function. We present guilt by association of common friends: two proteins sharing a large number of common interaction partners are likely to share a common function. In contradiction to an old popular belief, we show that proteins that are strictly indirectly linked in this way by a large number of common interaction partners have a significantly greater likelihood of sharing function than proteins that are strictly directly interacting. Furthermore, we develop a means to exploit this property to effectively assign functions to proteins in the absence of sequence similarity. In order to fully exploit additional information that is available on some proteins, we also develop an efficient and powerful information fusion technique to infer protein functions through guilt by association of multiple information types. On several large gold standard benchmarks, we obtain superior sensitivity, precision, and efficiency compared to the current state of the art.
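The "common friends" idea can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the method presented in the lecture: the function names, the unweighted vote and the example network are all hypothetical. Each annotated protein votes for its functions with a weight equal to the number of interaction partners it shares with the query protein.

```python
from collections import Counter, defaultdict

def build_network(edges):
    """Undirected interaction network as a dict: protein -> set of partners."""
    partners = defaultdict(set)
    for a, b in edges:
        partners[a].add(b)
        partners[b].add(a)
    return partners

def common_partners(partners, p, q):
    """Interaction partners shared by p and q (excluding p and q themselves)."""
    return (partners[p] & partners[q]) - {p, q}

def predict_function(partners, annotations, protein):
    """Pick the function with the largest total of shared-partner votes."""
    votes = Counter()
    for other, functions in annotations.items():
        if other == protein:
            continue
        weight = len(common_partners(partners, protein, other))
        for function in functions:
            votes[function] += weight
    votes = +votes  # drop functions that received no votes
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical toy network: X and Y never interact directly but share three partners.
network = build_network([
    ("X", "a"), ("X", "b"), ("X", "c"),
    ("Y", "a"), ("Y", "b"), ("Y", "c"),
    ("Z", "a"), ("Z", "d"),
])
known = {"Y": {"kinase"}, "Z": {"transport"}}
prediction = predict_function(network, known, "X")
```

Here X shares three partners with Y and one with Z, so the function of Y wins the vote even though X and Y are only indirectly linked.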


Invited Lecture: Dr. Jenn‐Kang Hwang

ON THE STRUCTURAL CHARACTERISTICS OF PROTEIN ACTIVE SITES

Chih-Hao Lu1 Shao-Wei Huang1

Yan-Long Lai1 Chih-Peng Lin1

Chien-Hua Shih1 Cuen-Chao Huang1

Jenn-Kang Hwang*1

1 Institute of Bioinformatics, National Chiao Tung University, HsinChu 30050, Taiwan, Republic of China

Recently, we have developed a method (Shih et al., Proteins: Structure, Function, and Bioinformatics, 2007) to compute the correlation of fluctuations. This method, referred to as the protein fixed-point (PFP) model, is based on the positional vectors of atoms issuing from the fixed point, which is the point of least fluctuation in the protein. One corollary of this model is that atoms lying on the same shell centered at the fixed point will have the same thermal fluctuations. In practice, this model provides a convenient way to compute the average dynamical properties of proteins directly from their geometrical shapes without the need for any mechanical model, and hence no trajectory integration or sophisticated matrix operations are needed. As a result, it is more efficient than molecular dynamics simulation or normal mode analysis. Though in the previous study the PFP model was successfully applied to a number of proteins of various folds, it is not clear to what extent this model can be applied. In this article, we have carried out a comprehensive analysis of the PFP model for a dataset comprising 972 high-resolution X-ray structures with pairwise sequence identity ≤ 0.25%. We found that in most cases the PFP model works well. However, in the case of proteins comprising multiple domains, each domain should be treated separately as an independent dynamical module with its own fixed point; and in the case of a protein complex comprising a number of subunits, if it functions as a biological unit, the whole complex should be considered as one single dynamical module with one fixed point. Under such considerations, the resultant correlation coefficient between the computed and the X-ray structural B-factors for the data set is 0.59, and 75% (727/972) of the proteins have a correlation coefficient ≥ 0.5. Our result shows that the fixed-point model is indeed quite general and will be a useful tool for high-throughput analysis of the dynamical properties of proteins.

Keywords: protein dynamics; thermal fluctuations; molecular dynamics; normal mode analysis; B-factors


Invited Lecture: Dr. Shinya Kuroda

SYSTEMS BIOLOGY OF ERK SIGNALING NETWORK

Shinya Kuroda1

1 Graduate School of Biophysics and Biochemistry, University of Tokyo

One of the unique properties of signaling networks is that they encode various information into distinct temporal patterns of the same signaling activities. In PC12 cells, constant (step) stimulation with epidermal growth factor (EGF) and nerve growth factor (NGF) is encoded into transient, and into transient and sustained, extracellular signal-regulated kinase (ERK) activation, leading to cell proliferation and differentiation, respectively. We developed a computational biochemical model of the ERK signaling network, which can reproduce the dose- and time-dependent activation of signaling molecules. We found that the small GTPases Ras and Rap1 encode the rapid and slow temporal patterns of constant NGF stimulation into transient and sustained ERK activation, respectively [1]. In addition, impulse NGF stimulation induced only Ras activation and subsequent transient ERK activation, whereas ramp NGF stimulation induced only Rap1 activation and subsequent sustained ERK activation (Fig). This means that the same ERK signaling network can encode distinct information about the growth factors through distinct temporal patterns. In this workshop, we will discuss such temporal coding of signaling activities as one of the general information processing mechanisms in signaling networks.


Session F
Invited Lecture: Dr. Kiyoshi Asai
Invited Lecture: Dr. Ping‐Chiang Lyu
Oral Presentation: Yuka Watanabe
Oral Presentation: Shao‐Wei Huang
Oral Presentation: Allan Lo


Invited Lecture: Dr. Kiyoshi Asai

BIOINFORMATICS OF NON-CODING RNAS

Kiyoshi Asai1

1 Department of Computational Biology, Graduate School of Frontier Science, University of Tokyo

It has long been believed that proteins are responsible for most of the essential functions in living cells. Recent studies on transcripts, however, have revealed that there are a considerable number of non-coding RNAs, which are RNA transcripts that do not code for proteins. Among them, the discovery of miRNAs, which repress the translation of mRNAs into proteins, opened the door to the possibility that non-coding RNAs may play an important role in the regulation of functional biomolecules. In biological sequence analysis, the most important procedure is the comparison of sequences, which is necessary to find similar sequences, to categorize groups of sequences, to find evolutionarily conserved regions and so on. Because the evolution of biological sequences is restricted by their molecular functions, the comparison is not a simple problem of string matching. For example, the functions of proteins depend not only on their primary amino-acid sequences but also on their tertiary structures. For the analysis of non-coding RNAs by bioinformatics, it is also important to consider their tertiary structures. However, only a small number of RNAs have solved tertiary structures. As a first step in analyzing RNA sequences based on their structures, it is common to analyze their secondary structures, which comprise complementary base pairs between distant bases in the primary sequences. There are a number of software tools that predict the secondary structures of RNAs, by energy minimization or by statistical inference, but the predictions are not always accurate. Therefore, it is not practical to use computational prediction of secondary structure as a preprocessing step for the comparison of RNAs. The computational costs of comparing RNA sequences, however, are very high when the secondary structures are considered. For simple pairwise comparison based on secondary structural alignment of two RNA sequences of length L, O(L^4) memory and O(L^6) time are required by the strict algorithm of Sankoff. These high computational demands have prohibited genome-scale analyses of potential functional non-coding RNAs based on their secondary structures. Recent progress in algorithms for RNA sequences, however, has partially solved these difficulties. We have developed methods for sequence comparison, multiple alignment, common secondary structure prediction and motif extraction of RNA sequences with practical computational costs and accuracies.
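To give a feel for the dynamic programming behind secondary structure prediction, the Python sketch below implements the classic Nussinov-style base-pair maximization for a single sequence, which needs O(L^2) memory and O(L^3) time. It is a deliberately simplified stand-in, assumed here for illustration only: it counts base pairs rather than minimizing energy, it is not one of the tools referred to in the abstract, and it is far simpler than the Sankoff algorithm, which folds and aligns two sequences simultaneously.

```python
def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs in `seq`.

    `min_loop` is the minimum number of unpaired bases enclosed by a pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

The three nested loops over i, j and k are the source of the cubic running time; handling two sequences at once roughly squares both the table size and the work, which is where the O(L^4) memory and O(L^6) time quoted above come from.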


Invited Lecture: Dr. Ping‐Chiang Lyu

CONSTRUCTION OF WHOLE GENOMIC AND PROTEOMIC TREE BASED ON DNA AND PROTEIN PROBES

Ping-Chiang Lyu1,2 Chi-Ching Li 2

Wei-Cheng Lo 2 Szu-Ming Lai 1,2

1 Department of Life Sciences, National Tsing Hua University, Hsinchu 30013, Taiwan

2 Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu 30013, Taiwan

The classification of microorganisms is difficult because of their diverse morphology and environmental distribution. Since 1970, taxonomy systems have been developed based on stable and standard molecular biomarkers; for instance, sequence similarity of SSU RNA (small subunit ribosomal RNA) was the first such biomarker and is still widely used for prokaryotes. However, one or only a few biomarkers are considered insufficient to classify all kinds of organisms. Since the year 2000, the development of genome sequencing techniques has been so rapid that it is now possible to analyze the evolutionary relationships of organisms on the scale of whole genomes.

We have developed a whole-genome clustering approach based on the frequency of biologically meaningful probes. We compared bacteria, archaea and fungi to build genomic and proteomic trees by an unsupervised clustering method. Our results showed that the genomic tree grouped together microorganisms with similar GC contents, and the proteomic tree clustered bacteria, archaea and fungi into two branches, where the latter two share the same node. Furthermore, the proteomic tree agreed well with the traditional phylogeny at the basal branches, while the distal classifications seemed to reflect phenotypic features, such as parasitism, thermophilicity, methanogenesis and photosynthesis, better than traditional SSU RNA-based classifications.


Oral Presentation: Yuka Watanabe

COMPUTATIONAL AND EXPERIMENTAL ANALYSIS OF NEGATIVE FEEDBACK REGULATION WITHIN MICRORNA PROCESSING PATHWAY

Yuka Watanabe 1,2 Nozomu Yachie 1,2


Masaru Tomita 1,2,3 Akio Kanai1,2,3


1 Institute for Advanced Biosciences, Keio University, 997-0017, Japan

2 Systems Biology Program, Graduate School of Media and Governance, Keio University, 252-8520, Japan

3 Faculty of Environment and Information Studies, Keio University, 252-8520, Japan

MicroRNAs (miRNAs) are short non-coding RNAs that are processed from stem-loop precursor transcripts into sequences approximately 22 nucleotides long and hybridize incompletely to specific target sites in mRNAs to down-regulate them. These RNAs have been reported to be processed by various miRNA processing proteins before they regulate their target mRNAs. We first conducted a computational screening of Drosophila miRNA targets against ten mRNAs coding for proteins known to be involved in the processing of miRNAs, and revealed eight candidates that are likely to be regulated by specific miRNAs. We performed luciferase reporter gene assays in order to validate the predicted miRNA-guided regulation of target genes, and identified miRNA-dependent translational inhibition of the Argonaute1, Argonaute2 and Pasha reporter genes. Moreover, transcription factors of these genes are also likely to be regulated by common miRNAs. From these results, we suggest the existence of negative feedback control within the miRNA processing pathway.

Keywords: microRNA, microRNA target prediction, gene regulation, Drosophila melanogaster





Oral Presentation: Shao‐Wei Huang

ON THE RELATIONSHIP BETWEEN THE PROTEIN STRUCTURE AND PROTEIN DYNAMICS

Chih-Hao Lu1 Shao-Wei Huang1

Yan-Long Lai1 Chih-Peng Lin1

Chien-Hua Shih1 Cuen-Chao Huang1

Jenn-Kang Hwang*1

1 Institute of Bioinformatics, National Chiao Tung University, HsinChu 30050, Taiwan, Republic of China

Recently, we have developed a method (Shih et al., Proteins: Structure, Function, and Bioinformatics, 2007) to compute the correlation of fluctuations. This method, referred to as the protein fixed-point (PFP) model, is based on the positional vectors of atoms issuing from the fixed point, which is the point of least fluctuation in the protein. One corollary of this model is that atoms lying on the same shell centered at the fixed point will have the same thermal fluctuations. In practice, this model provides a convenient way to compute the average dynamical properties of proteins directly from their geometrical shapes without the need for any mechanical model, and hence no trajectory integration or sophisticated matrix operations are needed. As a result, it is more efficient than molecular dynamics simulation or normal mode analysis. Though in the previous study the PFP model was successfully applied to a number of proteins of various folds, it is not clear to what extent this model can be applied. In this article, we have carried out a comprehensive analysis of the PFP model for a dataset comprising 972 high-resolution X-ray structures with pairwise sequence identity ≤ 0.25%. We found that in most cases the PFP model works well. However, in the case of proteins comprising multiple domains, each domain should be treated separately as an independent dynamical module with its own fixed point; and in the case of a protein complex comprising a number of subunits, if it functions as a biological unit, the whole complex should be considered as one single dynamical module with one fixed point. Under such considerations, the resultant correlation coefficient between the computed and the X-ray structural B-factors for the data set is 0.59, and 75% (727/972) of the proteins have a correlation coefficient ≥ 0.5. Our result shows that the fixed-point model is indeed quite general and will be a useful tool for high-throughput analysis of the dynamical properties of proteins.

Keywords: protein dynamics; thermal fluctuations; molecular dynamics; normal mode analysis; B-factors


Oral Presentation: Allan Lo

THE INS AND OUTS OF MEMBRANE PROTEIN TOPOLOGY PREDICTION

Allan Lo 1,2 Hua-Sheng Chiu 3

Ting-Yi Sung 3 Ping-Chiang Lyu 2

Wen-Lian Hsu 1,3

1 Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan

2 Department of Life Sciences, National Tsing Hua University, Hsinchu, Taiwan

3 Bioinformatics Lab., Institute of Information Science, Academia Sinica, Taipei, Taiwan

Integral membrane proteins (IMPs) play a crucial role in many cellular processes and perform a variety of life-critical functions. Knowledge of their high-resolution structures can lead to a better understanding of how these proteins function; however, the progress is hampered by the experimental difficulties. Therefore, computational methods are important for elucidating the structural genomics of IMPs. The prediction of transmembrane (TM) helix and topology provides important information about the structure and function of a membrane protein. We have developed SVMtop, a hierarchical classification method using support vector machines (SVMs). The prediction framework integrates selected biological features that capture the sequence-to-structure relationship and a new topology scoring function based on membrane protein folding. Our method is evaluated on low- and high-resolution data sets with cross-validation, and the topology (sidedness) prediction accuracy reaches as high as 90%. In addition, very low overall false positive (0.5%) and false negative rates (0~1.2%) are achieved for the discrimination between soluble and membrane proteins. Compared with the ‘positive-inside rule’, which has topology prediction accuracy of 73~76%, SVMtop improves the prediction by as much as 16%. Lastly, the analysis of the topology scoring function suggests that the topogeneses of single- and multi-spanning TM proteins have different levels of complexity and the consideration of inter-loop topogenic interactions for the latter is the key to achieving better predictions. This method can facilitate the annotation of membrane proteomes to extract useful structural and functional information. A web server has been constructed and it is publicly available at http://bio-cluster.iis.sinica.edu.tw/~bioapp/SVMtop.
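The ‘positive-inside rule’ used as the baseline above can be sketched in a few lines of Python. The sketch is not SVMtop: it assumes the TM segments are already known, counts only lysine (K) and arginine (R) over whole loops, and uses hypothetical function and variable names. Loops alternate between the two sides of the membrane, and the side whose loops carry more positive residues is predicted to be the inside (cytoplasmic) side.

```python
def positive_inside_topology(sequence, tm_segments):
    """Predict where the N-terminus lies: "in" (cytoplasmic) or "out".

    `tm_segments` is a sorted list of (start, end) index pairs, with
    `end` exclusive, marking the transmembrane helices in `sequence`.
    """
    # Cut the sequence into the loops that lie between the TM helices.
    loops, previous_end = [], 0
    for start, end in tm_segments:
        loops.append(sequence[previous_end:start])
        previous_end = end
    loops.append(sequence[previous_end:])

    def positive_count(loop):
        return loop.count("K") + loop.count("R")

    # Loops 0, 2, 4, ... are on the N-terminal side; loops 1, 3, ... on the other.
    n_side = sum(positive_count(loop) for loop in loops[0::2])
    other_side = sum(positive_count(loop) for loop in loops[1::2])
    # Ties are resolved as "in"; this is an arbitrary choice for the sketch.
    return "in" if n_side >= other_side else "out"

# Hypothetical two-helix protein with positively charged terminal loops.
two_helix = "MKRK" + "L" * 20 + "DE" + "L" * 20 + "RKK"
# Hypothetical single-helix protein with a positively charged C-terminal loop.
one_helix = "MDE" + "L" * 20 + "KRRK"
```

For `two_helix` with segments `[(4, 24), (26, 46)]` both terminal loops are positive, so the N-terminus is predicted inside; for `one_helix` with segment `[(3, 23)]` the positive residues follow the helix, so the N-terminus is predicted outside.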


Poster Session: Jian‐Hong Ou, Minoru Honda, Hiroshi Ito, Tatsuaki Nakahara, Nelson Hayes, Mikihiko Kawai, Chien‐Hua Shih, Yan‐Long Lai, Hsiu‐Yu Wang, Wei‐Yao Chou, Shih‐Chi Peng, Chia‐Han Chu, Boris Stitnicky, Hua‐Sheng Chiu, Jia‐Ming Chang, Meng‐Ru Ho, Ya‐Chi Lin, Sung‐Chou Li, Kazuhiro Fujita


Poster: Jian‐Hong Ou

ANALYSIS OF DYNAMICS OF GENE EXPRESSION BY VISUALIZATION METHOD WITH GFP IN LYSINE BIOSYNTHESIS

Jianhong Ou1 Tadashi Yamada2

Keisuke Nagahisa2 Takashi Hirasawa2

Chikara Furusawa2,3 Tetsuya Yomo2,3,4

Hiroshi Shimizu*2

1 Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka 565-0871, Japan

2 Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, 2-1 Yamadaoka, Suita, Osaka 565-0871, Japan

3 Complex Systems Biology Project, ERATO, JST

4 Graduate School of Frontier Biosciences, Osaka University, 2-1 Yamadaoka, Suita, Osaka 565-0871, Japan

We investigated the expression dynamics of genes involved in lysine biosynthesis in Escherichia coli cells to obtain a quantitative understanding of the gene regulatory system. By constructing reporter strains expressing the green fluorescent protein (gfp) gene under the control of the promoter regions of genes associated with lysine biosynthesis, we monitored time-dependent changes in gene expression in response to changes in the lysine concentration of the medium by flow cytometry. Five promoters involved in lysine biosynthesis responded to the changes in lysine concentration. For these five promoters, the time-dependent gene expression data were fitted to a simple dynamical model of gene expression to estimate the parameters of the gene regulatory system. According to the fitted parameters, dapD shows a significantly larger coefficient of repression than the other genes in the lysine synthesis pathway, which indicates weak binding of the repressor to the dapD promoter region. Moreover, there is a trend that the closer an enzyme is to the start of the lysine biosynthesis pathway, the smaller its maximal promoter activity is. The results provide a better quantitative understanding of the expression dynamics in the lysine biosynthesis pathway.

Keywords: Lysine Biosynthesis; Promoter Activation; System Dynamics; Escherichia coli





Poster: Minoru Honda

ENCODING OF VISUAL INFORMATION INTO NEURONAL NETWORKS BY SPIKE-TIMING DEPENDENT SYNAPTIC PLASTICITY

Minoru Honda1 Hidetoshi Urakubo2 Shinya Kuroda2

1 Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan

2 CREST, Japan Science and Technology Agency, Department of Biophysics and Biochemistry, Graduate School of Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0033, Japan

Spike-timing dependent synaptic plasticity (STDP) plays important roles in the development of neuronal networks. We have previously developed a detailed kinetic simulation model, which is based on molecular mechanisms and reproduces STDP. In this study, we extracted the essence of STDP by reducing the detailed model to a simple STDP model. Using this simple model, we studied the role of STDP in the activity-dependent restructuring of neuronal networks in retinotectal systems, and analyzed the learning process of direction selectivity. We thereby demonstrate a systems biological analysis of learning and memory from the molecular level to the level of neuronal networks.

Keywords: systems biology, synaptic plasticity, STDP, direction selectivity, retinotectal system
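For readers unfamiliar with STDP, the Python sketch below shows the widely used pair-based exponential learning window. It is a generic phenomenological rule assumed here for illustration, not the reduced model described in the abstract, and all parameter names and values are hypothetical. A presynaptic spike that precedes a postsynaptic spike strengthens the synapse, the reverse order weakens it, and the effect decays with the time difference.

```python
import math

def stdp_weight_change(dt_ms, a_plus=0.01, a_minus=0.012,
                       tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre."""
    if dt_ms > 0:  # pre before post: potentiation
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:  # post before pre: depression
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0

def apply_stdp(weight, pre_spikes, post_spikes, w_min=0.0, w_max=1.0, **params):
    """Sum the contributions of all spike pairs and clip the weight to [w_min, w_max]."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            weight += stdp_weight_change(t_post - t_pre, **params)
    return min(w_max, max(w_min, weight))
```

In a direction-selectivity setting, a stimulus sweeping in one direction makes some inputs fire consistently before the target neuron and others after it, so a rule of this kind strengthens the former and weakens the latter.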





Poster: Hiroshi Ito

ORIGIN OF THE RESILIENCE OF CYANOBACTERIAL CIRCADIAN CLOCK

Hiroshi Ito1 Hakuto Kageyama2


Michinori Mutsuda1

1 Division of Biological Science, Graduate School of Science, Nagoya University, Furo-cho 1, Chikusa-ku, Nagoya 464-8602, Japan

2 Department of Biophysics and Biochemistry, Graduate School of Science, The University of Tokyo, Tokyo 113-0033, Japan

3 Laboratory for Systems Biology, Center for Developmental Biology, RIKEN, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan

The cyanobacterial circadian oscillator can be reconstituted in vitro by mixing three purified clock proteins, KaiA, KaiB and KaiC, with ATP. The KaiC phosphorylation rhythm persists for at least 10 days without damping. By mixing oscillatory samples that have different phases and analyzing the dynamics of their phase relationships, we found that the robustness of the KaiC phosphorylation rhythm arises from the rapid synchronization of the phosphorylation state and reaction direction (phosphorylation or dephosphorylation) of KaiC proteins. We further demonstrate that synchronization is tightly linked with KaiC dephosphorylation and is mediated by monomer exchange between KaiC hexamers.

Keywords: Circadian rhythm, Cyanobacteria, Synchronization of KaiC phosphorylation rhythm




Poster: Tatsuaki Nakahara

ACTSCAN: AN IMAGE ANALYSIS TOOL FOR AUTOMATIC CLASSIFICATION OF MOUSE BODY FAT

Tatsuaki Nakahara1 Akihiro Nakaya1


1 Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwashi, Chiba 277-8561, Japan

ActScan is an implementation of a newly developed method that extracts information on the spatial distribution of two types of body fat in a mouse body. By integrating a series of images obtained by scanning a mouse body with a noninvasive imaging technique such as computed tomography (CT), the method reconstructs the morphological shapes of visceral fat and subcutaneous fat in a mouse. In each CT image, the pixels that correspond to these two types of body fat are divided by a boundary that is defined by the peritoneum. Our method classifies those pixels into two categories according to whether they are inside the boundary or not. To carry out the classification, the method extracts the boundary in the CT images using techniques called active contour and active tube. Using a real dataset of model animals, we assessed the efficiency of the method. The classification results of the method coincided with those of manual processing for 90% or more of the pixels, showing that our method is comparable to “experts”. ActScan can extract two types of body fat as quantitative traits in a systematic manner and provide clues for finding the genetic factors that control these traits.

Keywords: Image Processing; CT; Mouse; Visceral Fat; Subcutaneous Fat.
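The inside/outside classification step can be illustrated with a standard point-in-polygon test. The Python sketch below is an assumption-laden illustration, not ActScan code: it takes the peritoneal boundary as an already extracted polygon (the active contour and active tube steps are not shown), and the function names and coordinates are hypothetical. Fat pixels inside the boundary are labelled visceral and the rest subcutaneous.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the closed polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside

def classify_fat_pixels(fat_pixels, boundary):
    """Split fat pixels into (visceral, subcutaneous) using the boundary polygon."""
    visceral, subcutaneous = [], []
    for x, y in fat_pixels:
        if point_in_polygon(x, y, boundary):
            visceral.append((x, y))
        else:
            subcutaneous.append((x, y))
    return visceral, subcutaneous

# Hypothetical boundary (a square) and fat pixels in one CT slice.
boundary = [(2, 2), (8, 2), (8, 8), (2, 8)]
pixels = [(5, 5), (1, 5), (9, 9), (3, 7)]
visceral, subcutaneous = classify_fat_pixels(pixels, boundary)
```

Repeating this per slice and stacking the results would give the three-dimensional shapes of the two fat depots.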





Poster: Nelson Hayes

TOOLS AND RESOURCES IN THE VARDB ANTIGENIC VARIATION SEQUENCE DATABASE

Nelson Hayes1 Diego Diez1


Minoru Kanehisa1 Susumu Goto1

1 Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan

Emerging infectious diseases and drug-resistant strains of current pathogens are a threat to public health, but few comprehensive resources exist to study the underlying problem of antigenic variation. Antigenic variation is the mechanism by which pathogens alter their antigenic signature in order to evade recognition and clearance by the immune system. The varDB database (http://www.vardb.org/) is a curated collection of protein and nucleotide sequences involved in antigenic variation in pathogens. The database provides a compilation of annotated genome sequence data on antigenic variation strategies as well as integrated tools to facilitate comparative analysis within and among taxa. The aim of varDB is to serve as a central site for antigenic protein families from multiple pathogens, enabling researchers to compare pathogenic mechanisms across taxa with the goal of identifying common mechanisms of pathogenicity to assist in the fight against a range of important diseases.

Keywords: antigenic variation, genomics, infectious diseases, PfEMP1





Poster: Mikihiko Kawai

CHARACTERIZATION OF LINKAGE BETWEEN RESTRICTION MODIFICATION GENES WITH GENOME POLYMORPHISM

Mikihiko Kawai 1 Iichizo Kobayashi 1,2


1 Graduate School of Frontier Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan

2 Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan

Restriction–modification genes encode two enzymatic activities: a restriction enzyme that recognizes a specific DNA sequence and introduces a double-strand break, and a cognate modification enzyme that can methylate the same sequence and thereby render it resistant to the restriction enzyme. Their genes are often tightly linked and form a restriction–modification gene complex. Because of their importance in biotechnology and molecular biology, restriction endonuclease genes have been widely characterized experimentally since the first such gene was discovered. Sequence diversity among restriction endonucleases is large, and the number of genes that can be detected by a simple homology search such as PSI-BLAST is rather limited. So far, restriction enzymes have been found to belong to five evolutionarily unrelated superfamilies with different folds. However, for the majority of restriction enzyme sequences, the structure has been neither determined nor predicted, and it remains to be determined whether the so far ‘unassigned’ sequences belong to one of those five superfamilies or to some other fold, known or unknown. Genome comparison is a powerful approach for the identification of new restriction enzymes and accompanying modification enzymes, because restriction–modification gene complexes are frequently found at polymorphic sites when a genome sequence is compared with a similar sequence. Here, we surveyed how many homologue groups of restriction endonuclease genes show such polymorphism. We used known restriction–modification genes, both experimentally characterized and found by homology search, as a positive data set. The nucleotide sequences of the regions that encode these restriction–modification genes were compared with homologous sequences and surveyed for polymorphisms. The results supported the frequent linkage of restriction–modification genes with genome polymorphisms. Based on these results, we used the genome comparison method to find several groups of candidate restriction endonuclease genes that had not previously been assigned as such. This approach does not rely on homology with known genes and is a good way to find new genes that are unrelated to known ones.


Poster: Chien‐Hua Shih

A SIMPLE WAY TO COMPUTE PROTEIN DYNAMICS WITHOUT A MECHANICAL MODEL

Chien-Hua Shih1 Shao-Wei Huang1


Shih-Chung Yen1 Yan-Long Lai1


Sung-Huan Yu1 Jenn-Kang Hwang*1


1 Institute of Bioinformatics, National Chiao Tung University, Hsinchu 30050, Taiwan, Republic of China

We found that, in proteins, the average atomic fluctuation is linearly related to the square of the atomic distance from the protein's center of mass. Using this simple relation, we can accurately compute the temperature factors of proteins of a wide range of sizes and folds, as well as the correlations of fluctuations within proteins. This relation provides a direct link between protein dynamics and the static geometrical shape of the protein, and it offers a simple way to compute protein dynamics without either long-time trajectory integration or any matrix operations.

Keywords: protein dynamics; thermal fluctuations; molecular dynamics; normal mode analysis
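
The stated relation, fluctuation linear in the squared distance from the center of mass, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy coordinates, the toy temperature factors, and the equal-mass simplification are all assumptions made for illustration. Because the relation is linear, the squared distances can be compared with observed temperature factors through a correlation coefficient without fitting the slope or intercept.

```python
import math


def squared_distances_from_center(coords):
    """Return r_i^2 for each atom: squared distance from the centroid.

    Assumption: every atom has the same mass (for example, C-alpha atoms
    only), so the center of mass is the plain average of the coordinates.
    """
    n = len(coords)
    center = [sum(atom[k] for atom in coords) / n for k in range(3)]
    return [sum((atom[k] - center[k]) ** 2 for k in range(3)) for atom in coords]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy chain of four atoms on a line (hypothetical coordinates, in angstroms).
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
profile = squared_distances_from_center(toy_coords)

# Hypothetical "observed" temperature factors, constructed to be linear in
# r^2 only to show how a predicted profile would be compared with data.
toy_b_factors = [3.0 * r2 + 10.0 for r2 in profile]
agreement = pearson(profile, toy_b_factors)
```

With real data, `toy_coords` would be replaced by coordinates parsed from a PDB file and `toy_b_factors` by the deposited temperature factors; only a single pass over the atoms is needed, with no matrix operations.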




Poster: Yan‐Long Lai

PKNOT: THE PROTEIN KNOT WEB SERVER

Yan-Long Lai1 Shih-Chung Yen1

Sung-Huan Yu1 Jenn-Kang Hwang1,2

1 Institute of Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan

2 Core Facility for Structural Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan

Knotted proteins have been observed more frequently in recent years owing to the rapidly growing number of structures in the Protein Data Bank (PDB). Studies show that knot regions contribute to both ligand binding and enzyme activity in proteins such as the chromophore-binding domain of phytochrome, ketol–acid reductoisomerase, and SpoU methyltransferase. However, many misidentified knots are still published in the literature because no convenient web tool is available to general biologists. Here, we present the protein KNOT (pKNOT) web server, the first web server that detects knots in proteins and provides information on knotted proteins in the PDB. In pKNOT, users can either enter a PDB ID or upload protein coordinates in the PDB format. The pKNOT web server then detects the knots in the protein using Taylor's smoothing algorithm. All detected knots can be inspected visually using a Java-based 3D graphics viewer. We believe that the pKNOT web server will be useful to biologists in general and to structural biologists in particular.






Poster: Hsiu‐Yu Wang

IN SILICO AND IN VITRO ANALYSES OF EVOLUTIONARY-CONSERVED SEQUENCE MOTIF IN PRIMATE GENOMES OF EOSINOPHIL RIBONUCLEASES

Hsiu-Yu Wang1 Margaret Dah-Tsyr Chang1

1 Institute of Molecular and Cellular Biology & Department of Life Science, National Tsing Hua University, Hsinchu 30013, Taiwan, R.O.C.

The genes encoding human eosinophil-derived neurotoxin (edn) and eosinophil cationic protein (ecp) are members of a subfamily of primate ribonuclease (rnase) genes. Although they arose from a gene duplication event, distinct edn and ecp expression profiles have been reported in various tissues. In this study, we obtained the upstream promoter sequences of several representative primate eosinophil rnases, and bioinformatic analysis revealed a shared 34-nucleotide (nt) sequence stretch located at -81 to -48 in all edn promoters and in the macaque ecp promoter. This unique sequence motif constitutes a region essential for transactivation of human edn in hepatocellular carcinoma cells. We therefore used the software tools TESS and CONSITE to predict the transcription factor(s) involved in the complex formed between the 34-nt segment and protein. Both tools predicted that the transcription factors AP2, MAZ, Sp1, and LF-A1 might bind to the 34-nt segment. Electrophoretic mobility shift assays (EMSA), transient transfection, and scanning mutagenesis experiments allowed us to identify binding sites for two transcription factors, Myc-associated zinc finger protein (MAZ) and SV-40 protein-1 (Sp1), within the 34-nt segment. Our results provide the first direct evidence that MAZ and Sp1 play important roles in the transcriptional activation of the human edn promoter through specific binding to a 34-nt segment, a sequence highly conserved through evolution in representative primate eosinophil rnase promoters.


Poster: Wei‐Yao Chou

MULTIPLE ANCHORED BLOCK SEQUENCE ALIGNMENT BASED ON SHORT CONSENSUS MOTIFS

Wei-Yao Chou 1 Margaret Dah-Tsyr Chang 2

Hao-Teng Chang 2 Wei-I Chou 2

Tan-chi Fan 2 Wen-Shyong Tzou 3

Tun-Wen Pai*4

1 Department of Computer Sciences 2 Institute of Molecular and Cellular Biology & Department of Life Science, National Tsing Hua University, Hsinchu, Taiwan

3 Institute of Bioscience and BioTechnology 4 Department of Computer Science & Center for Marine Bioscience and Biotechnology, National Taiwan Ocean University, Keelung 20224, Taiwan

Multiple sequence alignment is one of the most important problems in computational biology. Unfortunately, it has been proven to be NP-hard, and exact solutions are extremely time-consuming, so most researchers rely on heuristics and approximations. Observations from biological experiments indicate that conserved functional motifs can be represented as similar motif patterns. We therefore apply a two-phase methodology, consisting of a motif-finding phase and an alignment phase, to produce an anchored sequence alignment. First, we apply an efficient algorithm modified from the Ladder-like Interval Jumping Searching Algorithm (LIJSA) to identify short similar patterns. Second, the identified patterns are treated as anchored blocks that are aligned with the highest priority. Subsequently, the remaining unidentified residues are aligned residue by residue. The distinguishing feature of our approach is that representative motifs are located in the first phase and aligned in the second phase. For biological analysis, six members of the human RNase A-like superfamily with known structures were tested. The results show that our method not only identifies the functional regions at the primary structure level by itself, but also provides better signature alignment at the tertiary structure level in conjunction with other structure alignment tools. In addition, analysis of BAliBASE, a well-established sequence alignment benchmark, indicates that our program produces superior results on the column score criterion.


Poster: Shih‐Chi Peng

A STRATEGY OF FORWARD AND REVERSE ENGINEERING LINKING THE SIGNALING NETWORK AND GENOMICS REGULATORY RESPONSES

Shih-Chi Peng 1 David Shan-Hill Wong 2

Yung-Jen Chuang 3,4 Chao-Cheng Chen 2

Chuan-Yi Tang 1

1 Department of Computer Sciences 2 Department of Chemical Engineering 3 Department of Life Sciences 4 Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu 30013, Taiwan

In this study, we propose a systematic approach for linking signal transduction cascades and gene regulatory networks. Diverse cellular stimuli alter transcription factor activities through signal transduction pathways, resulting in different gene expression patterns. Here we employed forward engineering and reverse engineering to construct the signaling pathway and the gene regulatory network, respectively. The two networks were then linked by transcription factor activity profiles. As a demonstration, the activity of host NF-κB was reconstructed from gene expression microarray data and compared with simulated dynamics of the NF-κB signaling pathway. In this way, the relations between signals and gene expression can be inferred indirectly. This methodology provides new insight into systems biology modeling that integrates protein activities and gene expression.


Poster: Chia‐Han Chu

NOVEL APPROACH FOR PROTEIN STRUCTURE COMPARISON BASED ON MUTUAL SECONDARY STRUCTURAL ELEMENTS

Chia-Han Chu 1 Chuan Yi Tang 1

Tun-Wen Pai 2 Cheng-Yin Tang 2

1 Department of Computer Sciences, National Tsing Hua University, Hsinchu 30013, Taiwan

2 Department of Computer Science and Engineering & Center for Marine Bioscience and Biotechnology, National Taiwan Ocean University, Keelung 20224, Taiwan

Protein Structure Comparison (PSC) and classification have been used to understand the evolutionary relationships between structures and functions. However, PSC is computationally time-consuming because of the complexity of spatial organization. To reduce the computational complexity, we developed a novel PSC method that transforms a three-dimensional structure into a two-dimensional angle-distance image. The PSC problem can then be formulated as an image template matching problem.

First, we encode angle-distance images using the secondary structure information of each protein. According to the combination of secondary structure elements (SSEs), the mutual SSE pairs are decomposed into three different types of images for each protein structure. Subsequently, any two protein structures are compared using cross-correlation and co-existing approaches to identify the similarity of various patterns. Based on the comparison results, the relationships of structural evolution are displayed by a hierarchical clustering method, and the silhouette algorithm is employed to validate the clustering results.

Two standard test datasets, the Chew-Kedem and Skolnick datasets, were used to verify our method, which appeared to outperform other existing methods.
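
The template matching step mentioned in this abstract can be illustrated with a generic normalized cross-correlation. The sketch below is not the authors' encoding or scoring scheme: the function names and the toy grids are hypothetical, and the angle-distance image construction is not reproduced. It only shows how a small two-dimensional template can be located inside a larger image by cross-correlation.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-sized 2D grids, in [-1, 1]."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("grids must have the same number of cells")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    # A constant grid has zero variance; report no correlation in that case.
    return num / den if den else 0.0


def best_match(image, template):
    """Slide the template over the image; return (score, row, col) of the best fit."""
    t_rows, t_cols = len(template), len(template[0])
    best = (-2.0, 0, 0)
    for r in range(len(image) - t_rows + 1):
        for c in range(len(image[0]) - t_cols + 1):
            window = [row[c:c + t_cols] for row in image[r:r + t_rows]]
            score = ncc(window, template)
            if score > best[0]:
                best = (score, r, c)
    return best


# Hypothetical 4x4 "image" containing the 2x2 template at row 1, column 2.
toy_image = [
    [0, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
    [0, 0, 0, 0],
]
toy_template = [[1, 2], [3, 4]]
score, row, col = best_match(toy_image, toy_template)
```

Pairwise scores of this kind could then feed a hierarchical clustering step, as the abstract describes; that step is omitted here.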


Poster: Boris Stitnicky

COMPUTATIONAL SIMULATION OF THE THYMIDINE TRIPHOSPHATE LEVELS THROUGHOUT THE CELL CYCLE OF MAMMALIAN CELLS

Boris Stitnicky 1 Ueng-Chang Yang 1,2

1 The Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, No. 128, Section 2, Academia Road, Nankang, Taipei, Taiwan 115, R.O.C.

2 The Institute of Biomedical Informatics, National Yang Ming University, No. 155, Sec. 2, Li-Nong St., Taipei, Taiwan 112, R.O.C.

The level of thymidine triphosphate (TTP) in mammalian cells varies physiologically 20-fold between S phase cells and G0-arrested cells. The main enzymes controlling TTP levels, such as cytoplasmic thymidine kinase (TK1) and thymidine monophosphate kinase (TMPK), are expressed at the beginning of S phase and degraded in a specific fashion after mitosis. While a sufficient amount of TTP is needed for successful DNA duplication in S phase, excessive TTP levels disrupt the feedback control of the cellular deoxynucleotide pathway, ultimately causing slowed growth and decreased DNA duplication accuracy. Defects in the TTP pathway can thus act as a first step that shifts the cell down a spiral of accelerating mutagenesis. Our collaborating lab has elucidated the degradation mechanisms of the TK1 and TMPK enzymes, but their influence on the TTP level remains speculative. We simulate the TTP pathway to show quantitatively how changed enzyme levels translate into changed TTP levels. There are several published models related to nucleotide metabolism, but as far as we know, none of them addresses cytoplasmic pyrimidine nucleotides in eukaryotes. Our current model includes the influx of TTP via the de novo and salvage pathways and the efflux of TTP used for DNA synthesis.
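
The mass balance described in the last sentence, two influxes and one efflux, can be sketched as a single ordinary differential equation. The sketch below is a deliberately reduced illustration and not the authors' model: the constant influx rates, the first-order efflux term, the parameter values, and the function name are all assumptions, and feedback control and enzyme degradation are left out.

```python
def simulate_ttp(ttp0, v_de_novo, v_salvage, k_dna, t_end, dt=0.01):
    """Integrate d[TTP]/dt = v_de_novo + v_salvage - k_dna * [TTP] (forward Euler).

    Assumptions (hypothetical, for illustration only):
      - both influx terms are constant rates;
      - efflux into DNA synthesis is first order in [TTP].
    Returns the list of [TTP] values at every time step, including t = 0.
    """
    ttp = ttp0
    trajectory = [ttp]
    for _ in range(int(round(t_end / dt))):
        ttp += dt * (v_de_novo + v_salvage - k_dna * ttp)
        trajectory.append(ttp)
    return trajectory


# Arbitrary units; none of these numbers come from measurements.
trajectory = simulate_ttp(ttp0=0.0, v_de_novo=1.0, v_salvage=1.0,
                          k_dna=0.5, t_end=50.0)

# Analytical steady state of this toy equation: total influx / efflux constant.
steady_state = (1.0 + 1.0) / 0.5
```

In this toy form, changing an enzyme level corresponds to changing one of the rate parameters, and the steady-state TTP level scales directly with total influx.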


Poster: Hua‐Sheng Chiu

SVMTOP2: A COMPREHENSIVE WEB SERVER FOR PREDICTING THE TOPOLOGY OF ALPHA-HELICAL MEMBRANE PROTEINS

Hua-Sheng Chiu Huang-Yao Cheng

Allan Lo Ting-Yi Sung

Wen-Lian Hsu

1 Bioinformatics Lab., Institute of Information Science, Academia Sinica, Taipei, Taiwan

Integral membrane proteins (IMPs) play a crucial role in several cellular processes and perform a variety of life-critical functions. Knowledge of their high-resolution structures can lead to a better understanding of how these proteins function; however, progress is hampered by experimental difficulties. Therefore, computational methods are important for elucidating the structural genomics of IMPs. Previously, we proposed a method called SVMtop to identify the helical domains and topology of transmembrane (TM) proteins (Lo et al., 2008). A simple web-based application was established that provided prediction results in plain text, similar to the majority of other web servers. Since it is hard to interpret or analyze prediction results in text mode, we were motivated to develop a new comprehensive web server, SVMtop2. Like SVMtop, it adopts a hierarchical classification scheme in which TM helix and loop (inside or outside) residues are predicted in two consecutive stages. A novel topology scoring function called AGSF is then applied to determine the final topology of the query protein. Most notably, SVMtop2 provides various useful analyses and visualizations of a query protein and its prediction results. First, various physico-chemical properties and the amino acid composition of the target protein are calculated, and a comprehensive data analysis of its prediction results is performed. Second, the results of these analyses are displayed using advanced visualizations, including linear, snake-like, and helical wheel plots, enabling users to interpret the prediction results better. Third, hydropathy plots based on two conventional hydrophobicity scales are included for effective comparison with the prediction results. Lastly, SignalP 3.0, a signal peptide detection method, is incorporated to enhance discrimination of signal peptides.
We have evaluated the performance of SVMtop2 on a data set of 258 TM proteins (not involved in 10-fold cross validation). High accuracies were obtained for helix (72%) and topology (82%) predictions which compared favorably with many existing methods. SVMtop2 not only takes advantage of the robustness of the original method, but also provides a comprehensive output report including relevant data analyses, informative visualizations, and signal peptide detection. Server site: http://bio-cluster.iis.sinica.edu.tw/~bioapp/SVMtop2
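
A hydropathy plot of the kind mentioned above is a sliding-window average of per-residue hydrophobicity values. The abstract does not name the two scales used by SVMtop2, so the sketch below assumes the Kyte-Doolittle scale; the function name and the default window length of 19 are likewise illustrative choices and not taken from the server.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence.

    The i-th value covers residues i .. i + window - 1, so the result has
    len(sequence) - window + 1 entries. Non-standard residue letters raise
    KeyError.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical six-residue peptide with a short window, for illustration.
toy_profile = hydropathy_profile("AAAKKK", window=3)
```

Stretches where the profile stays strongly positive over a window of roughly helix-spanning length are the classical visual cue for candidate TM helices, which is why such plots are useful alongside a classifier's predictions.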


Poster: Jia‐Ming Chang

A DOCUMENT CLASSIFICATION STRATEGY TO PREDICT PROTEIN SUBCELLULAR LOCALIZATION USING SEQUENCE MOTIFS AND EVOLUTIONARY INFORMATION

Jia-Ming Chang Emily Chia-Yu Su

Allan Lo Hua-Sheng Chiu

Ting-Yi Sung Wen-Lian Hsu

1 Bioinformatics Lab., Institute of Information Science, Academia Sinica, Taipei, Taiwan

Protein subcellular localization is important for genome annotation, protein function prediction, and drug discovery. However, determining subcellular localization experimentally is time-consuming; efficient prediction using computational approaches is therefore highly desirable. We present a prediction method, PSLDoc (Protein Subcellular Localization prediction based on Document classification), which combines probabilistic latent semantic analysis (PLSA) with a one-versus-rest support vector machine (SVM) model based on document classification techniques for both prokaryotes and eukaryotes. Our method extracts biological features from gapped dipeptides of various distances, where evolutionary information from the position-specific scoring matrix is used to determine the weighting of each gapped dipeptide. The features are then further reduced by PLSA and used as input vectors for the SVM classifiers. The accuracy of PSLDoc reaches 93.0% for Gram-negative bacterial proteins and 81.7% for human proteins in five-fold cross-validation, compared to previous results of 91.2% and 78.0%, respectively. Experimental results show that feature selection and reduction by document classification techniques can lead to a significant improvement in prediction performance. Moreover, we demonstrate that PLSA automatically selects discriminating sequence motifs and greatly reduces the feature dimension without sacrificing prediction accuracy. Most notably, compared to similar approaches based on motif co-occurrences, PSLDoc achieves much higher coverage because it starts with the examination of dipeptides and also considers the collocation of higher-order sequence motifs through the PLSA feature transformation. Because of the generality of this method, it can be extended to more species or to multiple localization sites in the future.
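
The gapped-dipeptide representation can be made concrete with a short sketch. It simply counts occurrences of each residue pair separated by a given number of intervening positions, treating the protein like a document and the gapped dipeptides like terms. This is an illustration under stated assumptions and not the PSLDoc implementation: the function name and the plain-count weighting are hypothetical, whereas the abstract states that the actual weights come from the position-specific scoring matrix.

```python
from collections import Counter


def gapped_dipeptide_counts(sequence, max_gap):
    """Count gapped dipeptides (first residue, gap, second residue).

    A gap of d means the two residues are separated by d intervening
    positions; gap 0 is an ordinary dipeptide. Plain occurrence counts are
    used here as a stand-in for the PSSM-derived weights.
    """
    counts = Counter()
    for gap in range(max_gap + 1):
        for i in range(len(sequence) - gap - 1):
            counts[(sequence[i], gap, sequence[i + gap + 1])] += 1
    return counts


# Hypothetical five-residue fragment, for illustration only.
toy_counts = gapped_dipeptide_counts("MKVLA", max_gap=1)
```

With 20 amino acids there are 400 possible pairs per gap value, so the raw feature space grows as 400 × (max_gap + 1), which is the dimensionality that a reduction step such as PLSA would then compress.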




Poster: Meng‐Ru Ho

IDENTIFY EUKARYOTIC ORTHOLOGY VIA TRANSCRIPTION UNITS

Meng-Ru Ho 1,2,3 Wen-Jung Jang 2,3,4

Chun-houh Chen 5 Lan-Yang Ch'ang 3

Wen-chang Lin 1,3

1 Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan

2 Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan

3 Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan

4 Institute of Bioinformatics, National Chiao-Tung University, Hsinchu, Taiwan

5 Institute of Statistical Sciences, Academia Sinica, Taipei, Taiwan

Orthology is a widely used concept in comparative and evolutionary genomics. In addition to prokaryotic orthology, delineating eukaryotic orthology has provided insight into the evolution of higher organisms, and many eukaryotic ortholog databases have been established for this purpose. However, one crucial difference between prokaryotic and eukaryotic genomes, alternative splicing, has hampered eukaryotic orthology assignment. Existing databases might therefore contain ambiguous eukaryotic ortholog relationships and misclassify alternatively spliced protein isoforms as in-paralogs, which are genes duplicated after speciation.

Here, we propose a new approach for designating eukaryotic orthology using transcription units, taking human and mouse as a prototype. Whereas other methods identify less than 69% of human and mouse orthologs, our program achieves an improvement of up to 11%. Moreover, our ortholog database is more than 92% consistent with existing databases. In addition to handling alternative splicing, our approach is capable of identifying orthologs of embedded genes and fusion genes, which is achieved using syntenic evidence. In summary, this new approach is sensitive, specific, and able to generate a comprehensive and accurate compilation of eukaryotic orthologs.


Poster: Ya‐Chi Lin

A B-CELL LINEAR EPITOPE PREDICTION SYSTEM FOR LOW ANTIGENICITY PEPTIDES

Ya-Chi Lin Szu-Wen Wang

Wei-Kuo Wu Hsin-Wei Wang

Tun-Wen Pai

1 Dept. of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan 20224, Republic of China

An epitope is defined as a local site on the surface of an antigen that can elicit an immune response. It can be recognized and bound by a specific antibody to counter that response. Epitope prediction is therefore one of the most important applications in molecular immunology. Several existing B-cell linear epitope prediction systems are designed around physico-chemical propensity scale analysis, training on known experimental epitopes, or a combination of both techniques to predict possible epitopes from primary sequence information. To verify the accuracy of different prediction techniques, several well-known epitope databases are suggested for measuring system performance. Previous studies revealed that linear epitopes can appear in protein segments with low-to-moderate global antigenicity but relatively high local antigenicity scales. Hence, in this study, a method adopting mathematical morphology is developed to extract local peaks from a linear combination of the propensity scales of physico-chemical characteristics at each antigen residue. We will show that not only segments with globally high scales should be regarded as possible linear epitopes, but that those with relatively high scales in local regions should also be considered as candidates. To extract such local peaks from propensity scales, we propose an effective and efficient method based on mathematical morphology. Using a combination of erosion, dilation, opening, and closing operations, segments with higher antigenicity than their neighbouring areas can be extracted from the antigenic distribution of a protein sequence. All segments extracted by our system will be verified against standard databases and compared with the well-known BepiPred system.
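
The morphological operations named in this abstract can be sketched for a one-dimensional per-residue score profile. The sketch below is a generic illustration and not the authors' system: it assumes a flat structuring element, uses the standard white top-hat transform (signal minus its opening) as one way to isolate local peaks, and takes a precomputed score list as input in place of the combined propensity scales. All function names, the window width, and the threshold are hypothetical.

```python
def erode(signal, width):
    """Grey-scale erosion with a flat window: sliding minimum (edges truncated)."""
    half, n = width // 2, len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def dilate(signal, width):
    """Grey-scale dilation with a flat window: sliding maximum (edges truncated)."""
    half, n = width // 2, len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def opening(signal, width):
    """Erosion followed by dilation: removes peaks narrower than the window."""
    return dilate(erode(signal, width), width)


def closing(signal, width):
    """Dilation followed by erosion: fills valleys narrower than the window."""
    return erode(dilate(signal, width), width)


def top_hat(signal, width):
    """White top-hat: what the opening removed, i.e. the narrow local peaks."""
    return [s - o for s, o in zip(signal, opening(signal, width))]


def peak_segments(signal, width, threshold):
    """Return inclusive (start, end) index pairs where the top-hat >= threshold."""
    segments, start = [], None
    for i, value in enumerate(top_hat(signal, width)):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(signal) - 1))
    return segments


# Hypothetical profile: a short local peak at low level, then a broad plateau.
toy_scores = [0.1, 0.1, 0.6, 0.7, 0.1, 0.1, 2.0, 2.0, 2.0, 2.0, 2.0]
local_peaks = peak_segments(toy_scores, width=5, threshold=0.3)
```

In the toy profile the broad plateau scores highest globally, yet only the short bump at positions 2 to 3 is reported, which mirrors the abstract's point that segments with relatively high local scores deserve consideration even when their global level is low.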


Poster: Sung‐Chou Li

DISCOVERY OF METAZOAN MICRORNAS WITH EVOLUTION CONSERVATION

Sung-Chou Li 1,2,3 Wen-Ching Chan 1,2,4

Ling-Yueh Hu 3 Chun-Nan Hsu 4

Wen-Chang Lin 1,3

1 Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan

2 Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan

3 Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan

4 Institute of Information Sciences, Academia Sinica, Taipei, Taiwan

MicroRNAs (miRNAs) are endogenous non-protein-coding RNAs approximately 22 nucleotides in length. Since the first discovery of miRNAs in Caenorhabditis elegans, thousands of miRNAs have been computationally and/or experimentally identified in many organisms, including mammals, invertebrates, insects, plants, and viruses, which suggests that miRNA genes are conserved during evolution and widely distributed among species. In this study, we modified our previous pre-miRNA discovery pipelines to predict miRNAs in 50 animal genomes. Using a Support Vector Machine as a classifier, we identified an additional 17,479 orthologous or paralogous pre-miRNAs, as well as their corresponding mature miRNAs, with 89.5% sensitivity and 97.37% specificity. Our results suggest that miRNA genes are widely distributed in many animal species, including Schmidtea, nematodes, insects, urchins, sea squirts, and vertebrates. Different miRNA families have distinct distribution patterns among these species, which could provide insight into the evolution of miRNAs and their functional significance in development and organogenesis.


Poster: Kazuhiro Fujita

DYNAMICS OF AKT PATHWAY IN PC12 CELLS

Kazuhiro Fujita1 Shinya Kuroda2


1 Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8568, Japan

2 Department of Biochemistry and Biophysics, Graduate School of Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

Abstract: The Akt pathway plays an essential role in many biological processes such as cell growth and differentiation. We investigated the dynamic characteristics of the Akt pathway in PC12 cells, which undergo proliferation and differentiation, both of which require cell growth, in response to EGF (epidermal growth factor) and NGF (nerve growth factor), respectively. To understand the dynamics of the Akt pathway, we measured the activity of molecules in the pathway, such as Akt and its downstream molecule S6, in response to the growth factors. We found that NGF induced transient and sustained phosphorylation of these molecules in a dose-dependent manner. In contrast, EGF induced transient Akt phosphorylation in a dose-dependent manner; however, phosphorylation of S6 appeared to be biphasic. To understand the distinct S6 phosphorylation responses, we developed a simple computational model of the Akt pathway. This model reproduced the Akt and S6 phosphorylation in response to EGF and NGF. Using this model, we are currently exploring the mechanism underlying the distinct phosphorylation of S6 in response to EGF and NGF.

Keywords: systems biology; signal transduction; Akt; S6; receptor kinetics; EGF; NGF; PC12




