Célia Ghedini Ralha - CIC/UnBghedini/resources/LeipzigPresentation-14Dez12.pdf · Why to use multi-agent approach? adequate/direct way to implement agent-based models (ABM) agent-based

Célia Ghedini Ralha Department of Computer Science

University of Brasília [email protected] - www.cic.unb.br/~ghedini/

Agenda

• Research Project Overview: Problem, Hypothesis, Results, BioAgents
• Current Challenge: ncRNA classification, Methodology, ncRNA-Agents (Architecture & Prototype)
• Conclusions & Future Work
• Brazilian Program Science Without Borders

Personal Introduction

• PhD thesis (1993-96)
  ◦ Prof. Anthony G. Cohn, Professor of Automated Reasoning, University of Leeds, England
  ◦ QSR Group - applied Region Connection Calculus to Web information
  ◦ Title: A Framework for Dynamic Structuring of Information

• Academic career (2002)
  ◦ Associate Professor - Intelligent Information Systems, Computer Science Department, University of Brasília, Brazil
  ◦ Research Group Leader: InfoKnow - Computer Systems for Information and Knowledge Treatment, registered with Brazil’s National Council for Scientific & Technological Development (CNPq): http://dgp.cnpq.br/buscaoperacional/detalhegrupo.jsp?grupo=024010360AHR2C
  ◦ Member of the Communication Network Laboratory (COMNET) (http://comnet.cic.unb.br/)
  ◦ Research focus: information & knowledge treatment, agent-based modeling, agent simulation

• Senior research stay (Sept 2012 - Mar 2013), Capes funding
  ◦ Prof. Gerd Wagner, Chair of Internet Technology, Brandenburgische Technische Universität in Cottbus, Germany
  ◦ Agent-Object-Relationship (AOR) Modeling Language
  ◦ Scientific director of Simurena, a startup company specialized in web-based simulation and games

Research Problem

  Challenge – the enormous volumes of DNA & RNA sequences of organisms continuously being discovered by genome projects around the world

• Annotation - a key activity:
  1. automatic: executed by programs;
  2. manual: done by biologists, who use the results of the automatic annotation together with their knowledge and experience to predict the function of each DNA sequence

The Hypothesis

• The definition and implementation of an annotation system based on a multi-agent approach can help during the complex annotation process

Why use a multi-agent approach?

• An adequate, direct way to implement agent-based models (ABM)
  ◦ agent-based modeling is a natural metaphor to represent the interaction of human agents in a real environment
  ◦ reasoning & knowledge treatment

• ABM is a vital technique for studying Complex Adaptive Systems (CAS)
  ◦ as evidenced by the growing body of literature spanning disciplines from Biology and Social Sciences to Computer Science
  ◦ e.g., the call for papers for the inaugural special issue of Springer's Complex Adaptive Systems Modeling (CASM), to publish key papers documenting multidisciplinary methods & applications for ABM of CAS (submission due: November 30, 2012)
  ◦ papers documenting successful ABM methodologies & application case studies in areas including:
    Life sciences such as Ecology, Biology, Biochemistry, Cancer and Epidemiology
    Social sciences
    Economics
    Cloud computing
    Multi-agent systems
    Verification, validation, and accreditation of agent-based models
    Methods for development and analysis of agent-based models

Research Project

Agent-based project (2006)
• Multi-agent System for Manual Annotation on Genome Sequencing Projects - BioAgents

Collaboration
• Institute of Biological Sciences – University of Brasília: Prof. Marcelo M. Brígido
• Bioinformatics Group – Universität Leipzig: Prof. Peter Stadler

BIOFOCO III - Finep/MCTI (2008-2012)
• Goal: development of software for genomic analysis in a cooperative and distributed computational environment in the Midwest Region of Brazil (http://www.biofoco.org/biofoco3/)
• Institutions: UnB, UFG, UFMS, Embrapa
• Sub-project: GENOALGO

Research Project (Cont.)

• Students
  ◦ Bachelor
    Hugo W. Schneider & Anderson G. Frazzon (Dec 2006) – Multi-agent System Prototype for Manual Annotation in Genome Sequencing Projects (Sanger Sequencer)
    Daniel S. Souza (Nov 2012) – A Multi-agent Tool for Biological Sequence Annotation
  ◦ Master's degree
    Richardson S. Lima (Jun 2007) – BioAgents: MAS for manual annotation in genome sequencing projects
    Hugo W. Schneider (Dec 2010) – A Reinforcement Learning Method for BioAgents
  ◦ Registered for PhD
    Wosley C. Arruda (started 2010) & Hugo W. Schneider (started 2012)

• Publications
  1. RALHA, Célia Ghedini; SCHNEIDER, H. W.; WALTER, M. E. T.; BRIGIDO, M. M. A Multi-agent Tool to Annotate Biological Sequences. ICAART (2) 2011: 226-231.
  2. RALHA, Célia Ghedini; SCHNEIDER, H. W.; WALTER, M. E. T.; BAZZAN, A. L. C. Reinforcement Learning Method for BioAgents. SBRN 2010: 109-114.
  3. RALHA, Célia Ghedini; SCHNEIDER, H. W.; FONSECA, L. O.; WALTER, M. E. T.; BRIGIDO, M. M. Using BioAgents for Supporting Manual Annotation on Genome Sequencing Projects. BSB 2008: 127-139.
  4. LIMA, R. S.; RALHA, Célia Ghedini; WALTER, M. E. T.; SCHNEIDER, H. W.; PEREIRA, A. G. F.; BRIGIDO, M. M. BioAgents: Um Sistema Multi-agente para Anotação Manual em Projetos de Seqüênciamento de Genomas. ENIA 2007: 1302-1310.
  5. International Workshop on Genomic Databases (IWGD), 2007 & 2005

BioAgents – 1st Version (2006)

• Sanger technology - thousands of sequences (1990)
  ◦ submission: biologists send sequences to be processed on computers - graphics transformed into strings
  ◦ assembly: sequences are assembled to reconstruct the original fragment
  ◦ annotation:
    1. automatic: programs (e.g., BLAST, BLAT) infer a biological function for each sequence from functions previously stored in public databases (e.g., GenBank, SwissProt, TrEMBL)
    2. manual: biologists guarantee the accuracy and correctness of each sequence function, using their knowledge to analyze and correct the function suggested by automatic annotation

Computational Pipeline

Architecture (1st Version - 2006)

BioAgents – 1st Version – Case Studies

Paullinia cupana (Guaraná fruit)

Paracoccidioides brasiliensis (Pb fungus)

Anaplasma marginale (rickettsia)

BioAgents – 2nd Version (2010)

• High-throughput technology - millions of sequences
  ◦ 454/Roche: 1/1.5 million sequences/run (200/600 bp length)
  ◦ Illumina/Solexa: 15/20 million sequences/run (30/100 bp length)
  ◦ SOLiD/ABI: > 2 million sequences/run (35/75 bp length)
• submission: short sequences generated by automatic sequencers are "ready" to be processed
• mapping: short sequences are mapped onto a reference genome, since assembling them directly would be almost impossible (resequencing)
• assembly: de novo sequencing, or near sequences previously mapped
• automatic annotation: done by computers
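The mapping step can be illustrated with a deliberately naive sketch: each short read is located in the reference by exact substring search. This is only an illustration under stated assumptions. The function name `map_reads` and the toy sequences are hypothetical, and real mappers rely on indexed, mismatch-tolerant alignment rather than exact matching.

```python
def map_reads(reference, reads):
    """Return {read: [0-based start positions]} of exact matches in the reference.

    Hypothetical helper for illustration only: exact matching, no mismatches,
    no reverse-complement handling.
    """
    positions = {}
    for read in reads:
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            # Continue searching one base further to catch overlapping hits.
            start = reference.find(read, start + 1)
        positions[read] = hits
    return positions


if __name__ == "__main__":
    toy_reference = "ACGTACGTTAGC"  # assumed toy genome
    print(map_reads(toy_reference, ["ACGT", "TAGC", "GGGG"]))
```

A read with an empty hit list (such as `GGGG` here) is the kind of sequence that would have to go to de novo assembly instead.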

Pipeline for high-throughput technology

BioAgents – 2nd Version

• New sequencing technologies produce billions of bases in a short time
• New bioinformatics challenges - storing & analyzing the data volume
• AI can help with many techniques - Machine Learning (ML)
• Main problems in applying ML methods to the annotation scenario:
  ◦ huge amount of data
  ◦ lack of examples for training purposes (specificity of each organism)
  ◦ choice of method: supervised (statistical classification, Bayesian), unsupervised (association rule, clustering), reinforcement learning (RL)
  ◦ Q-learning (RL) learns an action-value function that gives the expected utility of taking an action in a given state and following a fixed policy thereafter
    Q: S × A → ℝ (open question: how to define the set of actions (A) per state (S))
• Not tried: (i) delayed Q-learning, which brings probably approximately correct (PAC) learning bounds to Markov decision processes; (ii) learning automata; (iii) temporal difference learning

• Work objective - propose & implement a reinforcement learning method for BioAgents
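To make the action-value function Q: S × A → ℝ concrete, here is a minimal sketch of tabular Q-learning. It is not the BioAgents method: the five-state chain environment, the reward of 1 at the goal, and the values of `alpha`, `gamma` and `epsilon` are all assumptions chosen for illustration.

```python
import random

ACTIONS = (-1, +1)  # assumed toy action set: move left / move right


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain with the goal at the right end."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(s, act)] for act in ACTIONS)
                # Exploit, breaking ties at random.
                a = rng.choice([act for act in ACTIONS if q[(s, act)] == best])
            s_next = min(max(s + a, 0), goal)  # walls at both ends
            reward = 1.0 if s_next == goal else 0.0
            best_next = max(q[(s_next, act)] for act in ACTIONS)
            # Q-learning update: move Q(s, a) toward reward + gamma * max Q(s', .)
            q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
            s = s_next
    return q


def greedy_policy(q, n_states):
    """Best action per non-goal state according to the learned table."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(n_states - 1)]


if __name__ == "__main__":
    table = q_learning()
    print(greedy_policy(table, 5))
```

In this toy setting the set of actions per state is fixed by hand; for annotation, defining A per state is exactly the open question raised above.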

Architecture (2nd Version - 2010)

2nd Version - Results

Architecture (3rd Version for Web - 2012)

The focus of BioAgents in this version is to predict a protein function for DNA or RNA sequences in genome sequencing projects

The New Challenge

Motivation
• The classification of non-coding RNA (ncRNA) is a big challenge!
• Experts need a lot of knowledge & reasoning to classify ncRNA

Problem
• There are many tools and databases that help to identify and annotate ncRNAs, applying different techniques, e.g., BLAST, INFERNAL, tRNAscan-SE, SVM-Portrait, Vienna, NONCODE, RNAdb, miRBase, snoRNA Database, snoRNA for Plants, fRNAdb, Rfam…
• But there is no software that can recommend an annotation from the results of all these tools together!

Methodology – three different approaches

• Homology
  ◦ Alignment of Pairs: snoRNA, RNAdb, NONCODE, miRBase, Plant - snoRNA
  ◦ Multiple Alignment: Infernal, tRNAscan-SE
• Class prediction
  ◦ SVM-Portrait (Supervised Learning)
• De novo
  ◦ Vienna Package: RNAfold
  ◦ RNAmmer
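One naive way to recommend a single annotation from several tools is a weighted vote, sketched below. This is a hypothetical baseline, not the ncRNA-Agents mechanism: the function `recommend_class`, the per-tool weights, and the example predictions are all assumptions, and the outputs of the real tools would need parsing before reaching this step.

```python
from collections import defaultdict


def recommend_class(predictions, weights=None):
    """Pick the ncRNA class with the highest total weight.

    predictions: dict mapping tool name -> predicted class, or None for no hit.
    weights: optional dict mapping tool name -> vote weight (default 1.0).
    Returns (best_class, share_of_votes), or (None, 0.0) if no tool has a hit.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for tool, label in predictions.items():
        if label is None:
            continue  # a tool without a hit casts no vote
        scores[label] += weights.get(tool, 1.0)
    if not scores:
        return None, 0.0
    total = sum(scores.values())
    # Sorting first makes ties resolve alphabetically and deterministically.
    best = max(sorted(scores), key=scores.get)
    return best, scores[best] / total


if __name__ == "__main__":
    example = {  # made-up predictions for one sequence
        "Infernal": "snoRNA",
        "tRNAscan-SE": None,
        "SVM-Portrait": "snoRNA",
        "miRBase": "miRNA",
    }
    print(recommend_class(example))
```

A low vote share signals disagreement between the approaches, which is where a richer conflict resolution step would be needed.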

Architecture ncRNA-Agents

Prototype

Conclusions & Future Work

What we have done so far:
• Defined and implemented three versions of BioAgents (annotation, protein)
• Studied the new challenge of ncRNA detection to define an agent-based model
• Implemented a prototype of ncRNA-Agents using a multi-agent approach

What is missing?
• Define the “Seeker” Agent reasoning
  ◦ WebBlast, snoRNA, miRNA
• Define the conflict resolution mechanism (most challenging part!)
• Improve the rationality of agents
  ◦ other formal logics for MAS (rule-based reasoning)
  ◦ machine learning
  ◦ data mining
• Validate ncRNA-Agents with a real genome project (e.g., Pb fungus – wet lab)

We believe the Leipzig Bioinformatics Group can help with this new challenge!

What is it?
• A large-scale nationwide scholarship program, primarily funded by the Brazilian federal government, to strengthen and expand initiatives in science and technology, innovation and competitiveness through the international mobility of undergraduate and graduate students and researchers. The program also encourages visits to Brazil by highly qualified young researchers and senior visiting professors.

Primary Goal:
• Qualify 75 thousand Brazilian students and researchers in top-ranked universities worldwide by 2014

Types of Scholarships:
• Undergraduate study abroad
• Full-Time PhD or PhD internships abroad
• Postdoc
• Professional Education
• Senior Fellowships
• Visiting Researchers/Scholars

Fields and Sectors of Interest:
• Chemistry, Biology, Engineering, Computing & Information Technology, etc.

Célia Ghedini Ralha [email protected]
