Metaheuristics for Search Problems in Genomics
New Algorithms and Applications

Stefano Benedettini
DEIS, Alma Mater Studiorum Università di Bologna, Campus of Cesena, Italy



Outline

1 Introduction

2 The Haplotype Inference Problem

3 The Founder Sequence Reconstruction Problem

4 Boolean Network Design

1 Introduction

Motivations

Outstanding goals in modern engineering:
- Develop methodologies and tools to:
  - synthesise models of biological systems
  - analyse real or artificial biological systems

Oftentimes, such systems are complex
- The divide et impera approach fails to capture important relationships

Our objective:
- Apply automatic procedures to the problem of model design

More precisely, we are interested in model instantiation

These procedures belong to the class of search methods


Definitions

Model: the set of entities and relationships that:
1 Can be used to explain a class of phenomena or systems
2 Can be expressed in a formal language

Model instance: the application of the aforementioned entities and concepts to describe a specific system

Predator-Prey model

Model: a system of (parametric) differential equations, such as the Lotka-Volterra equations

Model instance: their application to describe the population dynamics of rabbits and foxes in a forest

How to instantiate parameters?
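
One way to read the question above: a model instance is the model plus concrete parameter values, and choosing those values can be posed as a search problem. The sketch below is illustrative only. The forward-Euler integrator, the squared-error objective, the naive random search, the synthetic data and all function names are assumptions made for this example, not the method used in the talk.

```python
import random

def simulate(params, x0, y0, steps=200, dt=0.01):
    """Forward-Euler integration of the Lotka-Volterra equations:
    dx/dt = a*x - b*x*y (prey), dy/dt = d*x*y - c*y (predators)."""
    a, b, c, d = params
    x, y = x0, y0
    traj = [(x, y)]
    for _ in range(steps):
        dx = (a * x - b * x * y) * dt
        dy = (d * x * y - c * y) * dt
        x, y = x + dx, y + dy
        traj.append((x, y))
    return traj

def error(params, observed, x0, y0):
    """Objective function: sum of squared errors against the observations."""
    sim = simulate(params, x0, y0, steps=len(observed) - 1)
    return sum((sx - ox) ** 2 + (sy - oy) ** 2
               for (sx, sy), (ox, oy) in zip(sim, observed))

def random_search(observed, x0, y0, iters=500, seed=0):
    """Naive search: sample parameter vectors and keep the best one."""
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(iters):
        cand = tuple(rng.uniform(0.1, 2.0) for _ in range(4))
        e = error(cand, observed, x0, y0)
        if e < best_err:
            best, best_err = cand, e
    return best, best_err

# Synthetic "observations" generated from known parameters (hypothetical data).
true_params = (1.0, 0.5, 0.8, 0.3)
observed = simulate(true_params, 2.0, 1.0, steps=100)
best, best_err = random_search(observed, 2.0, 1.0)
```

Here `best` is a model instance in the sense of the slide: the same equations, with parameters chosen so that the simulated dynamics stay close to the observed ones.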


Our Proposed Methodology

Concepts involved:
- Employ metaheuristic techniques to instantiate models in genomics
- A solution is a model instance
- The search process manipulates (one or more) solutions
- A merit factor (objective function) measures the “quality” of a model/solution with respect to a set of desiderata

Applications:
- Resolution of biological problems
- Automatic synthesis of biological model instances (Ensemble Approach)


Metaheuristics for Solving Biological Problems

Facts

Solutions to problems in genomics have to be “realistic”
- They have to make sense to biologists

No model of a phenomenon is ever 100% accurate, by definition

A complete technique spends time producing a proof of optimality

Motivations

An optimal solution for an approximate model might not be useful

Metaheuristics can easily incorporate different objective function components


A Statistical Approach to Modeling

Ensemble Approach (EA)

Let’s define a feature space $F$

A generic system $\sigma$ is identified by a coordinate vector $\Phi(\sigma) = \langle f_1, f_2, \ldots, f_n \rangle \in F$

Is a model $M$ an accurate description of a system $\sigma$?

The EA says that this is the case if the point $\Phi(\sigma)$ lies in the same cluster as (different realizations of) model $M$

Role of Metaheuristics

How to find model instances close to $\sigma$ in $F$?

Define a suitable optimization problem and solve it


2 The Haplotype Inference Problem

Biological Definition

Entities involved

Diploid organism

Haplotype

Genotype

Issues in DNA Sequencing

Haplotype collection is very expensive

By contrast, genotype collection is not

Haplotype Inference

Obtain haplotype information explaining genotype data


Mathematical Definition

Definition (Genotype)

A genotype $g$ is a vector in $\{0, 1, 2\}^m$

Definition (Haplotype)

A haplotype $h$ is a vector in $\{0, 1\}^m$

Definition (Genotype Resolution)

We say that the haplotypes $h, l$ resolve genotype $g$, and we write $\langle h, l \rangle \triangleleft g$, if and only if, for every site $i$:

$g[i] = 0 \Rightarrow h[i] = l[i] = 0$
$g[i] = 1 \Rightarrow h[i] = l[i] = 1$
$g[i] = 2 \Rightarrow h[i] \neq l[i]$

Definition (Haplotype Inference by Maximum Parsimony)

Let $G$ be a set of genotypes; find a minimum-cardinality set of haplotypes $H$ such that $\forall g \in G, \exists h, l \in H \mid \langle h, l \rangle \triangleleft g$
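
The definitions above translate almost directly into code. The following is a minimal sketch, assuming genotypes and haplotypes are stored as tuples of integers; the function names `resolves` and `explains` and the small example data are hypothetical and only serve to illustrate the definitions.

```python
from itertools import combinations_with_replacement

def resolves(h, l, g):
    """True iff haplotypes h and l resolve genotype g."""
    if not (len(h) == len(l) == len(g)):
        return False
    for hi, li, gi in zip(h, l, g):
        if gi == 2:
            # Heterozygous site: the two haplotypes must differ.
            if hi == li:
                return False
        elif not (hi == li == gi):
            # Homozygous site: both haplotypes must carry the value of g.
            return False
    return True

def explains(H, G):
    """True iff every genotype in G is resolved by some pair taken from H.
    A haplotype may be paired with itself."""
    pairs = list(combinations_with_replacement(H, 2))
    return all(any(resolves(h, l, g) for h, l in pairs) for g in G)

# Hypothetical example: three genotypes explained by three haplotypes.
G = [(0, 2, 1), (2, 2, 1), (0, 0, 1)]
H = [(0, 0, 1), (0, 1, 1), (1, 0, 1)]
```

Under maximum parsimony, the objective to minimise is simply `len(H)` over all sets `H` for which `explains(H, G)` holds.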


State of the Art

Integer Linear Programming formulation
- Branch-and-Bound, Branch-and-Cut, ...

Incomplete techniques
- Genetic Algorithms
- Tabu Search

Pseudo-Boolean Optimization
- rpoly

Complete techniques don’t scale well to large instances


Contributions

Hybrid Ant Colony Optimization algorithm

Flexible:
- Can accommodate different genetic models by changing the objective function
- Can integrate different resolution criteria, e.g., statistical techniques

Effective:
- Performance comparable to the state-of-the-art exact solver rpoly

Scalable:
- Can cope with large instance sizes
- Superior to rpoly in these cases


Experimental Results

[Figure: three plots; only the axis ticks survive in the extracted text (ranges roughly 40 to 180, 10 to 70 against 0 to 140, and 80 to 180)]


3 The Founder Sequence Reconstruction Problem

Biological Definition

Purpose

Study the evolutionary history of a population of human DNA sequences

Helps biologists to discover the genetic bases of complex diseases

Genetic Basis

A population has evolved from a relatively small number of founder haplotypes

Crossover breaks and shuffles fragments of sequences

Goal

Find a set of founders that can reconstruct the sequences in the population with the fewest crossovers


An Example of a “Mosaic”

0 1 1 0 1 0 0

1 1 0 1 1 1 1

1 0 1 0 0 0 1

1 1 0 1 1 0 1

1 0 1 0 0 0 1

0 1 1 1 1 1 1

0 1 1 0 1 0 0

1 1 0 0 0 1 1

Breakpoints correspond to crossover events

Solution value: 4

The mosaic is not unique

Given a founder matrix, the optimal mosaic can be computed in polynomial time
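
The polynomial-time computation mentioned above can be sketched with a simple dynamic program that scans each sequence left to right and tracks, for every founder, the fewest breakpoints needed so far. This is an illustrative sketch, not the author's implementation. It also assumes that, in the matrix above, the first three rows are the founders and the remaining five are the recombinant sequences; that reading is a guess about the extracted layout, although it does reproduce the stated solution value of 4.

```python
INF = float("inf")

def min_breakpoints(seq, founders):
    """Fewest breakpoints needed to copy seq from the founders (INF if impossible)."""
    # cost[k]: fewest breakpoints so far if the current site is copied from founder k.
    cost = [0 if f[0] == seq[0] else INF for f in founders]
    for j in range(1, len(seq)):
        cheapest = min(cost)
        # Either keep copying from the same founder, or switch (one more breakpoint).
        cost = [min(c, cheapest + 1) if f[j] == seq[j] else INF
                for f, c in zip(founders, cost)]
    return min(cost)

def mosaic_value(recombinants, founders):
    """Total number of breakpoints over all recombinant sequences."""
    return sum(min_breakpoints(s, founders) for s in recombinants)

# Assumed reading of the slide's matrix (see the note above).
founders = ["0110100", "1101111", "1010001"]
recombinants = ["1101101", "1010001", "0111111", "0110100", "1100011"]
value = mosaic_value(recombinants, founders)  # 4 under this reading
```

The cost is linear in the sequence length and in the number of founders for each recombinant, so evaluating a candidate founder matrix is cheap compared with searching over founder matrices.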


State of the Art

Complete algorithms:
- Dynamic Programming
- Tree search enhanced with pseudo Branch-and-Bound pruning: RECBLOCK

Incomplete techniques:
- Greedy Heuristic
- Tabu Search

Complete techniques don’t scale well to large instances

Nevertheless, running times for Tabu Search are quite long and its performance is still not impressive


Contributions

Randomized Iterated Greedy
- Fast incomplete algorithm
- Better than the former Tabu Search algorithm
- Provides the initial solution to our LNS

Large Neighborhood Search (LNS)
- Integrates and boosts RECBLOCK
- Anytime solver
- But eventually reaches the optimum if given enough time
- Current state-of-the-art method for the FSRP


Comparison against RECBLOCK

[Figure: three panels (evo, ms and rnd instances) comparing LNS-1-caching with Reckblock-heur; y-axis: solution value relative difference]


Comparison against Iterated Greedy

[Figure: three panels (evo, ms and rnd instances) comparing LNS-1-caching with Iterated Greedy; y-axis: solution value relative difference]


4 Boolean Network Design

Scope of the Research

Boolean networks (BNs) are complex dynamical systems (and models of complex systems)

Recent research, mainly in biology, needs to find/design models satisfying given requirements

Hot topic in complex systems biology

Biological Standpoint

Employ Boolean networks as a modeling tool

Ensemble Approach


Boolean Networks

Introduced by Stuart Kauffman as models of genetic regulatory networks (GRNs)

Discrete-time/discrete-state dynamical system

Non-trivial (complex) dynamics


Boolean Networks

Structure

Directed graph of $N$ nodes

Node $i$:

- Boolean value $x_i$

- Boolean function $f_i$

The arguments of $f_i$ are the variables associated with the input nodes of $i$

The node state (i.e., its Boolean variable) is updated according to $f_i$


Boolean Networks

Dynamics

System state at time $t$: $s(t) = (x_1(t), \ldots, x_N(t))$

The dynamics controls node updates

Deterministic synchronous update

Every state has a unique successor

Variants

Several variants exist:
- BNs with asynchronous dynamics
- Boolean threshold networks
- Probabilistic Boolean networks
- Glass networks

We focus on the most studied model
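
The structure and the deterministic synchronous update described above can be sketched in a few lines. This is a minimal illustration, assuming each Boolean function is stored as a truth table indexed by the values of its input nodes; the representation, the `step` function and the 3-node example network are hypothetical and are not taken from BnToolkit.

```python
def step(state, inputs, tables):
    """One synchronous update: every node reads the current state s(t)
    and all nodes switch to their new value together, giving s(t+1)."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        # Encode the input values as an index into the node's truth table
        # (first input is the most significant bit).
        index = 0
        for j in node_inputs:
            index = (index << 1) | state[j]
        new_state.append(table[index])
    return tuple(new_state)

# Hypothetical 3-node network:
#   x0' = x1 AND x2,   x1' = NOT x0,   x2' = x0 OR x1
inputs = [(1, 2), (0,), (0, 1)]
tables = [(0, 0, 0, 1), (1, 0), (0, 1, 1, 1)]

s0 = (1, 0, 1)
s1 = step(s0, inputs, tables)
s2 = step(s1, inputs, tables)
```

Because `step` is a pure function of the current state, every state has exactly one successor, which is the property stated on the slide.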


Dynamical Features

Trajectories
- Transient
- Attractor

Attractors
- Fixpoints
- Cycles

Basin of Attraction (of attractor A)

Set of states belonging to the trajectories ending at attractor A
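
For a deterministic network, these notions can be computed by following a trajectory until a state repeats. The sketch below assumes the network is given as a successor function on states; the names `find_attractor` and `basins` and the 2-node example network are hypothetical. Enumerating all states, as done here, is only feasible for very small networks.

```python
def find_attractor(successor, start):
    """Follow the trajectory from start until a state repeats.
    Returns (transient, attractor) as lists of states."""
    seen = {}          # state -> position along the trajectory
    trajectory = []
    state = start
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = successor(state)
    first = seen[state]  # position where the cycle begins
    return trajectory[:first], trajectory[first:]

def basins(successor, states):
    """Group states by the attractor in which their trajectory ends."""
    result = {}
    for s in states:
        _, attractor = find_attractor(successor, s)
        result.setdefault(frozenset(attractor), set()).add(s)
    return result

# Hypothetical 2-node network: x0' = x1, x1' = x0 AND x1.
def net(state):
    return (state[1], state[0] & state[1])

all_states = [(a, b) for a in (0, 1) for b in (0, 1)]
transient, attractor = find_attractor(net, (0, 1))
landscape = basins(net, all_states)
```

In this example both attractors are fixpoints: (0, 0), whose basin contains three states, and (1, 1), whose basin contains only itself.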


Problem Difficulties

How do we obtain networks that match a set of desiderata?

Network space is enormous so random generation is not an option

How do we effectively explore such space?

How do we guide such search process?

How do we evaluate the “quality” of a network?


Our Approach to Automatic Design

We cast this problem into the framework of combinatorial optimization

1 A Boolean network is a point in the search space
2 A suitable objective function (OF) is defined
3 The search space is equipped with a notion of neighborhood (topology)
4 We choose and apply a (meta)heuristic search strategy
5 OF evaluation can be performed by sampling the network state space


Iterated Local Search Framework

Generic Algorithm Template

INPUT: a local search
s ← generateInitialSolution()
s∗ ← localSearch(s)   {Stochastic Descent in our experiments}
while termination conditions not met do
    s′ ← perturbation(s∗)
    s′_ls ← localSearch(s′)
    s∗ ← acceptanceCriterion(s∗, s′_ls)
end while
return s∗
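
The template above can be made concrete on a toy problem. The following sketch is not the configuration used in the talk: the fixed iteration budget, the "accept if not worse" criterion, the k-bit-flip perturbation and the toy objective (Hamming distance to a hidden bit string) are all assumptions chosen to keep the example short.

```python
import random

def iterated_local_search(initial, local_search, perturb, cost, iters, rng):
    """ILS loop following the template above (names are illustrative)."""
    best = local_search(initial)
    for _ in range(iters):                      # termination: fixed budget
        candidate = local_search(perturb(best, rng))
        if cost(candidate) <= cost(best):       # assumed acceptance criterion
            best = candidate
    return best

def make_descent(cost, rng):
    """Stochastic descent: try single-bit flips in random order."""
    def descent(s):
        s = list(s)
        improved = True
        while improved:
            improved = False
            order = list(range(len(s)))
            rng.shuffle(order)
            for i in order:
                before = cost(s)
                s[i] ^= 1
                if cost(s) < before:
                    improved = True
                else:
                    s[i] ^= 1                   # undo a non-improving flip
        return tuple(s)
    return descent

def perturb(s, rng, k=2):
    """Flip k distinct random bits."""
    s = list(s)
    for i in rng.sample(range(len(s)), k):
        s[i] ^= 1
    return tuple(s)

# Toy objective (hypothetical): Hamming distance to a hidden target string.
target = (1, 0, 1, 1, 0, 0, 1, 0)

def cost(s):
    return sum(a != b for a, b in zip(s, target))

rng = random.Random(42)
start = tuple(rng.randint(0, 1) for _ in target)
best = iterated_local_search(start, make_descent(cost, rng), perturb, cost, 20, rng)
```

On this toy objective a single descent already reaches the optimum, so the perturbation step adds nothing; on a rugged landscape such as Boolean network design, it is what lets the search escape local optima.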


The Boolean Network Toolkit

OF evaluation is the most computationally intensive task

A fast and flexible simulator is required

BnToolkit

Efficient library for BN simulation and analysis

Written in C++

Open Source project

Available at http://booleannetwork.sourceforge.net


Applications

Our design methodology has been successfully applied to three problems in biological modeling

BNs with maximally distant attractors
- We can study the properties of more biologically plausible networks

Boolean networks as classifiers
- Investigate an important topic in artificial learning systems

BNs as models of cellular differentiation
- It helps to validate the model against real data
- We aim to generate networks that predict the behaviour of existing cell types


BNs with Maximally Distant Attractors

Limitations of Classic BNs

Attractors in BNs can be interpreted as cell types

The attractors of classic BNs are very similar to one another (they differ in just a few values)

They are no longer distinguishable if a different update scheme is used

Synchronous deterministic update could generate spurious attractors

Overcoming the Limitations

We aim at designing synchronous deterministic BNs whose attractors are as different as possible
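
To turn "as different as possible" into something a search can optimise, one needs a distance between attractors. The slides do not state which measure was used, so the sketch below is purely an assumption: it takes the smallest Hamming distance between any state of one attractor and any state of another, and scores a network by the minimum of this distance over all pairs of attractors. The function names and example attractors are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two states differ."""
    return sum(x != y for x, y in zip(a, b))

def attractor_distance(A, B):
    """Smallest Hamming distance between a state of A and a state of B."""
    return min(hamming(a, b) for a in A for b in B)

def min_pairwise_distance(attractors):
    """Candidate merit factor (to be maximised): the closest pair of attractors."""
    return min(attractor_distance(A, B)
               for A, B in combinations(attractors, 2))

# Hypothetical attractor set: two fixpoints and one cycle of length 2.
attractors = [
    [(0, 0, 0, 0)],
    [(1, 1, 1, 1)],
    [(0, 0, 1, 1), (0, 1, 1, 1)],
]
score = min_pairwise_distance(attractors)
```

Here the score is limited by the cycle, one of whose states differs from the fixpoint (1, 1, 1, 1) in a single position.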


Boolean Networks as Classifiers

Biological motivations

Cell behaviour can change in response to different conditions in the environment

From an abstract standpoint, a cell is able to solve a classification problem
- Environmental conditions are the examples to classify
- Cell dynamics, represented by attractor states, are the responses

Goal

Design BNs that are able to solve the Density Classification Problem (DCP)
- Determine whether a binary string contains more 0s than 1s
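
The target of the DCP and a way to score a candidate classifier can be written down directly. In the sketch below the classifier is an arbitrary Python function standing in for a Boolean network whose attractor encodes the answer; that stand-in, the accuracy measure and the function names are assumptions made for illustration.

```python
from itertools import product

def majority_is_zero(bits):
    """Target label for the DCP: True iff the string has more 0s than 1s."""
    zeros = bits.count(0)
    return zeros > len(bits) - zeros

def accuracy(classifier, examples):
    """Fraction of examples on which the classifier matches the target label."""
    correct = sum(classifier(x) == majority_is_zero(x) for x in examples)
    return correct / len(examples)

# All binary strings of odd length 3, so that ties cannot occur.
examples = list(product((0, 1), repeat=3))

# Hypothetical stand-in classifier: look only at the first bit.
def first_bit_is_zero(bits):
    return bits[0] == 0

acc = accuracy(first_bit_is_zero, examples)  # 0.75 on these 8 strings
```

An accuracy of this kind, estimated on a sample of input strings, is one natural objective function when searching for networks that solve the task.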


A Model of Cellular Differentiation

Biological motivations

Overcoming the limitation of the attractor/cell type correspondence

Cell types are Threshold Ergodic Sets:
- Sets of attractors...
- Stable under a certain level of noise
- Noise models external environmental conditions

Can describe differentiation trees of pluripotent cells

Goal

Impose constraints on:
- Attractor landscapes
- Differentiation tree shapes

Ongoing research


Summary

Competitive hybrid metaheuristic algorithm for Haplotype Inference by Parsimony

State-of-the-art metaheuristic algorithm for the Founder Sequence Reconstruction Problem

Automatic design methodology of BNs successfully applied to three modeling problems

Flexible and efficient Boolean network simulator software


Activity I

Journal papers:
- Battarra, M., Benedettini, S., and Roli, A. (2011). Leveraging saving-based algorithms by master-slave genetic algorithms. Engineering Applications of Artificial Intelligence, 24:555–566
- Roli, A., Benedettini, S., Stützle, T., and Blum, C. (2012). Large neighbourhood search algorithms for the founder sequence reconstruction problem. Computers & Operations Research, 39(2):213–224
- Benedettini, S., Manfroni, M., Villani, M., Serra, R., Gagliardi, A., Pinciroli, C., Birattari, M., and Roli, A. (accepted with minor revision). Learning Boolean networks: an approach with metaheuristics. Neurocomputing


Activity II

Collaborations:
- Prof. Christian Blum, UPC, Barcelona, from Sep. 2009 to Mar. 2010
- Prof. Thomas Stützle, IRIDIA, ULB, Brussels, from Sep. 2010 to Dec. 2010
- Prof. Roberto Serra and Marco Villani, University of Modena and Reggio Emilia, 2009-ongoing

