Flexible and universal multiple target nucleic acid

Preview:

Citation preview

DoktoratsKollegRNA Biology

Abstract

Introduc�on to RNA design

Mo�va�on: Due to the close structure to func�on rela�onship, the availability of good structure predic�on methods, and energy models, RNA is perfectly suited to design molecules with predefined proper�es. Currently available RNA design

tools implement very specialized use‐cases which cannot be easily adapted to new applica�ons. O�en, complicated sampling and op�miza�on methods were developed to suit a specific design goal.Results: We developed RNAblueprint, a C++ library implemen�ng a graph coloring approach to uniformly sample sequences compa�ble to structural and sequence constraints from the typically huge solu�on space. Fair sampling from the solu�on space makes op�miza�on runs much more performant and raises the probability to find be�er solu�ons. Scrip�ng interfaces allow to easily adapt exis�ng code to new scenarios which makes the whole design process very universal and flexible. We implemented novel design approaches such as a mul�stable thermoswitch or a sRNA mediated transla�on regula�on system to show the advantages of our so�ware.

A

G

A

G

C U

U

A

AA

C

U

C

U

50U

SequencesAGAUGACUACUGACAUC

Experimental validation:◾ in-vitro◾ in-vivo

Structural constraintsSequence constraintsLandscape properties.((((.(....).))))ANNNNNCGAAAGNNNNN-1.8 Kcal/mol

Energ

y

Configurations

𝚫E

EA

Function

In silicocharacterisation

To be able to de novo design RNA molecules it was necessary to develop methods to solve the inverse folding problem and approaches for an in silico characteriza�on. A design task can be generally split into three components: (1) generate valid sequences compa�ble to sequence and structural constraints, (2) formulate an op�miza�on problem where the sampled sequences are op�mized towards a op�mal score calculated by the objec�ve func�on and (3) use in silico characteriza�on methods such as filtering and clustering with respect to problem specific features not included in the objec�ve.

A

Frequency

Frequency of the solution foundB

Sample size relative to size of solution space [%]

Uniq

ue s

olu

tions

[%]

Fair sampling of RNA sequences

Figure 1: Differences between fair and unfair stochas�c sampling for a small design example. A: The histogram shows how frequent unique solu�ons were found when sampling sequences using fair and unfair sampling. B: Even when sampling much more solu�ons than available in the solu�on

space (∼230%), we s�ll get only 50% of all possible solu�ons with unfair sampling, while we are able to sample about 90% with the fair method.

The developed so�ware [1] implements a graph theore�cal approach [2] to be able to handle mul�ple structural constraints. In addi�on, it takes any sequence constraint in IUPAC annota�on as input and generates sequences compa�ble to all constraints.RNAblueprint offers func�ons to get proper�es of the solu�on space and current search spaces, such as the number of solu�ons. This feature can, for example, be used for muta�onal studies. For efficient op�miza�ons we guarantee to sample every solu�on with the same probability from the whole solu�on space as shown in Figure 1.

Mul�‐state thermoswitchTo show the advantages of our so�ware, namely theflexibility, universality and efficiency, weimplemented serveral designs including a mul�statethermoswitch. This RNA molecule is able to fold intothree dis�nct states at specific temperatures. Theobjec�ve func�on not only includes a term to favourthe given structures at their temperatures, but alsoto counter select for the structures at all othertemperatures.

Obje�ve Func�on

10

20

30

10

20

30

10

20

30

Pro

bab

ility

Sp

ecific

heat

[kca

l ⨉ (

mol ⨉ K

)-1]

0 20 40 60 80 100

0

1

2

3

4

1

heat capacity(((((((((((((....))))))))))))) 5°C(((((.....)))))(((((.....))))) 10°C(((((.....)))))............... 37°Cprobability of the current MFE

GAUCUGUGUGGGGUCGAUUUUGUGUGGGUU

Temperature [Celsius]

References[1] Stefan Hammer, Birgit Tschiatschek, Christoph Flamm, Ivo L. Hofacker and Sven Findeiß. RNAblueprint: Flexibleand universal mul�ple target nucleic acid sequence design. Bioinforma�cs, submi�ed 2016.[2] Chris�an Höner zu Siederdissen, Stefan Hammer, Ingrid Abfalter, Ivo L. Hofacker, Christoph Flamm, and Peter F.Stadler. Computa�onal design of RNAs with complex energy landscapes. Biopolymers, 99(12):1124–1136, 2013.[3] Christoph Flamm, I. L. Hofacker, S. Maurer‐Stroh, P. F. Stadler, and M. Zehl. Design of mul�stable RNA molecules.RNA, 7(2):254 265, February 2001.[4] Joseph N. Zadeh, Conrad D. Steenberg, Jus�n S. Bois, Brian R. Wolfe, Marshall B. Pierce, Asif R. Khan, Robert M.Dirks, and Niles A. Pierce. NUPACK: Analysis and design of nucleic acid systems. Journal ofComputa�onal Chemistry, 32(1):170–173, 2011.[5] Ronny Lorenz, Stephan H. Bernhart, Chris�an Höner zu Siederdissen, Hakim Tafer, Christoph Flamm, Peter F.Stadler, and Ivo L. Hofacker. ViennaRNA Package 2.0. Algorithms for Molecular Biology, 6(1):26, November 2011.

Figure 3: Heat capacity curve and thermodynamic probabili�es of the target structures evidence the sucessfuldesign. The molecule folds into the desired states at the given temperatures with high probabili�es and showsdis�nct transi�ons between the states. The RNA completely unfolds above 70 degrees to then exhibit the openchain forma�on as minimum free energy structure.

Obje�ve Func�on

sRNA regulated gene expression

We furthermore implemented a regulatory5'UTR that retulates expression of itsdownstream gene by responding to thepresence of a small RNA. We therefore cameup with a novel objec�ve func�on which notonly contains the accessibility of the RBS andthe sRNA binding site, but also theconcentra�on of the formed sRNA/5'UTRcomplex for efficient binding. An addi�onalterm minimizes the crosstalk of the 5'UTRwith the downstream reporter gene.

mRNA ComplexRBS

+

sRNA

XRBS

G

G

G

AAGCACCAGUC

GG

UU

UA

U

C

A C

AA

GC

CG C U G G U G C U U

A

G

A

G

U

A

A

A

G

A

U

A

A

G

G

A

G

U

A

A

U

G

A

A

A

U

G

C

GT

AAG

GG

CGA

A

GA G

CT

TT T

T

AC

C

G

G

T

G

T

T

G

T

G

C

C

T

A

T

T

C

T

C

G

T

A

G

A

G

T

T

A

G

A

T

G

G

C

G

A

C

G

T

T

A

A

T

1

10

20

3040

50

60

70

80

90

100

110

120

130

132

G

G

G

U

C

C

U

AG

C

G

U

CA

U

U

CA

C

G

C

G

G

A

C

C

C U U C A U U U C A U U A C U C C U U A U C U U U A U U C A C C U A G C A U A

A

C

C

C

C

U

U

G

G

G

G

C

C

U

C

U

A AA

C

G

G

G

U

C

U

U

G

A

G

G

G

G

U

U U U U U G

1

10

20

30 40 50 60

70

80

90

100105

G

G

G

A

A

G

C

A

C

C

A

GU

C

G

G

U

U

U

A

U

C A

C

A

A

G

C

C

G

C

U

G

G

U

G

C

U

U

A

G

A

G

U

A

A

A

G

A

U

A

A

G

G

A

G

U

A

A

U

G

A

A

A

U

G

C

G

U

A

A

G

GG

CG

A

A

G A

GC U

U

U

U

U

A

C

C

GG

U

G

U

UG

U

G

CC

UA

UU

C

UCG

UA

G A

G

U U

A

G

A

U

GG

C

G

A

C

G

U

U

A

A

U

1

10

20

30

40

50

60

70

80

90

100

110

120130

132

G

G

G

U

C

CU

A

G

C

G

U

C

AU

U

C

A

C

G

C

G

G

A

C

C

C

U

U

C

A

U

U

U

C

A

U

U

A

C

U

C

C

U

U

A

U

C

U

U

U

A

U

U

C

A

C

C

U

A

G

C

A

U

A

A

C

C

C

C

U

U

G

G

G

G

C

C

U C

U

A

AAC

G

G

G

U

C

U

U

G

A

G

G

G

G

U

U

U

U

U

U

G

1

1020

30

40

50

60

70

8090

100

105

[Scomplex]

[mRNA0]=1

P(5'UTRunpaired) = 0.94

P(sRNAunpaired) = 0.91

P(mRNAcontext) = 0.07

Figure 4: Graphical representa�on of the sRNA‐mRNA pair design result. Top le�: dot‐plot and centroid fold structure of the designed 5'UTR including some nucleo�des of the reporter gene. The 5' stem loop structure is very likely (large filled squares) whereas the remaining depicted base pairs (small filled squares) indicate high flexibility. The mRNA binding site (orange) is unstructured as intended. Bo�om right: same for sRNA. Again, desired stems dominate the structural ensemble, whereas the binding site (orange) is accessible. The dot plot above the diagonal depicts possible base pairs of the sRNA:mRNA complex. The upper right rectangle shows the inter molecular base pairs whereas the small triangles next to it indicate intra molecular base pairs. The corresponding minimum free energy structure of the complex is depicted in the lower le� corner (outside the dot plot) surrounded by the individual terms of the objec�ve func�on.

We provide a so�ware solu�on in form of the RNAblueprintlibrary which makes it possible to uniformly sample RNAsequences compa�ble to structural and sequence constraints. With our framework it is possible to incorporate almost anyso�ware like NUPACK [4] or ViennaRNA [5] in the calcula�onof the objec�ve func�on and it is now feasible to explore amuch broader range of objec�ves. We showed examples onhow to use the developed so�ware to implement a designmethod using de novo objec�ve func�ons that incorporateterms which were not available in other exis�ng designframeworks.

We are working hard to set up an extendible web‐service tonot only make our design tools easily available, but also toprovide a range of analysis tools for in silico verifica�on andcharacteriza�on of the generated designs or any other RNA ofinterest.

Conclusion

This work has been supported by the European Commission under the Environment Theme of the 7th Framework Program for Research and Technological Development (Grant agreement number 323987).

h�p://nibiru.tbi.univie.ac.at/blueprint/

Flexible and universal multiple targetnucleic acid sequence designStefan Hammer1,2, Birgit Tschiatschek2, Christoph Flamm1, Ivo L. Hofacker1,2,3 and Sven Findeiß1,2

1 University of Vienna, Department of Theoretical Chemistry, Vienna, A-1090, Austria2 University of Vienna, Bioinformatics and Computational Biology Research Group, Vienna, A-1090, Austria3 University of Copenhagen, Center for Non-coding RNA in Technology and Health, Copenhagen, DK-1870, Denmark