
STARK WARS

Daniel Benarroch

[email protected]

QEDIT

Anaïs Querol

[email protected]

QEDIT

Universidad Politécnica de Madrid

September 2019


Abstract

The overall idea of STARKs is to transform a claim about a certain computation into a claim about the low-degreeness of one polynomial. There is a huge gap between these two statements, and the goal of this document is to bring these concepts closer together; all the mathematical background required is self-contained. We will base our explanations on practical examples such as Pedersen commitments, a basic tool in privacy-preserving applications on blockchains. This report comprises an in-depth analysis of the JavaScript library genSTARK [11] for QEDIT's internal understanding of STARKs, and shall be a handy guide for future developers implementing their own code for this scheme.


Contents

1 STARKs
  1.1 AIR
    1.1.1 Example: Exponentiations
    1.1.2 Example: Pedersen commitments
  1.2 LDE
  1.3 ALI
  1.4 FRI
    1.4.1 Protocol

2 Benchmark
  2.1 genSTARK
    2.1.1 Installation
    2.1.2 AirScript
  2.2 Complexity
    2.2.1 genSTARK Prover
    2.2.2 genSTARK Verifier
    2.2.3 genSTARK Memory
    2.2.4 genSTARK Proof
  2.3 Experiments
    2.3.1 Number of steps
    2.3.2 State width
    2.3.3 Number of Pedersen commitments
    2.3.4 Number of boundary constraints
    2.3.5 Number of queries
    2.3.6 Field size
    2.3.7 Constraint degree
  2.4 Discussion
    2.4.1 Trade-off column VS row
    2.4.2 STARK vs SNARK
    2.4.3 Optimizations

A Theoretical complexity


Chapter 1

STARKs

The goal of this report is to provide a friendlier first contact with the term STARK (other than House Stark), so that getting familiar with this cryptographic tool is not a war with yourself. The best advice we can give you to start this journey: be patient, read the examples in detail, and consider reading other great surveys. We suggest the reader complement this document with the great series of blog posts by StarkWare [16] and the entries by Vitalik Buterin [6, 7, 5] for a great introduction to STARKs and some of their math, before jumping to the original papers (only for the bravest).

This document addresses the problem of building Scalable Transparent Arguments of Knowledge (STARKs) [2]. In order to understand STARKs, we will use simplified examples, such as exponentiations, to give intuitions of the constructions involved. This example will give us the means to build instantiations with conditional statements as part of their computation, and with both public and secret inputs. We will base our explanation on a basic type of STARK that has no optimizations, but should suffice to understand the main components. We refer the reader to our discussion on improvements for some ideas towards more efficient implementations. Following this line, we will apply what we learnt to our own testing script to verify Pedersen commitments using STARKs. Our final goal will be to build STARKs to verify QEDIT transactions in zero-knowledge, and compare their performance with that of the zk-SNARKs already in use.

If you are a beginner in STARKs, or do not even know how you came across this post, you may like the following metaphor to clear your mind about all the cryptographic beasts involved. First things first: if you know how to fry an egg, then you are prepared to start your own STARK (figuratively speaking, though). Now you need an egg; this is the statement (the thing) you want to produce a STARK for. Then you crack the shell and the egg falls from the air: this is the AIR step. Your egg will then get into your aluminum pan, the ALI step (a bit contrived, we know). Now you simply FRI the egg. Let us now jump to the mathematical world and give a short description of the whole algorithm, which will be explained in much deeper detail in the upcoming sections. Even if we did our best to explain STARKs, you had better be prepared, because winter is indeed coming.


First, we have to define our desired computation as an execution trace, as efficiently as possible. Basically, we will define a transition function linking consecutive steps together, and have a few assertions on the starting and ending points of the trace. You will end up with a matrix where the number of rows is given by the number of steps of the computation, and the number of columns is defined by the number of variables involved. Note that the number of steps and the field size need to meet some requirements that we will talk about later on. Then we will build some polynomials that encode the execution trace. Basically, the next step consists of building some polynomials that verify that the execution trace was correct. That is, among many others, we have P_i(X), which outputs each step of the computation on the i-th variable, I_i(X), a function that passes through the boundary points, and D_i(X) and B_i(X), which encode the transition function and boundary assertions but with smaller degree (useful to reduce running time). We want to prove that these polynomials indeed represent what they are supposed to, and that they are not just some arbitrary polynomials. Now we use a structure called a Merkle tree that will contain the evaluations of P_i(X), so we can query it to make sure that some random points were used to build the whole thing. Essentially, thanks to the binding property, we will be convinced that the points that we asked for were indeed evaluations of some polynomial. Now we want to prove that this polynomial is in fact the one obtained from the execution trace of the claimed computation. We do this by building a second Merkle tree with a linear combination of all the polynomials above. Without getting into details, the verifier asks for a sufficiently large number of points to make sure that the content of this tree is indeed the real thing and not just a random polynomial of a given degree. Here, the prover will provide some coefficients that define the underlying polynomial, so the verifier can recompute the whole evaluation and check that it obtains the same value as the prover claimed. This is the FRI protocol, where the prover starts with a large number of evaluations of a claimed polynomial, and reduces his claim to a smaller number of evaluations until the verifier can check it by itself. If the prover were lying, then the verifier would obtain a different value.

1.1 AIR

When building a STARK for a certain problem, we first have to transform our statement into something like "this computation outputs value y mod |F| on input x and witness w in a given time". That is, C(x, w) = y in T steps, and we are working over a finite field F. Note the use of steps as a way to define the course of time within our computation. Indeed, a crucial (and unfortunately, non-automated) part of the efficiency of our STARK is to define our initial computation in a repetitive and succinct way. We will soon understand why this is the case.

The first part of building STARKs is the Arithmetic Intermediate Representation (AIR), which sets a basic notation on top of which we will build some necessary structures. The goal of this section is to understand how one goes from a standard computation to STARK-friendly notation. In particular, we will write a constraint system describing the computation of one exponentiation and generate its execution trace. We want to define this computation as succinctly as possible, as this reduces the number of constraints and yields a faster execution of the protocol. Once we learn how to do this, we will build AIRs for Pedersen commitments.

1.1.1 Example: Exponentiations

Obtaining the AIR for exponentiations is both simple and illustrative. This is an example everyone will understand, but it still gives a very complete intuition of what can be done when our computation includes public inputs and conditionals. The first thing we have to ask ourselves is: what is a step for the computation n^p, where p is some public input?

Let's explain some components of the AIR using the naive approach. Say the prover has some circuit that multiplies an initial number n by itself p times in the field F_96769. That is, this is a simple loop that performs the n^p mod 96769 computation using O(p) multiplication gates. Then, the prover claims that evaluating this circuit on n = 2 and input p = 32 returns 2^32 = 4294967296 ≡ 68769 (mod 96769). Please ignore the fact that powers of 2 can be computed in constant time, as the binary representation of 2^p consists of a bit set to 1 followed by p zeroes. Given the iterative nature of this circuit, defining the statement in repeated steps is immediate:

f_1 = 2                                                      (1.1)
∀ 1 ≤ i < 32 : f_{i+1} = n · f_i                             (1.2)
f_32 = 68769                                                 (1.3)

However, it is very unlikely that any party would want to run a whole STARK protocol to verify such a simple statement, which it could run itself in such a small amount of time. Instead, these schemes are used for much larger instances carrying an unaffordable complexity, leading to verifiable delegation of computation, or for those cases where some of the parameters are hidden from the verifier. In this part, we will motivate the former situation, and will leave the latter for upcoming sections.

Now suppose the prover makes a claim about the same circuit, but this time using a larger exponent p = 4294967296 (a 32-bit number). Writing the AIR in the above format incurs a number of steps T ≈ 2^32. This becomes an undesirable problem as the exponent grows, which, for better or worse, is the usual situation in cryptographic applications.

Instead, we want to implement exponentiations the way computers do: we will use the square-and-multiply algorithm. This allows for a computation as heavy as in the naive approach, but for exponentially larger instances. That is, the claim on n^4294967296 will require at most log_2 4294967296 = 32 steps.

The algorithm works as follows. It receives the exponent in binary notation, and reads its bits from left to right (from most significant to least significant). If the bit is zero, it multiplies the accumulated computation by itself. If the bit is one, it also squares the computation and then multiplies by the base. As noted above, this mechanism runs in time linear in the number of bits of the exponent, instead of linear in the exponent.

Let’s look at a toy example to make sure we understand the procedure.

Prime field        F_257
Initial value      1
Steps              4
Claim              2^13 ≡ 225 mod 257
Exponent (bits)    [1 1 0 1]
Algorithm          SM SM S SM
Computation        (((1^2 · 2)^2 · 2)^2)^2 · 2
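For concreteness, here is a minimal Python sketch of the square-and-multiply loop applied to the toy claim above (the helper name is ours; this is not code from the genSTARK library):

def square_and_multiply(base, exponent_bits, modulus):
    acc = 1
    for bit in exponent_bits:             # most significant bit first
        acc = (acc * acc) % modulus       # always square ("S")
        if bit == 1:
            acc = (acc * base) % modulus  # multiply by the base ("M")
    return acc

# Toy claim from the table: 2^13 mod 257, exponent bits [1, 1, 0, 1]
assert square_and_multiply(2, [1, 1, 0, 1], 257) == 225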

The computation addressed in this example was originally depicted as a flowchart: at each step i < T, the exponent bit p_i is inspected; if p_i = 0 the accumulator is squared (f_{i-1}^2), and if p_i = 1 it is squared and then multiplied by the base (n · f_{i-1}^2).

These conditions can easily be packed into a single polynomial by translating standard logical expressions as:

x AND y → x · y
x OR y  → x + y
NOT x   → (1 − x)

Take into account that AND operators increase the degree of the polynomial by one. Applying the algorithm above, and using conditionals to model the zero/one case, we can easily redefine a more efficient AIR for the same computation as follows. This is a great example of how relevant this first part can be for the overall complexity of the STARK.

f_0 = 1                                                      (1.4)
∀ 0 ≤ i < 32 : f_{i+1} = f_i · f_i · (p_i · n + (1 − p_i))   (1.5)
f_32 = 1350                                                  (1.6)

We call the first and last equations the boundary constraints, as they describe the starting and final states. The other equation is the transition constraint, defining the computation step by step. Note that defining our function in this repeated format makes the definition of our computation modular, meaning that adding more steps will not require the programmer to hardcode more lines into the system.

Without loss of generality, our AIR will work over a power-of-2 number of steps, and E divides the order of the multiplicative group of the finite field. If this is not the case for your desired computation, we can fill the trace with dummy entries and create a boundary constraint on some intermediate step. This makes the upcoming arithmetic requirements easy to satisfy in an arbitrary example such as ours. Otherwise, we would not be able to create a cyclic subgroup of F* of the right order.

Here you can see some nice finite fields (T should divide the number just below the field order, i.e. |F*| = p − 1). You can obtain your own by running a simple brute-force search in SageMath, as in the snippet below. Special thanks to GuildOfWeavers for providing us with some primes for large fields.

Prime field order              Bits    |F*|
257                            8       2^8
96769                          16      2^9 · 3^3 · 7
4194304001                     32      2^25 · 5^3
18321977041912594433           64      2^20 · 461 · 161303 · 234979
0xA3DF 1ED7 48B7 F{0}18 1      128     2^76 · 11 · 79 · 3317444243219
0x{F}23 7{0}7 1                128     2^32 · 11 · 167 · 239 · 853 · 641110271 · 329982387703
0xBF3D EB18 42BC 6{0}50 1      256     2^205 · 3 · 11 · 439 · 1161166605589
0x{F}53 EA1{0}7 1              256     —

(Here {0}18 denotes the digit 0 repeated 18 times, and so on for the other bracketed digits.)

# Brute-force search (SageMath) for a prime of the form k·2^minE + 1,
# so that F_p* contains a multiplicative subgroup of order 2^minE.
b = 64                      # desired field size in bits, e.g. one of 256, 128, 64
minE = 20
aux = 2^minE                # not prime; just a starting value to enter the loop
while not is_prime(aux):
    aux = 2^minE * int(random() * 2^(b - minE)) + 1
aux

This means we have to slightly change our claim above, as a b-bit exponent generates b + 1 constraints. Our previous 33-step system would have to be filled with dummy entries to reach T = 64, the smallest power of 2 larger than 32. Instead, we will assume that our exponent is at most a 31-bit number (meaning that the first hexadecimal digit can be any symbol ranging from 0x0 to 0x7). Then, our tweaked claim from now on will be 3^1234567890 ≡ 4018103767 (mod 4194304001), with 1234567890 = 0x499602D2, which generates the following 32 constraints:

f_0 = 1                                                      (1.7)
∀ 0 ≤ i < 31 : f_{i+1} = f_i · f_i · (p_i · n + (1 − p_i))   (1.8)
f_31 = 4018103767                                            (1.9)


Generally speaking, the prover will compute the execution trace of the computation, consisting of a table with one column per state variable and one row per step; that is, a (T × w)-element matrix. In our simple example, we only have one column, containing the accumulated value after applying the iterations.
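As an illustration (a minimal sketch with our own variable names, not the genSTARK code), this is how the single packed transition above produces the one-column execution trace for the claim on 3^1234567890 over F_4194304001:

MODULUS = 4194304001
n = 3
p_bits = [int(b) for b in bin(1234567890)[2:]]        # the 31 exponent bits, most significant first

trace = [1]                                           # boundary constraint f_0 = 1
for p_i in p_bits:
    f_i = trace[-1]
    trace.append(f_i * f_i * (p_i * n + (1 - p_i)) % MODULUS)

assert len(trace) == 32                               # one row per step
assert trace[-1] == pow(3, 1234567890, MODULUS)       # the last row is the claimed output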

Following our constraints, we will map the exponentiation function f : F → F_4194304001 to the polynomial P : G → F_4194304001, where G is a cyclic subgroup¹ of F*_4194304001 of order T = 32 with generator g = 2906399817. This is the polynomial that outputs the execution trace of the square-and-multiply sequence when evaluated on the 32 roots of unity of G.

P(1) = 1
P(g^31) = 4018103767
∀ 0 ≤ i < 31 : P(g^{i+1}) = P(g^i) · P(g^i) · (p_i · n + (1 − p_i))
  ⟺ ∀ x ∈ {1, g, ..., g^30}, 0 ≤ i < 31 : P(g · x) = P(x) · P(x) · (p_i · n + (1 − p_i))
(with n = 3 in our claim)

We can easily turn these constraints into polynomials that evaluate to zero on the roots of unity. That is, abusing notation², Q(X) := C(P(X)) is zero if X ∈ {g^0, ..., g^31}.

Q(1) := P(1) − 1
Q(g^31) := P(g^31) − 4018103767
∀ x ∈ {1, g, ..., g^30}, 0 ≤ i < 31 : Q(x) := P(g · x) − P(x) · P(x) · (p_i · n + (1 − p_i))

Before we go any further, let's take a look at some of the parameters of our statement. Our AIR set will be composed of as many transition polynomials P_i(X) as states, and thus one polynomial per column in the execution trace, each of degree at most the number of steps, which corresponds to the number of rows in the execution trace. The degree of our constraint system will be that of the highest-degree polynomial in our set, upper bounded by the number of steps in the computation and thus the number of rows in the matrix. Here we say d ≤ T to cover the case where some step is a linear combination of the others, so its row will not increase the degree of the polynomial. The roots of each polynomial C_i(X), with "C" standing for constraint, are the at most T intermediate evaluation steps of the computation.

Parameter   Description
T           |steps| ≡ |rows| ≡ |G|
w           |states| ≡ |columns| ≡ |polynomials| ≡ |constraints|
µ           max deg{P_i}_{i=1}^{w} ≤ T

¹ Such a subgroup exists as its order divides the order of the multiplicative group F*: 32 | 4194304000.
² Indeed, we should first extend the execution trace to a larger domain, but we omit this part for the moment for the sake of simplicity.


Putting everything together, the AIR part requires the prover to evaluate the P_i polynomials on the execution trace, and to compute the Q_i polynomials using interpolation. The cost of interpolating a polynomial using Lagrange interpolation is quadratic in the number of known points; that is, O(T^2) in this setting. This complexity can be reduced to O(T log T) using the FFT to obtain the coefficients.

1.1.2 Example: Pedersen commitments

Now we will go one step further, building AIRs for more complex computations involving conditionals, public inputs, secret inputs, and wider states. In particular, we will give an AIR for Pedersen commitments with very few changes with respect to the previous example. The reason we focus on this primitive now is that it is a recurring component of QEDIT's asset transfer protocol and of plenty of other privacy-preserving protocols on the blockchain.

Let G be a (known) generator of a group and H some (known) power of G. The Pedersen commitment to a message m with secret randomness r is computed as

com_Ped(m, r) = G^m · H^r
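As a quick sketch (toy parameters matching the concrete claim given later in this section; this is not a secure instantiation), the commitment in the multiplicative group of a prime field can be computed as:

MODULUS = 4194304001       # the field used throughout our examples
G, H = 3, 7                # the generators used in the claim below

def pedersen_commit(m, r, p=MODULUS):
    return pow(G, m, p) * pow(H, r, p) % p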

As one would expect, there are plenty of instantiation decisions to make here. Of course, we will reuse what we learnt about the square-and-multiply algorithm to reduce the number of rows in our execution trace. However, we will need to provide one polynomial for the correct exponentiation of G^m, another one for H^r, and then a proof of correct multiplication of the two. This means our execution trace will have more columns than in the example above.

To make things simpler at the beginning, we will create two versions of the AIR: one with 3 columns, where the third column multiplies the values of the G-column and the H-column; and another one with just 2 columns. Of course, the first approach performs unnecessary computations, as only the very last multiplication is required. But we want to have these two scenarios to study the impact of different instantiations on the overall complexity. For the sake of clarity, the second approach will ignore the final multiplication, as it can be adapted with very little impact once we are more familiar with constraints.

The first exponentiation carries no modifications in comparison with the example above. Conversely, for the second, we need to introduce the notion of secret input. In this case, the randomness used as the exponent for H is a secret number known only to the prover. Even if this does not affect the AIR itself, it will make a big difference in further steps of the STARK.

The statement that we are going to build STARKs for, with G = 3 and H = 7, is the following³:

3^1234567890 · 7^1111111111 = 4018103767 · 1859733722 = 2788099368 over F_4194304001

³ The benchmark files we provide along with this document automatically generate random claims each time.


p = 0x499602D2 = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

s = 0x423A35C7 = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]

We chose these exponents because they are both 31-bit numbers. If they had different lengths, the boundary constraints would have to refer to intermediate positions instead. Otherwise, in the 3-column setting, referring to the correct final step in the two previous columns would carry an increase in the constraint degree.

The following system is shared between the two approaches. We call P_G(X) the polynomial of the execution trace of the first column, and Q_G(X) the constraint polynomial of the first column, whose roots are the powers of the root of unity {1, g = 2906399817, ..., g^31 = 1560690925}. The same holds for the second column with P_H(X) and Q_H(X), except that here we use secret inputs instead. Note that these constraints have degree 3, so the maximum constraint degree is µ = 3.

P_G(1) = 1
P_H(1) = 1
P_G(g^31) = 4018103767
P_H(g^31) = 1859733722

∀ x ∈ {1, g, ..., g^30}, 0 ≤ i < 31 :
  P_G(g · x) = P_G(x) · P_G(x) · (p_i · 3 + (1 − p_i))
  P_H(g · x) = P_H(x) · P_H(x) · (s_i · 7 + (1 − s_i))

Q_G(1) := P_G(1) − 1
Q_H(1) := P_H(1) − 1
Q_G(g^31) := P_G(g^31) − 4018103767
Q_H(g^31) := P_H(g^31) − 1859733722

∀ x ∈ {1, g, ..., g^30}, 0 ≤ i < 31 :
  Q_G(g · x) := P_G(g · x) − P_G(x) · P_G(x) · (p_i · 3 + (1 − p_i))
  Q_H(g · x) := P_H(g · x) − P_H(x) · P_H(x) · (s_i · 7 + (1 − s_i))

In addition, the 3-column case has the following extra constraints to account for the final multiplication. Here, the maximum constraint degree is µ = 6, as deg(x · y) = deg(x) + deg(y):


P_*(1) = 1
P_*(g^31) = 2788099368
∀ x ∈ {1, g, ..., g^31} : P_*(x) = P_G(x) · P_H(x)

Q_*(1) := P_*(1) − 1
Q_*(g^31) := P_*(g^31) − 2788099368
∀ x ∈ {1, g, ..., g^31} : Q_*(x) := P_*(x) − P_G(x) · P_H(x)

Note that each of these columns has different polynomials associated with it. Also, the root of unity g = 2906399817 is shared between columns because the field and the number of steps do not change. The following chart summarizes the different parameters so far for these two approaches:

Parameter   2-column      3-column
|F|         4194304001    4194304001
g           2906399817    2906399817
T           32            32
w           2             3
µ           3             6

1.2 LDE

Before we continue building our STARK, we need to understand the notion of low degree extension (LDE) and its implications. Let f : S ⊆ F → F be a function defined over a subset of a finite field F taking values in F. We will define another function f' : S' ⊆ F → F whose domain is larger than the previous one, |S'| > |S|. We say f' is the LDE of f if they both share the same interpolating polynomial, meaning that f'(x) = f(x) for all x ∈ S.

The LDE of f can also be seen as a distance-amplifying encoding of f. Suppose you are given the evaluations of a function that is really close to the real f, say their Hamming distance is 1, which may go unnoticed. Nonetheless, f' will be very different from the LDE of this other function, so telling them apart is straightforward. According to the Schwartz-Zippel lemma [15, 17], two different degree-d polynomials can agree on at most d points, so the probability of this happening is d/|F|, which is negligible for large enough finite fields. In this sense, this encoding is an error-corrected version of f.

But why low-degree? Informally speaking, if i(x) interpolates both f and f', with the former defined over a smaller number of points, then it must be the case that f' is somehow redundant. In particular, if f provides only |S| points to build the interpolant, it will have at most |S| nonzero coefficients and thus its degree will be at most |S| − 1. Conversely, if f' provides |S'| > |S| points to build the interpolant and ends up yielding the same one as before, then some of the points of the domain of f' did not increase the degree of the interpolant.

11

Page 14: STARK WARS - software.imdea.organais.querol/STARK_WARS.pdf · this document with the great series of blog posts by StarkWare[16] and the entries by Vitalik Buterin [6, 7, 5] for a

In our setting, we are interested in building such extensions because the domain size (and thus the degree of the polynomials) is a crucial parameter determining the overall complexity of STARKs. Let's look at some examples to make all of this much clearer.

Extended execution trace

As part of the configuration of our AIR we need to think about two different domains. First, we have the (already seen) execution domain {1, g, ..., g^31} of size T = 32. Second, we define a much larger space, the evaluation domain, of size E = T · e, where e is the extension factor. This factor is linear in the maximum constraint degree: e = 2^⌈log_2(2µ)⌉ = 2^⌈1 + log_2 µ⌉ = 2 · 2^⌈log_2 µ⌉. It essentially divides each subsequent point of the execution domain into e intermediate points. Put another way, the generator of the execution domain can be expressed as g = γ^e, where γ is the root of unity generating the evaluation domain.

In our example, the 2-column extension factor is 8 and the 3-column e is 16. This makes a huge difference, as the evaluation domain of the former has size 256, whereas in the latter E = 512. As mentioned above, the coefficients of the P(X) polynomials are obtained by interpolation in O(wT log T) time. Each polynomial is represented as an array of coefficients where each position corresponds to the coefficient of the i-th power. In this example, our polynomials have degree at most 32.

Now we evaluate them over the whole domain {1, γ_2, ..., γ_2^255}, where γ_2 = 1867760616, for the 2-column case, and {1, γ_3, ..., γ_3^511}, where γ_3 = 3185713831, for the 3-column case. Note that γ_2 is the generator of a cyclic subgroup of F of order 256 and γ_3 is the generator of a cyclic subgroup of order 512, whereas the original (and smaller) subgroup had order 32. Note that g = γ_2^8 = γ_3^16 = 2906399817, and the values P(x) coincide with the 32 values of the execution trace when x ∈ {1, γ_2^8, γ_2^16, ..., γ_2^248} and x ∈ {1, γ_3^16, γ_3^32, ..., γ_3^496}, respectively. That is, all of the 256 (resp. 512) evaluations of P correspond to the extended execution trace. We will soon understand why we need these two domains, the 32-element one and the extended one.

In a similar way, we evaluate the Q(X) := C(P(X)) polynomials on the extended trace. Remember that we defined these polynomials such that they vanish on the roots of unity of the smaller domain (the T steps of the execution trace). This has a huge impact on efficiency, as we describe next.

Parameter   2-column      3-column
e           8             16
E           256           512
γ           1867760616    3185713831
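To make the two domains concrete, here is a small Python sketch (naive Lagrange interpolation, so only suitable for these toy sizes; a real prover would use FFTs) that extends a length-T trace from the execution domain generated by g to the evaluation domain generated by γ, using the 2-column parameters above:

MODULUS = 4194304001
g, gamma = 2906399817, 1867760616     # generators of the execution / evaluation domains
T, e = 32, 8                          # E = T * e = 256, and g = gamma^e

def lagrange_eval(xs, ys, x, p=MODULUS):
    # Evaluate at x the unique degree < len(xs) polynomial through the points (xs, ys).
    acc = 0
    for i in range(len(xs)):
        num, den = 1, 1
        for j in range(len(xs)):
            if j != i:
                num = num * (x - xs[j]) % p
                den = den * (xs[i] - xs[j]) % p
        acc = (acc + ys[i] * num * pow(den, -1, p)) % p   # pow(., -1, p): modular inverse (Python 3.8+)
    return acc

def low_degree_extension(trace):
    xs = [pow(g, i, MODULUS) for i in range(T)]
    return [lagrange_eval(xs, trace, pow(gamma, j, MODULUS)) for j in range(T * e)]

# The extended trace agrees with the original one on every e-th point:
# low_degree_extension(trace)[e * i] == trace[i] for 0 <= i < T.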


1.3 ALI

This section explains the notions about polynomials and groups required for the algebraic linking IOP (ALI) step. The goal of these constructions is to transform the purely arithmetic statements in the AIR into an IOP (Interactive Oracle Proof) [4] friendly format. At the end of this section you will understand the role that many other polynomials play in STARKs.

The following statement is a basic property of polynomials. If Q(x) = 0 for all inputs in the set {1, g, ..., g^31}, then it must be the case that Q(X) is a multiple of Z_32(X) := ∏_{i=0}^{31} (X − g^i), the smallest-degree polynomial that vanishes on the whole of {1, g, ..., g^31}. In particular, we can define Q(X) = D(X) · Z_32(X) as a polynomial decomposition. This is precisely where efficiency, and the first link between the original claim on a computation and the claim on low-degreeness, come in.

Going back to our square-and-multiply example, we will now focus on the G^m column of the Pedersen commitment. Once we understand the details of the ALI, we will build what is needed for the 2-column and 3-column cases. Recall that our transition constraint was defined for i ∈ {0, 1, ..., 30}, plus we had a boundary constraint on P_G(g^31). Then we defined our Q_G(X) function so that it vanishes on all of x ∈ {1, g, ..., g^30}. Of course, if Q(x) = 0 for x ∈ {1, g, ..., g^30}, it will also be a multiple of ∏_{i=0}^{30} (X − g^i), as needed to define our D(X). That is, g^31 = γ^248 is not a root of Q(X).

Nonetheless, we will treat Z_32(X) as a fraction with a numerator and a denominator, so it seems to have a somewhat more complex structure than the simple multiplication of monomials. In fact, this is due to some efficiency concerns that we will discuss later. Now we will jump back again to our exponentiation example to see how a more complex definition of the zero polynomial can help.

Recall that our transition constraint was defined for x ∈ {1, g, ..., g^30} and 0 ≤ i < 31 as P(g · x) − P(x) · P(x) · (p_i · 3 + (1 − p_i)) = 0. We will call the left-hand side of the equation Q(x), which in this case has 31 roots. Following the same reasoning as above, there exists a divisor polynomial D(X) = Q(X) / ∏_{i=0}^{30} (X − g^i). The problem is, computing this denominator takes time linear in the number of roots ≈ |G| (which, by the way, will be in the order of the number of rows in the execution trace). But we can be smarter than that. From group theory and the cyclotomic polynomial, we know that the powers of g form a subgroup:

x^|G| − 1 = ∏_{h ∈ G} (x − h)

Whereas computing the right-hand side costs O(|G|) operations, the left-hand side can be computed much faster, in O(log |G|). Here is where defining the roots of our polynomials as roots of unity becomes important. We can now redefine the divisor polynomial in a much more efficient way, and with only constant overhead, as:

D(X) := Q(X) / Z_32(X) = Q(X) / ((X^32 − 1)/(X − g^31))

Here the zero polynomial Z_32(X) is composed of the numerator (X^32 − 1) and the denominator, the degree-1 polynomial (X − g^31). Now we will compute the components of Z_32(X) on the evaluation domain. Recall that the numerator is the degree-32 polynomial with roots at {1, γ^8, ..., γ^248} (corresponding to the 32 steps in the execution trace), whereas the denominator is the degree-1 polynomial (X − g^31) = (X − γ^248) = (X − 1560690925), which evaluates to zero on the only step for which the transition constraint was not defined. When computing the numerator, note that the expression (x^32 − 1) for x ∈ {γ^0, γ^1, ..., γ^255} becomes the length-8 sequence (γ^0 − 1), (γ^32 − 1), (γ^64 − 1), (γ^96 − 1), (γ^128 − 1), (γ^160 − 1), (γ^192 − 1), (γ^224 − 1), repeated 32 times, which indeed vanishes for all roots {1, γ^8, ..., γ^248}.

As we are working over finite fields, before computing the evaluations of D(X) we would compute the inverses of the numerator of Z(X), so that:

D(X) := Z_32^{-1}(X) · Q(X) = num^{-1}(X) · den(X) · Q(X)
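A minimal sketch (our own helper names, using the field parameters above) of how the prover can evaluate D(X) point by point on the evaluation domain with this cheap form of the zero polynomial:

MODULUS = 4194304001
g = 2906399817

def divisor_evaluations(q_evals, domain, p=MODULUS):
    # q_evals[j] = Q(domain[j]); returns the evaluations of D on the same domain.
    g31 = pow(g, 31, p)
    out = []
    for x, q in zip(domain, q_evals):
        num = (pow(x, 32, p) - 1) % p           # numerator of Z_32(X)
        if num == 0:
            out.append(None)                    # execution-trace positions are never queried
            continue
        out.append(q * (x - g31) * pow(num, -1, p) % p)
    return out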

As we mentioned at the beginning of this section, we are looking for a linking mechanism between the AIR and the FRI protocol. But beyond that, we would like to prove the correctness of the execution trace. Here is the nice trick. One can agree that the above D(X) is a low-degree polynomial if it is indeed defined as Q(X)/Z_32(X). Returning to our example, D(X) will be a polynomial of at most degree 1 (so either a line or a constant function) if it equals (X^32 − 1)^{-1} · (X − γ^248) · Q(X). If the latter holds, then the execution trace is well defined (as it vanishes at the given points). Thanks to arithmetization, either D(X) will be a low-degree polynomial, or it will be far from all low-degree polynomials. Completeness holds in this direction of the implication.

For soundness, however, we also need to check that D(X) is not just some arbitrary line, but is formed out of the correct decomposition of Q(X). In any case, this check will only determine the correctness of the execution trace. We also want to make sure that the boundary constraints hold and get a proof for the whole claim (meaning that the computation started and finished at step T as it was supposed to).

We refer to the boundary polynomial as B(X). It is built from the assertions obtained from the boundary constraints, which basically encode the claimed values of the computation at the starting and finishing points. This polynomial is defined as follows:

B(X) = (P(X) − I(X)) / Z_2(X)

Recall that P(X) outputs the extended execution trace. We define the interpolant I(X) such that it passes through the claimed assertions on the boundary points. In this case, it is the line obtained by interpolating the two points I(γ^0) = 1 and I(γ^248) = 4018103767:

I(X) := 2636716692 + 1557587310·X

Then, P(X) − I(X) evaluates to zero on {1, 1560690925}. Following the same idea as before, it must be a multiple of another zero polynomial:

Z_2(X) := (X − 1)(X − 1560690925) = 1560690925 + 2633613075·X + X^2

Again, as we are working over finite fields, we need to compute its inverse to obtain the boundary evaluations as (P(X) − I(X)) · Z_2^{-1}(X).

Note that P(X) − I(X) evaluates to zero at those same two boundary points, γ^0 and γ^248. The strategy to prove that the execution trace satisfies the boundary conditions is to provide B(X) and show that it is a polynomial of the expected degree.
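A small sketch (again with our own helper names) of how the prover can obtain the evaluations of the boundary polynomial B(X) = (P(X) − I(X)) / Z_2(X) on the evaluation domain, given the two boundary assertions:

MODULUS = 4194304001

def boundary_evaluations(p_evals, domain, first, last, p=MODULUS):
    # first and last are the boundary assertions (x0, y0) and (x1, y1);
    # p_evals[j] = P(domain[j]).
    (x0, y0), (x1, y1) = first, last
    slope = (y1 - y0) * pow(x1 - x0, -1, p) % p   # I(X) is the line through the two assertions
    out = []
    for x, px in zip(domain, p_evals):
        i_x = (y0 + slope * (x - x0)) % p         # I(x)
        z2 = (x - x0) * (x - x1) % p              # Z_2(x)
        out.append(None if z2 == 0 else (px - i_x) * pow(z2, -1, p) % p)
    return out

For the G column one would call it with first = (1, 1) and last = (1560690925, 4018103767).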

The following figure was extracted from this great post by Vitalik Buterin [5], and explains the above very graphically with a toy example. Even if we are working over finite fields and the graph plots polynomials over the reals, our example behaves the same.

Figure 1.1: "Purple: computational trace polynomial (P). Green: interpolant (I) (notice how the interpolant is constructed to equal the input (which should be the first step of the computational trace) at x = 1 and the output (which should be the last step of the computational trace) at x = g^{steps−1}). Red: P − I. Yellow: the minimal polynomial that equals 0 at x = 1 and x = g^{steps−1} (that is, Z_2). Pink: (P − I)/Z_2."

Pedersen commitment In the previous paragraphs, we have explained how to build the ALI for one part of the Pedersen commitment: the G^m column, with extension factor e = 8. Luckily, the same reasoning works for the other polynomials P_H(X) and P_*(X), so we will not dive into the details now. However, we will give all the remaining components in the equations below for the sake of clarity. The reader can check this is the case through inspection.


2 column:

D_G^(2)(X) = ((X − g^31) · Q_G^(2)(X)) / (X^32 − 1)
B_G^(2)(X) = (P_G^(2)(X) − (2636716692 + 1557587310·X)) / (1560690925 + 2633613075·X + X^2)

D_H^(2)(X) = ((X − g^31) · Q_H^(2)(X)) / (X^32 − 1)
B_H^(2)(X) = (P_H^(2)(X) − (2490070726 + 1704233276·X)) / (1560690925 + 2633613075·X + X^2)

3 column:

D_G^(3)(X) = ((X − g^31) · Q_G^(3)(X)) / (X^32 − 1)
B_G^(3)(X) = (P_G^(3)(X) − (2636716692 + 1557587310·X)) / (1560690925 + 2633613075·X + X^2)

D_H^(3)(X) = ((X − g^31) · Q_H^(3)(X)) / (X^32 − 1)
B_H^(3)(X) = (P_H^(3)(X) − (2490070726 + 1704233276·X)) / (1560690925 + 2633613075·X + X^2)

D_*^(3)(X) = ((X − g^31) · Q_*^(3)(X)) / (X^32 − 1)
B_*^(3)(X) = (P_*^(3)(X) − (265373442 + 3928930560·X)) / (1560690925 + 2633613075·X + X^2)

Since, for all states in both settings, the boundary constraints refer to the same steps of the execution trace, the Z(X) denominators of the boundary polynomials coincide. Similarly, the Q(X) decompositions use the same zero polynomial.

Z_G^(2)(X) = Z_H^(2)(X) = (X − γ_2^0)(X − γ_2^248)
           = (X − 1)(X − g^31)
           = (X − γ_3^0)(X − γ_3^496) = Z_G^(3)(X) = Z_H^(3)(X) = Z_*^(3)(X)

den_G^(2)(X) = den_H^(2)(X) = (X − γ_2^248)
             = (X − g^31)
             = (X − γ_3^496) = den_G^(3)(X) = den_H^(3)(X) = den_*^(3)(X)

It is not difficult to realise that the interpolants of the 2-column and the 3-column settings coincide for P_G(X) and P_H(X), as in both cases the lines pass through the same two points. In particular, these are the evaluation points needed to obtain I(X):

I_G^(3)(γ_3^0) = 1 = I_G^(2)(γ_2^0)
I_H^(3)(γ_3^0) = 1 = I_H^(2)(γ_2^0)
I_G^(3)(γ_3^496) = 4018103767 = I_G^(2)(γ_2^248)
I_H^(3)(γ_3^496) = 1859733722 = I_H^(2)(γ_2^248)
I_*^(3)(γ_3^496) = 2788099368 = 4018103767 · 1859733722


Now that we understand the meaning of all these polynomials, we need to create proofs for them. However, the next step of STARKs is a bit costly (though not as costly as multipoint evaluation and polynomial interpolation), so we want to mix them into one single polynomial. This way we will only need to run the FRI protocol once. We will do this by sampling a random linear combination of the polynomials comprising the statement. But before we jump to that, we need to understand the structure beneath.

Merkle trees

A Merkle tree [12] is an integrity-preserving and privacy-protecting data structure. It stores some data on the leaves of a binary tree, and all the intermediate nodes are computed as the hash of their left and right children. The root of the tree can be seen as a commitment to the content of all of the leaves. This way, membership in the tree is proven by providing an authentication path (the sibling nodes on the way to the root) from one leaf to the root. The verifier then recomputes the root using this path and checks that it equals the one committed to before. Privacy is provided by the hiding property of the hash function, and soundness is granted by its binding property. Hiding means that the hash of some value reveals nothing about its content. Thanks to binding, the prover cannot find two different preimages of the same hash, so if the verification succeeds then it must be the case that the leaf was in the Merkle tree. We can choose any modern secure hash function such as SHA-256 (32 · 8 = 256 bits). It produces 32-byte blocks regardless of the length of the input; that is, the hash function is compressing as well.

A Merkle tree for n leaves will have a depth (height, if you like) of log_2 n, and the total number of nodes in such a tree is 2n − 1. Then, using a 32-byte hash function, storing the whole tree requires 32(2n − 1) ≈ 64n bytes. The prover can compute the root in O(n) time, the authentication path for one leaf has O(log_2 n) elements, and verifying one path costs O(log_2 n). Denote by log_2 n the layer where the n leaves are stored, and by 0 the top layer where only the root is located. Then, each level has half of the nodes of the level below (as every two nodes produce one single parent). We call the sibling of a node the other node in the same layer that has the same parent. Note that in a binary tree, nodes have only one sibling. Depending on the parity of the position of the node, its sibling will be located on its left or on its right.
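A minimal sketch (SHA-256, plain Python, assuming the number of leaves is a power of two) of building such a tree level by level and obtaining the root:

import hashlib

def merkle_levels(leaves):
    # leaves: list of byte strings; returns all levels, from the leaf level up to the root.
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels                 # levels[-1][0] is the root, i.e. the commitment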

Libraries will normally use a well-known, but still very clever, method to access nodes in the tree structure in constant time. For this, we need to think of integers in their binary representation. The whole structure is stored as a list of arrays whose positions can be accessed independently. We number the nodes of each level from left to right; in the widest level, we assign positions [0, ..., n − 1], or in binary, from 0...0 to 1...1, using log n bits. Then the sibling of an even node is located on its right (so the sibling is odd), and the sibling of an odd node is located on its left (so the sibling is even). That is, we can efficiently compute its position by negating the least significant bit: sib = nod xor 1. In the level above, the parent is identified by the common prefix of its children (bits|0), (bits|1). This position can be computed as par = chi >> 1.
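The same index arithmetic in a short Python sketch (positions are per level, as described above; the helper names are ours):

def sibling(i):
    return i ^ 1                  # flip the least significant bit

def parent(i):
    return i >> 1                 # drop the least significant bit (the common prefix)

def auth_path_positions(leaf, depth):
    # Positions, within each successive level, of the nodes forming the
    # authentication path of `leaf` in a tree of the given depth.
    path = []
    for _ in range(depth):
        path.append(sibling(leaf))
        leaf = parent(leaf)
    return path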


In the 3-column setting for Pedersen commitments, we build one depth-9 Merkle tree of E = 512 leaves that takes about 32KB of memory, whereas in the 2-column setting we build a depth-8 Merkle tree of E = 256 leaves that takes almost 16KB of memory. Here, 32B is the standard block length for hash functions, and we must not try to save space there, or dishonest parties could break the binding property (as it would be easier to find collisions). What makes our structure larger is the evaluation domain size, which, on the other hand, should be set large enough for soundness. The STARK paper recommends an extension factor of e = 2 · 2^⌈log_2 µ⌉, as we explained before. What is really interesting is to reduce the maximum constraint degree in the AIR phase. We still have not explained the real reason why we need an extended domain, and will address this in the FRI section.

Recalling the hiding property of hash functions, we want to include some private information in the leaves of our tree that can be used for non-leaking verification. In particular, both the extended execution trace and the secret inputs are merged into the input of our hash function. Here, we are using the variable-length property of hash functions. The concrete length in bits of the leaf content can be estimated as the base-2 logarithm of the largest possible value. In the 2-column setting, for each i ∈ {0, ..., 255}, the i-th leaf in the Merkle tree contains the evaluations P_G(γ_2^i), P_H(γ_2^i) and the i-th most significant bit of the secret randomness. In the 3-column approach, the j-th leaf, for any j ∈ {0, ..., 511}, contains P_G^(3)(γ_3^j), P_H^(3)(γ_3^j), P_*^(3)(γ_3^j) and the j-th most significant bit of r. Assuming the same field F_4194304001 is used in both settings, one leaf has w · log_2 |F| + s · 1 = 32w + 1 bits (65 in the 2-column vs. 97 in the 3-column setting). If our initial computation had a wider state (and thus the execution trace had more columns), or we had a different witness, this length would be modified accordingly. In any case, the hash function will compress any arbitrary-length leaf into a 32-byte block. We call this Merkle tree the evaluation tree (as there will be more Merkle trees involved in one STARK proof).

The verifier will query the evaluation Merkle tree at a bunch of random points, with the restriction that none of them can be positions of the execution trace, i.e. i ∉ {0, 8, ..., 248} or i ∉ {0, 16, ..., 496}, respectively. Otherwise, the D polynomials would incur a division by zero. This is the reason why we needed an extended domain, so the verifier can choose from 224 or 480 different points. As we pointed out before, the mathematical reason for this limitation will become clear in the FRI section. Here, we simulate this interaction by asking a random oracle for pseudorandom positions, leveraging the Fiat-Shamir heuristic. After getting q pseudorandom positions satisfying the condition above and without repetitions, the prover collects the corresponding leaves from the Merkle tree and computes the authentication paths for all of them. Here, we can use some optimized technique to avoid redundancy when some nodes are shared between different paths. We refer to the section about costs to see the difference in proof length between the optimized and the redundant approach.

Linear combination

We want to prove that P(X), S(X) (the secret input), D(X) and B(X) are polynomials of the correct maximum degree. As the FRI step is expensive, we would like to run only one instance of the protocol instead of many different executions. This is why we will generate a random linear combination of the evaluations, for which we will use the Merkle tree root as a seed. It is important to note that the combined polynomial will have a larger degree.

In this case, we will directly explain the 3-column setting (the 2-column case is easier to obtain from it). Here is where constraint degrees become more influential. So far, we have seen 3 types of constraints: one describing Q_G(X), another describing Q_H(X) and a third describing Q_*(X). The first two constraints had degree 3 (as in the 2-column case), whereas the third had degree 6.

deg( P_G(g · x) − P_G(x) · P_G(x) · (p_i · 3 + (1 − p_i)) ) = d_G = 3
deg( P_H(g · x) − P_H(x) · P_H(x) · (s_i · 7 + (1 − s_i)) ) = d_H = 3
deg( P_*(x) − P_G(x) · P_H(x) ) = d_* = d_G + d_H = 6

The expressions above show the degree of each constraint. Nonetheless, we are interested in the degree of the polynomials. The degree of the polynomial is given by the concatenation of the transition function over all steps; this means deg(Q(X)) = T · d_Q. In our case, deg(Q_G(X)) = 96 = deg(Q_H(X)) and deg(Q_*(X)) = 192. Recall that D(X) := Q(X)/Z(X), so deg(D(X)) = deg(Q(X)) − deg(Z(X)). Then deg(D_G(X)) = 64 = deg(D_H(X)) and deg(D_*(X)) = 160. We call the largest of these the combination degree: 160 in the 3-column case, 64 in the 2-column case.

The idea is: we have to mix the evaluations of P_G(X), P_H(X), P_*(X), S(X), B_G(X), B_H(X) and B_*(X) with those of D_G(X), D_H(X) and D_*(X). Recall that P(X), S(X) and B(X) were degree-T (i.e. degree-32) polynomials, whereas the D(X) are degree-T(d_Q − 1) (i.e. degree-64 and degree-160) polynomials. This means we have to upgrade the smaller-degree polynomials to meet the combination degree. In this case, the incremental degree δ for P(X), S(X) and B(X) is 160 − 32 = 128, and for D_G(X) and D_H(X) it is 160 − 64 = 96. We can do this by multiplying the evaluations by the E powers of γ^δ. These powers form a periodic sequence that repeats every lcm(E, deg)/δ elements. In this case, the P(X), S(X) and B(X) powers repeat after 16 elements and the D(X) powers repeat after 4 elements. Note that in this case the other D(X) repeat after e steps because lcm(E, T(d_Q − 1)) = lcm(e, d_Q − 1), d_Q − 1 = 2 and e is always even.

Finally, the prover ends up with the evaluations of the new constraint system on the whole evaluation domain (E = 512 points) for P_G(X), P_H(X), P_*(X), S(X), B_G(X), B_H(X), B_*(X), D_G(X), D_H(X), D_*(X), together with the degree-incremented evaluations of all polynomials except D_*(X). Now it builds another Merkle tree for the linear combination, using these E evaluations as the content of its leaves. It generates authentication paths for a bunch of pseudorandom positions, and uses this tree in the low-degree proof that we explain in the following section.
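A simplified sketch of this mixing step (our own simplification: coefficients derived from the tree root with SHA-256, and each polynomial lifted by x^δ to reach the combination degree; the actual genSTARK combination differs in its details):

import hashlib

MODULUS = 4194304001

def combine(eval_vectors, deltas, domain, seed, p=MODULUS):
    # eval_vectors[k][j] = evaluation of the k-th polynomial at domain[j];
    # deltas[k] = its incremental degree; seed = the evaluation tree root (bytes).
    combined = [0] * len(domain)
    for k, (evals, delta) in enumerate(zip(eval_vectors, deltas)):
        digest = hashlib.sha256(seed + k.to_bytes(4, "big")).digest()
        coeff = int.from_bytes(digest, "big") % p        # pseudorandom coefficient
        for j, x in enumerate(domain):
            lift = pow(x, delta, p)                      # degree lift by x^delta
            combined[j] = (combined[j] + coeff * lift * evals[j]) % p
    return combined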


1.4 FRI

We finally reached the last part of STARKs, which is probably the most interesting construction. In a nutshell, Fast Reed-Solomon Interactive Oracle Proofs of Proximity (shortened to FRI) [1, 3] is a protocol to prove that some points are evaluations of a low-degree polynomial, and to do it fast: it includes a proving algorithm that is linear, and a verifier that is logarithmic, in the size of the domain of the polynomial. This is a complex protocol that requires some background on polynomials and coding theory, but we will try to cover most of it in this section. For those of you interested in a more detailed and still self-contained explanation of this protocol, we kindly recommend this write-up by Idan Perl [14].

About polynomials

It is common knowledge that a degree-d polynomial can be uniquely determined by interpolating (d + 1) of its evaluations. If we pick more points of the same polynomial, the same expression should still be obtained. But what if we want to check that this is the case using significantly fewer points? Then the result we get will not be conclusive. For example, if we want to determine that some points lie on a line, we need 2 points to get such a line, and then check that the remaining points indeed evaluate to the claimed Y-coordinates. However, if we only picked 1 point, then we can always make it part of a line, fixing any desired slope. To sum up, the only 100% certain method to solve this question is to interpolate (d + 1) points in at least O(d log d) time. And this is a problem when d is too large, as in our setting. Fortunately, we can rely on some tricks of probabilistic testing to be almost certain that the points have the shape we want using many fewer points (in fact, only O(log d) queries).

Checking two polynomials’ maximum degree

From the above discussion, the direct test takes 2(d+ 1) queries. This technique can beused to verify probabilistically that two polynomials f, g have at most a given degree din half the time for the verifier. The idea will be to perform the direct test on anotherpolynomial that is a linear combination of the two.

The verifier will send the prover some random field element α, which will be used to compute the linear combination. Then, the prover commits to the evaluation of h(x) := f(x) + α · g(x) for all x in the domain. If h turns out to have degree ≤ d, then both f and g will likely have degree ≤ d as well. In fact, there is only a 1/|F| chance that the verifier was so unlucky that α nullified the coefficient of some nonzero term x^D with D > d. The verifier can perform (d + 1) queries on h, checking that h has low degree. How can the verifier achieve this if f and g are unknown? It will ask for the evaluations of h, f, g at these points, recompute the linear combination, and check that they match. This has an O(d) cost for the verifier.

V → P : α ←$ F
P → V : com(h(x)) for all x ∈ Dom, where h(x) := f(x) + α · g(x)
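The per-query consistency check is trivial to express in code (a sketch with our own naming; the remaining (d + 1)-point direct test on h is the interpolation check described above):

MODULUS = 4194304001

def check_combination(f_x, g_x, h_x, alpha, p=MODULUS):
    # At a queried point x, the opened values must satisfy h(x) = f(x) + alpha * g(x).
    return h_x == (f_x + alpha * g_x) % p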


Split polynomials into smaller degree components

Now we will learn how to split one polynomial of degree ≤ d into two polynomials of degree ≤ d/2. Here, we will assume that the degree of f is even, which can be done without loss of generality as it is a power of 2 in our setting. This way, proving that f is a low-degree polynomial can be done probabilistically by proving that its components are (halved) low degree. The size of the domain is halved each time as well. We rewrite the polynomial f by splitting its even and odd coefficients as:

f(x) = f_0(x^2) + x · f_1(x^2), with deg(f_i) ≤ d/2
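A quick Python sketch of this even/odd split and of the identity it satisfies (the coefficient values below are arbitrary):

MODULUS = 4194304001

def split_even_odd(coeffs):
    return coeffs[0::2], coeffs[1::2]         # f_0 and f_1

def eval_poly(coeffs, x, p=MODULUS):
    acc = 0
    for c in reversed(coeffs):                # Horner evaluation
        acc = (acc * x + c) % p
    return acc

f = [5, 1, 4, 2, 8, 3, 0, 7]                  # a degree-7 example polynomial
f0, f1 = split_even_odd(f)
x = 123456789
assert eval_poly(f, x) == (eval_poly(f0, x * x % MODULUS) + x * eval_poly(f1, x * x % MODULUS)) % MODULUS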

The FRI protocol is a clever protocol that applies these notions to achieve efficiency.

Reed-Solomon codes In broad strokes, a code is the set of vectors (words) that are linear combinations of the rows of its generator matrix. Depending on the matrix, these codes can have some nice properties. One of these is error-correcting capability, which is achieved by adding redundancy to the codeword. That is, the code lies in some subspace of dimension k but its words have length n > k. This way, if the error is small enough, one can recover the original codeword (although there may not be efficient methods to do this). This property is governed by the minimum distance, the parameter of the code that determines the minimum Hamming distance between any two different codewords.

Reed-Solomon (RS) codes are a cool type of code because they have the largest possible minimum distance, δ = n − k + 1, which allows for great error-correcting capability. RS codes, for a given field, degree bound d and evaluation point x, are defined as the set of evaluations at x of all polynomials of degree less than d:

RS_d(x) := { f(x) | f ∈ F[x], deg(f) < d }

This means that checking the low-degreeness of our ALI linear combination reduces to the problem of deciding whether the points queried by the verifier belong to the Reed-Solomon code of maximum degree d.

1.4.1 Protocol

Finally, the prover runs the FRI protocol on the evaluations of the linear combination. Contrary to the examples above, where polynomials were split into two components, the actual STARK protocol splits them into four, as this provides important performance improvements in practice. Other than that, it follows the same idea. Here, we refer to this step as the DRP (Degree Respecting Projection).

For each iteration of the FRI protocol, the prover works over a 4× smaller sample of values. At some point, there are not many values left and the remaining polynomial is given directly as part of the proof. For all other steps, the prover splits all evaluation points and evaluated values into 4 groups. Let's call D the size of the domain at each step (at the beginning, D = E). In particular, it builds two 4 × D/4 matrices where the content is split along the rows. This way, the first D/4 elements of each array go to the first row of the matrix, and so on. The general idea is that we will build one polynomial per column, so that when evaluated at one column of the evaluation domain X, it outputs the corresponding column of the evaluated values Y.

First, let's recall one method to obtain the smallest-degree polynomial that passes through some points. Given d + 1 coordinates (x, y), Lagrange interpolation recovers a degree-d polynomial that on input the x_i's outputs their corresponding y_i's. This polynomial is the following:

L(X) = Σ_{i=0}^{d} ℓ_i(X) · y_i,   where   ℓ_i(X) = ∏_{j=0, j≠i}^{d} (X − x_j)/(x_i − x_j)

Once the interpolant polynomials are obtained, and back to FRI itself, the prover evaluates them all at one random position. Then, it builds a D/4-leaf Merkle tree with these evaluations and computes authentication paths for all q pseudorandom positions, both for this smaller tree and for the larger tree of the linear combination.

Let's explain all of this in more detail, differentiating between two phases just like in the original paper.

Commit phase

The commit phase can be seen as the prover side of the FRI protocol when there is no interaction. First it builds two matrices, one of them containing the evaluation domain, and the other containing the evaluations of the linear combination (hence, the leaves of the linear combination tree). These D values are split into the four rows of the matrices: the elements at positions {0, ..., D/4 − 1} go to the first row, {D/4, ..., D/2 − 1} to the second row, {D/2, ..., 3D/4 − 1} to the third row and {3D/4, ..., D − 1} to the fourth row.

Then the prover uses Lagrange interpolation to obtain D/4 degree-3 polynomials L_i(X) (one per column of the matrix). These polynomials are such that, on input the i-th column of the X matrix, they output the i-th column of the Y matrix. Now prover and verifier agree on a pseudorandom point x (derived from the root of the linear combination tree) to be used as a query point, on which the prover evaluates all these polynomials L_i(x). Now it builds another (smaller) Merkle tree with the D/4 evaluations of these polynomials, and starts this step again. Basically, the prover will keep on doing these operations until the number of leaves of the newer Merkle tree reaches a small enough amount. This can be either one, so this is the constant polynomial, or any other agreed number that gives a good trade-off between the number of Merkle tree proofs sent and this amount. In any case, once the protocol reaches this final point, the prover sends all remaining leaves directly.

Query phase

The query phase can be seen as the verifier side of the FRI protocol when there is no interaction. Here, the verifier goes through all the components of the low-degree proof, checking consistency between the trees of consecutive steps. It is, so to say, the inverse operation of the commit phase. More in detail, at each step of the query phase, the verifier checks one of the smaller Merkle trees (oracles) that it received from the prover. In particular, it verifies the authentication paths of some pseudorandom positions in the round tree, and checks that they match the four original points in the previous step. Here, recall that each leaf in the round tree was created from four elements (the interpolation of the four elements in the row of the previous round). Once the verifier reaches the last step, it performs an interpolation to get a low-degree polynomial (either some constant polynomial, or a sufficiently small one as we explained above). Now it queries the interpolant on some points, and checks that they equal the evaluations of the remainder proof. Due to the Schwartz-Zippel lemma [15, 17], two different degree-d polynomials can agree on at most d points, so a random query exposes a mismatch except with probability d/|F|, which is negligible for large fields (for instance, d = 2^8 and |F| ≈ 2^32 give roughly 2^-24). This means that if the verifier gets the same evaluations on both polynomials, then they are the same polynomial with overwhelming probability, and the whole proof is verified.


Chapter 2

Benchmark

This is probably the most revealing part of this document. Here, we show actual numbers about real STARKs being executed using the previously mentioned genSTARK library, together with theoretical asymptotic complexities. We will use the two settings for Pedersen commitments to better understand the impact of each parameter on the overall complexity of the protocol. This includes proving and verification times, proof length, Merkle tree size and RAM consumption. We also give some words on optimizations and open problems, and discuss the suitability of STARKs versus SNARKs for a more complex use case, according to some realistic estimations. All these experiments have been executed on a MacBook Pro from 2015, running macOS Mojave, with a 2.7GHz Intel Core i5 and 16GB of 1867 MHz DDR3 RAM.

2.1 genSTARK

Ultimately, our goal is to benchmark STARKs for QEDIT's asset transfer protocol and compare them with SNARKs. We want to find out in which use cases it is more convenient to use STARKs, and when one should use SNARKs. As of now, QEDIT runs entirely on the Sapling SNARK proposed by ZCash [9]. We think STARKs may be more efficient for batch computations where the same verification procedure is repeated for many entries. In this part of the document, we will benchmark STARKs for Pedersen commitments using the genSTARK JavaScript library.

2.1.1 Installation

The first step to run any genSTARK script is to install the library. Here, we give instructions for macOS, but similar commands apply to other Unix systems as well.

git clone https://github.com/GuildOfWeavers/genSTARK.git

cd genSTARK


npm install

gulp

node bin/examples/demo/yourexample.js

If Node.js is not installed on your machine, you will first need to download it from its website [13], as with any other macOS application. This will allow you to use the npm and node commands to run JavaScript code outside a browser. You can get the latest version using

sudo npm i -g npm

You can use the gulp command the first time you clone the project from GitHub. However, if you are using the same directory to save your own testing scripts, be sure you do not run it again, or the directory will be overwritten and they will get erased. You can install gulp by typing

sudo npm install gulp-cli -g

Once the library is installed in the genSTARK directory, you can start your own instantiation of STARKs. In the following, we give an AIR for Pedersen commitments together with some useful functions to generate STARKs for other computations.

2.1.2 AirScript

This library uses the AirScript module as an interface for the user to easily define constraint systems. This is the only manual step prior to producing STARKs, and the goal of this part is to show some auxiliary functions compatible with genSTARK that make this step more automated. We will be using the same notation that the author explains in [10]. We strongly encourage the reader to go through that documentation before reading this part.

Going back to the square-and-multiply example, we will explain how to implement this algorithm and Pedersen commitments using the AirScript scripting language. First, recall the simple exponentiation that we used at the beginning of this document:

Prime field     F_257
Initial value   1
Steps           4
Claim           2^13 ≡ 225 mod 257
Exponent        [ 1 1 0 1 ]
Algorithm       SM SM S SM
Computation     (((1² · 2)² · 2)²)² · 2


However, if we solely run the above parameters we get the following execution trace:

r 1 2 8 64

If you perform the square-and-multiply algorithm yourself, you will see that these are the correct first steps, but our last step corresponding to 2^13 is placed at position [4] of the array due to the concrete implementation of this library. Here, the next value i+1 of the mutable register is computed with the parameters of step i. Then, our claimed value 225 would be stored after the number 64, out of the bounds of the array. Also, we cannot simply add one more step¹, as 5 is not a power of 2. Nor shall we directly start our execution trace with the base instead of 1, as this assumes that the first bit of the exponent will always be 1. Instead, we declare 8 steps, the smallest power of two larger than 4. Note we cannot simply add four zeroes in the most significant bits of the exponent, as we would get the same problem as before with this execution trace: r = [ 1 1 1 1 1 2 8 64 ], meaning that we would be simply postponing the problem. What we can do to circumvent this issue is to use some auxiliary information to stop when we reach our fourth step. Something like an auxiliary constant array k = [ 1 1 1 1 0 0 0 0 ] that is used as a flag inside the transition function out: k*r*r*((2*p) + (1-p));. However, this increases the constraint degree, and thus the combination degree, which is highly undesirable.

¹ For the kind of fields that we are considering.

Instead, we can shift the exponent in bit form to the left, until the first bit is a 1. Then, we keep track of the number of initial zeroes, as this will be used in the assertion that checks the last step. This way, we obtain a complete execution trace whose position [4] we can access to build the boundary constraints. Here, 4 = t = T − z = 8 − 4, because the exponent 13 = [0 0 0 0 1 1 0 1] has the first four bits set to zero when written with T bits.

r 1 2 8 64 225 0 0 0

Now we would like to test the library using a higher number of steps, and random numbers. Even when we work over medium-size fields, we can use 256-bit exponents. We will use some auxiliary functions to help automate the setup definition. In particular, we write a converter from hexadecimal numbers to their binary representation and an automatic procedure to fill the exponent with zeroes on the right, so there is no need to define k registers for the flag.

function hex2bin(charhex) {
    // Converts a lowercase hexadecimal string into an array of bits (BigInt 0n/1n),
    // four bits per character; unknown characters are skipped.
    var binvec = [];
    for (let c of charhex) {
        if      (c == '0') { binvec.push(0n,0n,0n,0n); }
        else if (c == '1') { binvec.push(0n,0n,0n,1n); }
        else if (c == '2') { binvec.push(0n,0n,1n,0n); }
        else if (c == '3') { binvec.push(0n,0n,1n,1n); }
        else if (c == '4') { binvec.push(0n,1n,0n,0n); }
        else if (c == '5') { binvec.push(0n,1n,0n,1n); }
        else if (c == '6') { binvec.push(0n,1n,1n,0n); }
        else if (c == '7') { binvec.push(0n,1n,1n,1n); }
        else if (c == '8') { binvec.push(1n,0n,0n,0n); }
        else if (c == '9') { binvec.push(1n,0n,0n,1n); }
        else if (c == 'a') { binvec.push(1n,0n,1n,0n); }
        else if (c == 'b') { binvec.push(1n,0n,1n,1n); }
        else if (c == 'c') { binvec.push(1n,1n,0n,0n); }
        else if (c == 'd') { binvec.push(1n,1n,0n,1n); }
        else if (c == 'e') { binvec.push(1n,1n,1n,0n); }
        else if (c == 'f') { binvec.push(1n,1n,1n,1n); }
        else { continue; }
    }
    return binvec;
}

function fillZeroes(bitArray) {
    // Shifts the leading zeroes of the exponent to the end of the array and
    // returns how many positions were shifted (the bound on i avoids looping
    // forever on an all-zero array).
    let i = 0;
    while (bitArray[0] == 0n && i < bitArray.length) {
        bitArray.shift();
        bitArray.push(0n);
        i = i + 1;
    }
    return [bitArray, i];
}

We give the public input (the message m) in string form, and it will be converted to a binary array of 4× the size of the string (because each hexadecimal character maps to four bits). Additionally, we want to generate random exponents that produce no out-of-bounds exceptions (so we force the most significant bit to be zero). Also, we provide a function that computes the claim for us, so the only parameters that the user should modify are the number of steps (bits in the exponent), the width of the execution trace, and the desired finite field.

function manyinit(n, col, value) {
    // Builds the array of initial values: one entry per register (n * col of them)
    let inits = [];
    for (let i = 0; i < n; i++)
        for (let j = 0; j < col; j++) inits.push(value);
    return inits;
}

function manyrandomhex(n) {
    // n random hexadecimal exponents
    let many = [];
    for (let i = 0; i < n; i++) many.push(onerandomhex());
    return many;
}

function onerandomhex() {
    const chars = ['0','1','2','3','4','5','6','7',
                   '8','9','a','b','c','d','e','f'];
    let hex = '' + chars[Math.floor(Math.random()*4 + 4)]; // force T-1 bit
    for (let i = 1; i < steps/4; i++)
        hex += chars[Math.floor(Math.random()*16)];
    return hex;
}

function claim(pInp, pZ, sInp, sZ) {
    // Computes the expected commitments g^m and h^r (and g^m * h^r in the 3-column
    // setting); field, g, h, initValues and col are globals of the test script.
    let result = [];
    let g_m, h_r;
    if (pInp.length == sInp.length) {
        for (let i = 0; i < pInp.length; i++) {
            g_m = fieldExpo(field, g, pInp[i], initValues[2*i],
                            pInp[i].length - pZ[i]);
            h_r = fieldExpo(field, h, sInp[i], initValues[2*i+1],
                            sInp[i].length - sZ[i]);
            if (col == 3) result.push([g_m, h_r, mulmod(g_m, h_r, field)]);
            else if (col == 2) result.push([g_m, h_r]);
        }
    }
    return result;
}

function mulmod(a, b, field) { return (a*b) - ((a*b)/field)*field; } // BigInt modular product

function fieldExpo(field, base, expo, ini, fin) {
    // Square-and-multiply over the first `fin` bits of the exponent
    let reg = ini;
    for (let i = 0; i < fin; i++) {
        reg = reg*reg*((base*expo[i]) + (1n - expo[i]));
        if (reg >= field) { reg = reg - (reg/field)*field; }
    }
    return reg;
}
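As a quick sanity check of fieldExpo, the toy exponentiation from the beginning of this document (field F_257, base 2, exponent 13 shifted to [1 1 0 1], initial value 1, 4 meaningful steps) can be reproduced directly:

console.log(fieldExpo(257n, 2n, [1n, 1n, 0n, 1n], 1n, 4)); // prints 225n, i.e. 2^13 mod 257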

In the following, we give some functions to automatically generate the string that is passed to the parser. This is especially useful when the number of columns grows. They are designed to be compatible with Pedersen commitments, both in the 2-column setting and in the 3-column one, where we include the final multiplication between H^r and G^m. For that, we add a third column per Pedersen hash that is defined as the multiplication of the two registers on its left. It is future work to understand whether we can reduce the number of boundary constraints in this case (from 6 to 3), checking the initial values of r0 and r1 and the claimed result in r2. We refer the reader to the code file that we provide together with this document for a ready-to-work script to perform their own tests.


function genAIR(field, numP, numS) {
    // Builds the AirScript source for numP Pedersen commitments (col columns each);
    // col, g, h and steps are globals of the test script.
    let constraints = ``;
    let outtrans = ``;
    let outenforce = ``;
    let readonlys = ``;
    if (numP == numS) {
        for (let i = 0; i < numP; i++) {
            constraints += `a`+(col*i)+`: $r`+(col*i)+`*$r`+(col*i)+`*((G*$p`+i+`)+(1-$p`+i+`));
a`+(col*i+1)+`: $r`+(col*i+1)+`*$r`+(col*i+1)+`*((H*$s`+i+`)+(1-$s`+i+`)); `;
            outtrans += `a`+(col*i)+`, a`+(col*i+1)+`, `;
            outenforce += `$n`+(col*i)+`-a`+(col*i)+`, $n`+(col*i+1)+`-a`+(col*i+1)+`, `;
            readonlys += `$p`+i+`: repeat binary [...];
$s`+i+`: repeat binary [...];
`;
            if (col == 3) {
                constraints += `a`+(col*i+2)+`: a`+(col*i)+`*a`+(col*i+1)+`;
`;
                outtrans += `a`+(col*i+2)+`, `;
                outenforce += `$n`+(col*i+2)+`-a`+(col*i+2)+`, `;
            }
        }
    }
    outtrans = outtrans.substring(0, outtrans.length - 2);
    outenforce = outenforce.substring(0, outenforce.length - 2);
    return `define Demo over prime field (`+field+`) {
    g: `+g+`;
    h: `+h+`;
    transition `+(col*numP)+` registers in ${steps} steps {
        `+constraints+`out: [`+outtrans+`];
    }
    enforce `+(col*numP)+` constraints {
        `+constraints+`out: [`+outenforce+`];
    }
    using `+(numP+numS)+` readonly registers {
        `+readonlys+`
    }
}`;
}

function formatInput(pHex, sHex) {
    // Converts hexadecimal public/secret inputs to bit arrays and records
    // how many leading zeroes were shifted out of each exponent
    var [pInp, pZ, sInp, sZ] = [[], [], [], []];
    for (let p = 0; p < pHex.length; p++) {
        let pub = fillZeroes(hex2bin(pHex[p]));
        pInp.push(pub[0]);
        pZ.push(pub[1]);
    }
    for (let s = 0; s < sHex.length; s++) {
        let sec = fillZeroes(hex2bin(sHex[s]));
        sInp.push(sec[0]);
        sZ.push(sec[1]);
    }
    return [pInp, pZ, sInp, sZ];
}

function genAssertions(pInp, pZ, sInp, sZ) {
    // Boundary constraints: initial values at step 0 and claimed results at the
    // last meaningful step of each register (result is the global filled by claim)
    var assertions = [];
    if (pInp.length == sInp.length) {
        for (let i = 0; i < pInp.length; i++) {
            let lastep = Math.min(pInp[i].length - pZ[i], sInp[i].length - sZ[i]);
            if (col == 3) {
                assertions.push(
                    { step: 0, register: col*i,   value: initValues[col*i]   },
                    { step: 0, register: col*i+1, value: initValues[col*i+1] },
                    { step: 0, register: col*i+2, value: initValues[col*i+2] },
                    { step: pInp[i].length - pZ[i], register: col*i,   value: result[i][0] },
                    { step: sInp[i].length - sZ[i], register: col*i+1, value: result[i][1] },
                    { step: lastep, register: col*i+2, value: result[i][2] });
            } else if (col == 2) {
                assertions.push(
                    { step: 0, register: col*i,   value: initValues[col*i]   },
                    { step: 0, register: col*i+1, value: initValues[col*i+1] },
                    { step: pInp[i].length - pZ[i], register: col*i,   value: result[i][0] },
                    { step: sInp[i].length - sZ[i], register: col*i+1, value: result[i][1] });
            }
        }
    }
    return assertions;
}

Before trying to execute the above for larger instances, note that the maximum default number of registers in genSTARK is 64. We have to modify the default parameters describing the limits of our AIR, and modify the lexer to be compatible with higher indices for the registers. Also, we may reach a point where the Node.js process runs out of memory. We can resize the default 1GB heap the next time we run the process by typing (the size is given in megabytes):

node --max-old-space-size=4096 ./yourtest.js


const DEFAULT_LIMITS = {
    maxSteps: 2 ** 20,
    maxMutableRegisters: 4096,
    maxReadonlyRegisters: 4096,
    maxConstraintCount: 1024,
    maxConstraintDegree: 16,
    maxExtensionFactor: 32
};

exports.IntegerLiteral = chevrotain_1.createToken(
    { name: "IntegerLiteral", pattern: /0|[1-9]\d*/ });
exports.MutableRegister = chevrotain_1.createToken(
    { name: "MutableRegister", pattern: /\$[rn]\d{1,4}/ });
exports.StaticRegister = chevrotain_1.createToken(
    { name: "StaticRegister", pattern: /\$k\d{1,4}/ });
exports.SecretRegister = chevrotain_1.createToken(
    { name: "SecretRegister", pattern: /\$s\d{1,4}/ });
exports.PublicRegister = chevrotain_1.createToken(
    { name: "PublicRegister", pattern: /\$p\d{1,4}/ });
exports.Identifier = chevrotain_1.createToken(
    { name: "Identifier", pattern: /[a-zA-Z]\w*/ });

2.2 Complexity

In this section, we show the theoretical complexity of each part of the protocol. We also give costs for proof length and Merkle tree memory consumption. First, we recall the most relevant parameters:

Parameter                       Value
Number of steps                 T
State width                     w
Maximum constraint degree       µ
Extension factor                e = 2^⌈log(2µ)⌉
Evaluation domain size          E = T · e
Number of constraints           c ≈ w
Number of assertions            a ≈ 2w
Number of evaluation queries    q = min(80, E − T)
Total number of registers       R = p + s + k
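For instance, plugging in the 3-column Pedersen setting used later in the experiments (T = 256 steps and maximum constraint degree µ = 6), the derived parameters follow directly from the table above; the snippet below is only an illustration of these relations.

// Derived parameters for a hypothetical instance: T = 256 steps, mu = 6
const T = 256, mu = 6;
const e = 2 ** Math.ceil(Math.log2(2 * mu)); // extension factor: 16
const E = T * e;                             // evaluation domain size: 4096
const q = Math.min(80, E - T);               // number of evaluation queries: 80
console.log({ e, E, q });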

2.2.1 genSTARK Prover

The STARK prover runs in asymptotically quasilinear time in the number of states and the size of the evaluation domain, which in turn depends linearly on the number of steps and the maximum degree of the constraints. In this section, we elaborate on the concrete theoretical costs of each step of its computation.

The most time-consuming parts are computing the root of unity of the evaluation group in the setup of the evaluation context, the multipoint evaluation and interpolation in the low-degree extension step, and evaluating the boundary constraints. Here, we skip constant-time steps for readability. We refer to Appendix A for an in-depth table about the prover costs. Note that all element-wise operations have an intrinsic cost of log |F|, the complexity of processing all of their bits.

Setup evaluation context

This is a setup step where the prover essentially creates the context. The AirScript function validateInputs checks the correct format of all the public and secret inputs, in time O(T · (p + s)). In order to obtain the root of unity, the algorithm field.getRootOfUnity checks elements of the field until it finds a root of unity of order the size of the evaluation domain, which costs O(log |F|), where log |F| is the cost of performing the exponentiation γ = i^{|F|/E} for i < |F|. Then the function field.getPowerCycle computes the whole evaluation domain in O(E) time. Then it computes the execution domain in O(T) time and builds all registers (public, secret and constant) in O(T · (p + s + k)).
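For intuition, one standard way to obtain a root of unity of the required order in a prime field, assuming a known generator of the multiplicative group, is the following standalone sketch; it is not necessarily how field.getRootOfUnity proceeds internally.

// Generic illustration: in F_p with E dividing p - 1, generator^((p-1)/E)
// has multiplicative order E.
function modPow(base, exp, p) {
    let result = 1n;
    base %= p;
    while (exp > 0n) {
        if (exp & 1n) result = (result * base) % p;
        base = (base * base) % p;
        exp >>= 1n;
    }
    return result;
}

function rootOfUnity(generator, E, p) {
    return modPow(generator, (p - 1n) / E, p);
}

// Example in F_257: 3 generates F_257^*, so this yields a root of unity of order 16
console.log(rootOfUnity(3n, 16n, 257n));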

Execution trace

In this part, the prover validates the initial values and sets the first state of the execution trace to these, all in O(2w) time.

Low-degree extension

This is one of the most costly steps of the whole STARK prover computation. For each of the w columns in the execution trace, the prover interpolates the polynomial on the T roots using the FFT, and then evaluates it on the whole evaluation domain. This means a total cost of O(w(T log T + E log E)).

Constraint polynomials

Now the prover computes the constraint polynomials that nullify on the T roots. It validates the correct type of the states in O(w) time, then it initializes one evaluation array per constraint in O(c) time, and finally it evaluates the constraints on the whole evaluation domain. Each of the E evaluations consists of w memory accesses for the columns in the execution trace and R accesses to obtain the current values of the read-only registers. Then it evaluates the constraints, whose cost is upper bounded by the maximum degree of the constraints. Then, for all constraints, it outputs the evaluation of the Q(X) polynomials.

Compute zero polynomial

This is a simple part where the prover evaluates the zero polynomial on the whole evaluation domain in O(E) time.

Compute divisor polynomial

First, the prover inverts the numerators of the zero polynomial using air.field.invMany. This function goes through the E numerators twice. Then, it computes the evaluations of D(X) on the whole evaluation domain and for all states in O(w + wE) time using field.mulMany.

Compute boundary constraints

This step represents a major bottleneck of STARKs, as its complexity limits the amount of boundary constraints of our system. Here, we call the boundary constraints assertions. The number of assertions increases linearly in the number of states. Normally, there are two constraints per state: checking the initial value, and the final step. First, for all assertions it combines the states for each register to create the boundary constraints. This computation carries an exponentiation of the root of unity to the value of the pertinent step in the extended trace, which is at most E, so the cost of this constructor step is about O(a log E). But the costly part is the evaluation of these constraints. Here, for each of the a assertions, the prover performs some multipoint evaluations over the whole evaluation domain. In particular, it evaluates the interpolant polynomial and the zero polynomial. Then, it inverts the zero evaluations in O(2E) time. The total cost of evaluateAll is O(2 · a · E(log E + 1)).

Merkle tree of evaluations

This step is not one of the most expensive ones in terms of running time, but it is highly memory-consuming. In particular, the prover has to build a Merkle tree with E leaves, 32 bytes each. Each leaf is computed as the hashed value of the concatenation of the P(X) evaluations and the secret inputs. These are passed on to the hash function in a buffer format. This hash function runs linearly on its input size, which is w + s. Once the first level of the Merkle tree is computed, all remaining nodes are built out of two 32-byte values. The Merkle tree is computed recursively, at each step with half the number of leaves of the children layer. This computation is asymptotically linear in the number of leaves, as Σ_i E/2^i ≈ E. Note this step is constant regardless of the number of states in your computation, meaning that this cost amortizes thanks to the compressing property of hash functions and the fact that the size of the evaluation domain is independent of the number of columns in the execution trace.
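The following standalone Node.js sketch illustrates the structure of such a tree (pairwise hashing until a single root is left); it is a generic illustration with SHA-256 and dummy leaves, not genSTARK's MerkleTree class.

// Minimal Merkle tree over a power-of-two number of 32-byte leaves
const crypto = require('crypto');

function sha256(buf) {
    return crypto.createHash('sha256').update(buf).digest();
}

function buildMerkleTree(leaves) {
    // leaves: array of Buffers whose length is a power of two
    const layers = [leaves];
    let current = leaves;
    while (current.length > 1) {
        const parents = [];
        for (let i = 0; i < current.length; i += 2) {
            parents.push(sha256(Buffer.concat([current[i], current[i + 1]])));
        }
        layers.push(parents);
        current = parents;
    }
    return layers; // the last layer holds the single root node
}

// Example with 4 dummy leaves
const leaves = [0, 1, 2, 3].map(i => sha256(Buffer.from([i])));
console.log(buildMerkleTree(leaves).pop()[0].toString('hex'));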

Check tree at random points

First, the prover uses a random oracle to obtain q pseudorandom positions that will be used to query the tree, with the limitation that none of these points can be a step of the original execution trace. Then, for all query positions the prover computes the augmented position i + e. If any of them is repeated, they are removed from the list of positions, so the maximum number of points at which the tree will be queried is 2q. Then, for each of these leaves, the prover computes their authentication path in O(log E) time, as that is the depth of the tree.
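A minimal sketch of the augmented-position bookkeeping described above could look as follows; wrapping the position around the domain is an assumption of this illustration, not a statement about genSTARK's internals.

// For each queried position i, also keep i + e (the "next state" position),
// removing duplicates so at most 2q positions remain.
function augmentPositions(positions, extensionFactor, domainSize) {
    const all = new Set();
    for (const i of positions) {
        all.add(i);
        all.add((i + extensionFactor) % domainSize); // wrap-around is an assumption
    }
    return [...all];
}

console.log(augmentPositions([3, 11, 19], 16, 4096));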

Random linear combination of evaluations

In this step the prover computes a linear combination of all the relevant polynomials describing the computation, so as to perform one single execution of the FRI protocol. First, the prover constructs the object in O(c) time and computes the combination degree, which will be at least T and at most T(µ − 1). For the constraint groups with smaller extended degree than the combination degree, the prover computes the sequence of E powers for the incremental degree. Then, it multiplies the D(X) evaluations with this array (cE multiplications). The same goes for the P, S, B evaluations. Recall these are matrices of E rows and c columns. Then it puts dEvaluations and psbEvaluations and their extended versions together in the array allEvaluations. Then it computes |allEvaluations| random numbers using field.prng, which will be the coefficients for the linear combination. Then it combines them together with field.combineMany in about O((5w + s)E) time. We omit the logarithmic factor of the exponentiation, as it is not that significant in the whole complexity of this step. All together, the prover cost to compute the linear combination is 2cE + E + (3w + s) + (3w + s + 2w) + (5w + s)E. Assuming that the number of constraints is in the order of w, the cost is O(7wE + sE + E + 8w + 2s).

Low-degree proof

In genSTARK, the round polynomials are computed using an optimized version of Lagrange interpolation for degree-3 polynomials (recall our matrices have 4 elements per row), in the method field.interpolateQuarticBatch. Here, we will compute each part of the expression below separately:

\[
L(X) \;=\; \sum_{i=0}^{d} \ell_i(X)\cdot y_i
\qquad\text{such that}\qquad
\ell_i(X) \;=\; \prod_{\substack{j=0 \\ j\neq i}}^{d} \frac{X - x_j}{x_i - x_j}
\]

In the first step, we compute the eq polynomials. These are 4 degree-3 polynomials whose roots are all the elements in the column except for the element in that row (note these are the numerators of the Lagrange polynomials ℓ_i):

\[
\mathrm{eq}_i(X) \;=\; \prod_{\substack{j=0 \\ j\neq i}}^{3} (X - x_j)
\]

For example, say we are computing this polynomial for the 4th element in the column, and say this column has values [a b c d]; then we obtain

\[
\mathrm{eq}_3 \;=\; (X - a)(X - b)(X - c) \;=\; X^3 - (a+b+c)X^2 + (ab+ac+bc)X - abc,
\]

represented in code as the length-4 array eq3 = [ -abc, ab+ac+bc, -a-b-c, 1 ].

Going back to the construction of the interpolant, the second step consists of computing the following expression for all 4 points in the i-th column (note that ℓ_i(X) ≡ eq_i(X) · 1/eq_i(x_i)):

\[
\mathrm{invY}_i \;=\; \frac{y_i}{\mathrm{eq}_i(x_i)} \;=\; y_i \prod_{\substack{j=0 \\ j\neq i}}^{3} \frac{1}{x_i - x_j}
\]

Now it is clear that the Lagrange interpolant can be expressed as:

\[
L(X) \;\equiv\; \sum_{i=0}^{3} \mathrm{eq}_i(X)\cdot \mathrm{invY}_i
\]

The library computes this polynomial in 4 steps. In particular, it splits each eq_i into its 4 coefficients and computes this expression for each X^i. Then, the array result contains the 4 coefficients of the interpolant polynomial.
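The following standalone sketch performs the same degree-3 interpolation over a toy prime field, following the eq_i / invY_i decomposition above; it is an illustration only, not the optimized field.interpolateQuarticBatch code.

// Degree-3 Lagrange interpolation over F_p for one group of four points
const p = 257n; // toy prime field, as in the exponentiation example

function modPow(base, exp, m) {
    let r = 1n; base %= m;
    while (exp > 0n) {
        if (exp & 1n) r = (r * base) % m;
        base = (base * base) % m;
        exp >>= 1n;
    }
    return r;
}
const mod = a => ((a % p) + p) % p;
const inv = a => modPow(mod(a), p - 2n, p); // Fermat inverse

// Returns the coefficients [c0, c1, c2, c3] of L(X) = sum_i eq_i(X) * invY_i
function interpolateQuartic(xs, ys) {
    const coeffs = [0n, 0n, 0n, 0n];
    for (let i = 0; i < 4; i++) {
        // eq_i(X) = prod_{j != i} (X - x_j), built coefficient by coefficient
        let eq = [1n];
        let denom = 1n;
        for (let j = 0; j < 4; j++) {
            if (j === i) continue;
            const next = new Array(eq.length + 1).fill(0n);
            for (let k = 0; k < eq.length; k++) {
                next[k] = mod(next[k] - eq[k] * xs[j]); // multiply by (X - x_j)
                next[k + 1] = mod(next[k + 1] + eq[k]);
            }
            eq = next;
            denom = mod(denom * (xs[i] - xs[j]));
        }
        const invY = mod(ys[i] * inv(denom)); // y_i / eq_i(x_i)
        for (let k = 0; k < 4; k++) coeffs[k] = mod(coeffs[k] + eq[k] * invY);
    }
    return coeffs;
}

// Example: recover f(X) = 1 + 2X + 3X^2 + 4X^3 from its values at 1, 2, 3, 4
const xs = [1n, 2n, 3n, 4n];
const f = x => mod(1n + 2n*x + 3n*x*x + 4n*x*x*x);
console.log(interpolateQuartic(xs, xs.map(f))); // [1n, 2n, 3n, 4n]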

Now the prover computes another Merkle tree with the evaluations of the random linear combination created in the previous step. That is, another tree of E leaves. Then, the prover generates authentication paths for all the prior q random positions (not the augmented positions). Finally, the prover runs the FRI protocol on the evaluations of the linear combination.

Let us call D the size of the domain at each step (at the beginning, D = E); it gets divided by 4 at each step. This means the FRI takes at most log₄ E rounds. Inside the method field.interpolateQuarticBatch, for all columns it performs operations that run linearly in the number of rows (in this case, 4). Then it evaluates a random point on the D/4 polynomials and creates a Merkle tree with the outputs, in linear time. Then it creates authentication paths for the q′ queries in a tree of D/4 leaves, and paths for 4q′ queries in a tree of E leaves. The default q′ is q/2, so 4q′ is 2q. Putting it all together, the asymptotic proving cost of the FRI protocol is:

\[
\begin{aligned}
\sum_{i=1}^{\log_4 E}\Big(2D + 2q\log_2 E + \frac{q}{2}\log_2 D\Big)
&= 2\sum_{i=1}^{\log_4 E}\frac{E}{4^i} + 2q\sum_{i=1}^{\log_4 E}\log_2 E + \frac{q}{2}\sum_{i=1}^{\log_4 E}\log_2\frac{E}{4^i}\\
&= 2\,\frac{E}{4}\cdot\frac{1-(1/4)^{\log_4 E}}{1-1/4} + 2q\log_4 E\,\log_2 E + \frac{q}{2}\sum_{i=1}^{\log_4 E}\big(\log_2 E - \log_2 4^i\big)\\
&= 2E\cdot\frac{1-1/E}{3} + 2q\log_2 E\,\frac{\log_2 E}{2} + \frac{q}{2}\Big(\frac{\log_2^2 E}{2} - 2\sum_{i=1}^{\log_4 E} i\Big)\\
&= 2\,\frac{E-1}{3} + q\log_2^2 E + \frac{q}{2}\Big(\frac{\log_2^2 E}{2} - \log_4 E\,(\log_4 E + 1)\Big)\\
&= 2\,\frac{E-1}{3} + \frac{5q}{4}\log_2^2 E - \frac{q}{2}\Big(\Big(\frac{\log_2 E}{2}\Big)^2 + \frac{\log_2 E}{2}\Big)\\
&= 2\,\frac{E-1}{3} + \frac{9q}{8}\log_2^2 E - \frac{q}{4}\log_2 E
\end{aligned}
\]

2.2.2 genSTARK Verifier

Here, we go through the code of genSTARK to extract the real costs of the verification of a STARK proof. We will see that the main source of computation overhead for the verifier lies in the verification of the execution trace. We refer to Appendix A for an in-depth table about the verifier costs, which can be analysed to check when a STARK proof pays off with respect to the naive computation.

Naive computation If the verifier wanted to perform the whole computation by himself, he would need to execute the whole execution trace of T · w elements. We can estimate the cost of computing one step by µ, the maximum degree of the transition constraint. That is, we can approximate the naive computation by O(T · w · µ · log |F|). Nonetheless, do not forget that a STARK is not just a protocol for delegation of computation, but rather provides a verifiable scheme for a computation that may have secret inputs and thus could not be performed by the verifier itself.

Setup evaluation context

This step is quite similar to the prover's first step. In fact, the verifier runs the same method air.createContext, except without secret inputs. It validates inputs (only public ones) and computes the root of unity. Maybe this step could be omitted by making the prover provide the root of unity along with the proof, as this step is almost quasilinear in the field size. If the input contains spread registers, it also gets the power cycle in O(E) time. Then it builds input registers (including both public and constant ones, but no secret inputs).

Once the context parameters are created, the verifier runs a few different steps in comparison with the prover. In particular, it creates the objects for the B polynomials, the zero polynomial and the linear combination. Creating the boundary constraints runs in time O(a log E), and the linear combination in time O(c). Creating the object for the zero polynomial requires performing the exponentiation g^{E−e}, which can be done in O(log(E − e)) using square-and-multiply.

Random positions for evaluation

Now the verifier generates the same q pseudorandom positions that the prover chose for the evaluation spot-check, by using the same pseudorandom generator and the evaluation tree root as seed. It also computes the augmented positions, in O(q) time.

Decode evaluation spot checks

In this step the verifier hashes all 2q leaves in proof.values, where each of them is composed of (w + s) elements. Note the hash function outputs 32-byte buffers, and its complexity is linear in its input size.

Verify evaluation Merkle proof

In this part, the verifier runs MerkleTree.verifyBatch to check the correctness of the 2q authentication paths of the evaluation tree. The verifier re-computes the paths and checks that the resulting root equals eRoot, sent by the prover as part of the proof. First, it computes the parents of the leaves in O(2q) time. Then, for all (d − 1) remaining levels, it computes parent nodes using the hashed value previously obtained and the sibling of each node (which it gets from the authentication path). Integrity is preserved because eRoot is the seed for the pseudorandom positions. Note that verifying one single authentication path takes logarithmic time in the number of leaves (meaning linear in the depth of the tree). This takes at most (2q + 2q(log E − 1)) = 2q log E time. Here, however, the computation is much faster than O(2q log E) because nodes get repeated across queries. For simplicity, we use that upper bound in the table.
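As a companion to the Merkle sketch in the prover section, this is how a single authentication path can be checked against a known root; it is a generic illustration, not MerkleTree.verifyBatch.

// Verify one authentication path of length log2(E) against a known root
const crypto = require('crypto');
const sha256 = buf => crypto.createHash('sha256').update(buf).digest();

function verifyPath(leaf, index, path, root) {
    let node = leaf;
    for (const sibling of path) {
        node = (index % 2 === 0)
            ? sha256(Buffer.concat([node, sibling]))   // node is a left child
            : sha256(Buffer.concat([sibling, node]));  // node is a right child
        index = Math.floor(index / 2);
    }
    return node.equals(root);
}

// Example: a two-leaf tree
const a = sha256(Buffer.from('a')), b = sha256(Buffer.from('b'));
const root = sha256(Buffer.concat([a, b]));
console.log(verifyPath(a, 0, [b], root)); // true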

Verify low-degree proof

The first part of this verification is to check all components of the ldProof. The number of components coincides with the number of rounds of the FRI protocol. This will be at most log₄ E, but it will normally be less than that, as the protocol does not reach the constant polynomial (rather, when there are not many points left, all of them are passed directly to the proof). Then the verifier generates the q′ ≈ q/2 Y indices within the E/4^i columns. Then it verifies their authentication paths, included in columnProof, using MerkleTree.verifyBatch. Disregarding optimizations, this requires at most O((q/2) log₂(E/4^i)) time for each component. The genSTARK verifier decodes the indices of the queries for the polynomial in linear time in the number of positions. Then it uses these positions to verify the tree with the evaluations of the linear combination. In this case, the tree has E leaves, and it is checked in the above 4q′ ≈ 2q positions. Following the same reasoning, this naively takes a cost of O(2q log₂ E). Now, the verifier gets back the (supposedly) original X points (domain) and Y evaluations in linear time in the number of queries. Now it can recompute the q′ chosen degree-3 polynomials and evaluate them on the pseudorandom point obtained with lRoot as a seed. Interpolating these points using field.interpolateQuarticBatch costs linear time in the number of polynomials to be obtained (here, q′). If any of them does not coincide with the claimed value, the verifier rejects the proof.

In the second part, the verifier checks the remainder and that the Merkle root matches up. The former is a constant-time step checking the correct degree of the remaining polynomial. In case the FRI protocol did not run all of the log₄ E rounds, it must have provided the remaining coefficients of the polynomial. Note each of these elements can take at most log₂|F|/8 bytes, but genSTARK has capacity for up to 32-byte elements. In that sense, even small-size examples take huge proof lengths that could easily be optimized when the field is not so large. Finally, the verifier creates a Merkle tree with the elements in the remainder and checks that the root equals lRoot. Assuming the remainder has a small number of elements, we can consider this step to require constant time.

As this step is quite complex, let us work out these costs explicitly. The following confirms the costs claimed in the FRI paper.

\[
\begin{aligned}
\sum_{i=1}^{\log_4 E}\Big(\frac{q}{2} + \frac{q}{2}\log_2\frac{E}{4^i} + 2q + 2q\log_2 E + \frac{q}{2} + \frac{q}{2}\Big)
&= \frac{7}{2}q\log_4 E + 2q\sum_{i=1}^{\log_4 E}\log_2 E + \frac{q}{2}\sum_{i=1}^{\log_4 E}\log_2\frac{E}{4^i}\\
&= \frac{7}{4}q\log_2 E + 2q\log_2 E\,\log_4 E + \frac{q}{2}\sum_{i=1}^{\log_4 E}\big(\log_2 E - \log_2 4^i\big)\\
&= \frac{7}{4}q\log_2 E + q\log_2^2 E + \frac{q}{2}\log_2 E\,\log_4 E - \frac{q}{2}\sum_{i=1}^{\log_4 E}\log_2 2^{2i}\\
&= \frac{7}{4}q\log_2 E + \frac{5}{4}q\log_2^2 E - q\sum_{i=1}^{\log_4 E} i
 = \frac{7}{4}q\log_2 E + \frac{5}{4}q\log_2^2 E - q\,\frac{\log_2 E}{2}\Big(\frac{\log_2 E}{2} - 1\Big)\\
&= \frac{7}{4}q\log_2 E + \frac{5}{4}q\log_2^2 E - q\,\frac{\log_2^2 E}{4} + q\,\frac{\log_2 E}{2}
 = q\log_2^2 E + \frac{9}{4}q\log_2 E
\end{aligned}
\]


Verify execution trace

This step is used to verify the correctness of the execution trace. It checks the transition function and the boundary constraints, and computes the linear combination (to be checked in the next step). This check is performed in series, inside a loop iterating over the q positions. For each of these queries, the verifier decodes all parameters corresponding to that position in the evaluation domain. That is, all evaluations of the w mutable registers P(X), the s secret inputs and the w future registers n (obtained thanks to the positions augmented by the extension factor, i + e). Then, it evaluates the zero polynomial in constant time.

Now the verifier has all the required components to evaluate the constraints Q(g^i) using air.evaluateConsraintsAt. This function first evaluates the constant registers and public registers at the given point g^i. Then it calls the AIR constraint evaluator. Approximately, this step takes O(k + p + w · m) time. Then it obtains the evaluations of the division polynomials D(g^i) in linear time in the state width w.

Then the verifier checks the boundary constraints in linear time in the number of assertions.

Finally, it computes the evaluation of the linear combination at g^i. For every constraint group (constraints are grouped depending on their degree, so at most there will be c iterations), the constraints are adapted to meet the combination degree. Once all are unified, it recomputes allValues, containing the values for the linear combination. Then it combines these values together with the coefficients to get the linear combination. This part runs in linear time in the length of allValues. Recall this vector includes psbEvaluations, dValues and their extended versions for the groups with smaller degree. If each state requires two boundary constraints, a = 2w (one for the initial step and one for the last step), then the total length is about 2(w + numSec + 2w) + w + w′, where w′ is the size of a subset of the states. In order to compute the coefficients, it runs a pseudorandom number generator that runs in linear time in the size of allValues. Assuming c ∼ w, the total cost is O(14w + 3s).

Verify linear combination proof

This is the last step of the verifier. Here, it verifies the q authentication paths for the queries to the linear combination tree using MerkleTree.verifyBatch. Using the same reasoning as above, and without optimizations, this step takes time O(q log E).

2.2.3 genSTARK Memory

The main source of memory consumption for the prover is the need to store very large Merkle trees along the STARK execution.

The first tree stores the E evaluations of P(X) and S(X). The required size to store this whole Merkle tree will be (2E − 1) · 32 bytes. Later on, as part of the low-degree proof, the prover creates a tree of the linear combination. Again, this tree requires (2E − 1) · 32 bytes. Now, every round of the FRI protocol implies the creation of another (each time smaller) Merkle tree. In particular, each of these trees has E/4^i leaves, so storing all of them requires:

\[
\sum_{i=1}^{\log_4 E}\frac{E}{4^i} \;=\; \frac{E}{4}\cdot\frac{1-(1/4)^{\log_4 E}}{1-1/4} \;=\; E\cdot\frac{1-1/E}{3} \;=\; \frac{E-1}{3}\ \text{leaves}
\]

All these leaves take (2·(E − 1)/3 − 1) × 32 bytes. All together, this means that one STARK proof requires storage for ((14E − 11)/3) · 32 bytes in Merkle trees.

Memory consumption        Bytes
Evaluation tree           32(2E − 1)
Linear combination tree   32(2E − 1)
FRI trees                 32(2E − 5)/3
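As a rough plug-in example (assuming T = 256 and e = 16, so E = 4096, as in the 3-column Pedersen setting), the three rows above add up to about 0.6 MB:

// Hypothetical instance: E = 4096
const E = 4096;
const bytes = 32 * (2 * E - 1) + 32 * (2 * E - 1) + 32 * (2 * E - 5) / 3;
console.log(bytes, '=', 32 * (14 * E - 11) / 3); // both print 611552 (≈ 0.6 MB)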

2.2.4 genSTARK Proof

The genSTARK proof has four main components that affect its total length. These are the evaluation results, the evaluation proof, the linear combination proof and the low-degree proof.

The evaluation results proof.values consist of the content of the ∼ 2q leaves of the evaluation Merkle tree that are queried. Recall these leaves are the concatenation of the execution trace and the secret inputs. Then, the size of this part of the proof is O(2q(w + s) · log|F|/8) bytes.

Before we continue analysing these costs, let us check the difference between providing multiple authentication paths in a Merkle tree naively and giving them without repeated nodes. Naively, q authentication paths in a Merkle tree of E leaves require q log E nodes. However, as we go up the levels of the tree, we start seeing repetitions, meaning that at some point two authentication paths for different leaves start having the same prefix. On average, this is the general idea:

Layer               Repetitions
Root layer          −(q − 1)
2 nodes             −(q − 2)
4 nodes             −(q − 4)
...                 ...
2^⌊log q⌋ nodes     −(q − 2^⌊log q⌋)

Note that the q nodes in the lowest level are not provided here, but in the proof.values part. If we do the math to remove repeated nodes, we obtain the following estimation.

\[
q\log E - \sum_{i=0}^{\lfloor\log q\rfloor}\big(q - 2^i\big) - q
\;=\; q\log E - q\lfloor\log q\rfloor + \sum_{i=0}^{\lfloor\log q\rfloor} 2^i
\;=\; q\big(\log E - \lfloor\log q\rfloor\big) + 2\cdot 2^{\lfloor\log q\rfloor} - 1
\;\approx\; q\log E - (q\log q - 2q + 1)
\;=\; q\Big(\log\frac{E}{q} + 2\Big) - 1
\]

You can get to this result by thinking of the different paths as being provided naively until they reach the level with 2^⌊log q⌋ nodes. From that point on, the remaining peak of the tree is given entirely.

\[
q\log E - q\log q + 2\cdot 2^{\lfloor\log q\rfloor} - 1
\;\approx\; q\log E - (q\log q - 2q + 1)
\;=\; q\Big(\log\frac{E}{q} + 2\Big) - 1
\]

The evaluation proof proof.evProof provides authentication paths for the ∼ 2q queries to the evaluation tree. Disregarding repeated nodes, the number of elements in one authentication path of an E-leaf tree is ∼ log E. In our case, all intermediate nodes are 32 bytes long. If we optimize authentication paths to avoid repetition, we can go from 2q log E nodes down to 2q log(E/2q) + 2(2q) − 1.

Similarly, each authentication path in the linear combination proof proof.lcProof requires about log E nodes. Here, the difference is the number of positions queried: q instead of ∼ 2q. If we give the proof without repeated nodes, we will need 32q log(E/q) + 64q − 32 bytes.

Each iteration of the FRI protocol adds one component results.component to the low-degree proof proof.ldProof. As previously analysed, each iteration builds a new Merkle tree of E/4^i leaves and provides q′ ∼ q/2 authentication paths on it. It also gives 4q′ ∼ 2q authentication paths on the linear combination tree of E leaves. This means that this part of the proof has this number of nodes:

\[
\begin{aligned}
\sum_{i=1}^{\log_4 E}\big(4q'\log_2 E + q'\log_2 D\big)
&= \sum_{i=1}^{\log_4 E}\Big(2q\log_2 E + \frac{q}{2}\log_2 D\Big)
 = 2q\sum_{i=1}^{\log_4 E}\log_2 E + \frac{q}{2}\sum_{i=1}^{\log_4 E}\log_2\frac{E}{4^i}\\
&= 2q\log_4 E\,\log_2 E + \frac{q}{2}\sum_{i=1}^{\log_4 E}\big(\log_2 E - \log_2 4^i\big)
 = 2q\log_2 E\,\frac{\log_2 E}{2} + \frac{q}{2}\Big(\frac{\log_2^2 E}{2} - 2\sum_{i=1}^{\log_4 E} i\Big)\\
&= q\log_2^2 E + \frac{q}{2}\Big(\frac{\log_2^2 E}{2} - \log_4 E\,(\log_4 E + 1)\Big)
 = \frac{5q}{4}\log_2^2 E - \frac{q}{2}\Big(\Big(\frac{\log_2 E}{2}\Big)^2 + \frac{\log_2 E}{2}\Big)\\
&= \frac{9q}{8}\log_2^2 E - \frac{q}{4}\log_2 E
 \;\equiv\; \frac{9}{4}q'\log_2^2 E - \frac{1}{2}q'\log_2 E
\end{aligned}
\]

If we optimize authentication paths without repetition, we can reduce the number of nodes:


\[
\begin{aligned}
\sum_{i=1}^{\log_4 E}\Big(2q\log_2\frac{E}{2q} + 4q - 1 + \frac{q}{2}\log_2\frac{E/4^i}{q/2} + q - 1\Big)
&= \sum_{i=1}^{\log_4 E}\Big(2q\log_2 E + \frac{q}{2}\log_2\frac{E}{4^i}\Big)
 - \sum_{i=1}^{\log_4 E}\Big(2q\log_2 2q - 4q + 1 + \frac{q}{2}\log_2\frac{q}{2} - q + 1\Big)\\
&= \frac{9q}{8}\log_2^2 E - \frac{q}{4}\log_2 E
 - \log_4 E\Big(2q\log_2 2q - 4q + 1 + \frac{q}{2}\log_2\frac{q}{2} - q + 1\Big)\\
&= \frac{9q}{8}\log_2^2 E - \frac{q}{4}\log_2 E
 - \log_2 E\Big(q\log_2 2q + \frac{q}{4}\log_2\frac{q}{2} - \frac{5}{2}q + 1\Big)\\
&= \frac{9q}{8}\log_2^2 E - \frac{q}{4}\log_2 E
 - \log_2 E\Big(q + q\log_2 q + \frac{q}{4}\log_2 q - \frac{q}{4} - \frac{5}{2}q + 1\Big)\\
&= \frac{9q}{8}\log_2^2 E - \frac{q}{4}\log_2 E
 - \log_2 E\Big(\frac{5}{4}q\log_2 q - \frac{7}{4}q + 1\Big)
 = \log_2 E\Big(\frac{9}{8}q\log_2 E - \frac{5}{4}q\log_2 q + \frac{3}{2}q - 1\Big)
\end{aligned}
\]

What is more, if the FRI protocol stops when it reaches a small enough number of columns left (in the default genSTARK, ℓ = 256), the proof length is even smaller (below, "(up)" denotes the full-FRI node count just computed):

\[
\begin{aligned}
\sum_{i=1}^{\log_4\frac{E}{\ell}}\Big(4q'\log_2\frac{E}{4q'} + 8q' - 1 + q'\log_2\frac{E/4^i}{q'} + 2q' - 1\Big) + \ell
&= \sum_{i=1}^{\log_4\frac{E}{\ell}}\Big(2q\log_2\frac{E}{2q} + 4q - 1 + \frac{q}{2}\log_2\frac{E/4^i}{q/2} + q - 1\Big) + \ell\\
&= \log_2 E\Big(\frac{9}{8}q\log_2 E - \frac{5}{4}q\log_2 q + \frac{3}{2}q - 1\Big)
 - \sum_{i=1}^{\log_4\ell}\Big(2q\log_2\frac{E}{2q} + 5q - 2 + \frac{q}{2}\log_2\frac{E/4^i}{q/2}\Big) + \ell\\
&= (\text{up}) - \frac{\log_2\ell}{2}\Big(2q\log_2\frac{E}{2q} + 5q - 2 + \frac{q}{2}\log_2\frac{2E}{q}\Big)
 - \frac{q}{2}\sum_{i=1}^{\log_4\ell}\log_2 4^i + \ell\\
&= (\text{up}) - \frac{\log_2\ell}{2}\Big(\frac{5}{2}q\log_2 E - \frac{5}{2}q\log_2 q + \frac{7}{2}q - 2\Big) + \ell
 - \frac{q}{2}\log_4\ell\,(\log_4\ell + 1)\\
&= (\text{up}) - \frac{\log_2\ell}{2}\Big(\frac{5}{2}q\log_2 E - \frac{5}{2}q\log_2 q + \frac{7}{2}q - 2\Big) + \ell
 - \Big(\frac{q}{8}\log_2^2\ell + \frac{q}{4}\log_2\ell\Big)\\
&= (\text{up}) - \frac{\log_2\ell}{2}\Big(\frac{5}{2}q\log_2 E - \frac{5}{2}q\log_2 q + 3q - 2 + \frac{q}{4}\log_2\ell\Big) + \ell
\end{aligned}
\]

However, our experiments show that genSTARK includes a more optimized method to compute paths. In any case, the estimation above is in the same order as that of genSTARK.


Proof length                   Bytes
Evaluation results             q(w + s) log|F| / 4
Evaluation proof               32(2q log(E/2q) + 4q − 1)
Linear combination proof       32(q log(E/q) + 2q − 1)
Low-degree proof (full)        log₂E (36q log₂E − 40q log₂q + 48q − 32)
Low-degree proof (up to 256)   log₂E (36q log₂E − 40q log₂q + 48q − 32) + 32ℓ − log₂ℓ (40q log₂E + 40q log₂q + 48q − 32 − 4q log₂ℓ)

2.3 Experiments

This part of the document provides a collection of experiments that can be used to assess the effect of the most relevant parameters in practice, and to infer estimations for other use cases. In what follows, we use this colour scheme: gray refers to the 3-column setting with 3 assertions, cyan to the 3-column setting with 6 assertions, blue to the 2-column setting with 4 assertions, and black to the 1-column single-exponentiation setting with 2 assertions.

2.3.1 Number of steps

The number of computation steps T is a major contributor to the cost of STARKs, both in proving and verification time, proof length, Merkle tree size, and RAM consumption. The verifier complexity is linear in the number of steps (here, the number of bits of the exponent). The prover complexity grows quasilinearly in the size of the evaluation domain, and thus in the number of steps, because E = T · e. The proof length grows logarithmically in the number of steps. Both the Merkle tree size and the RAM grow linearly in the number of rows of the execution trace. Note that the proof length and verifier time for both settings have parallel plots because T only affects these measures additively, considering these two scenarios. The graphs below show the verification and proving times for 1024 Pedersen commitments in a 32-bit order field, with 80 queries, and the 3-column setting (with 3 boundary constraints) versus the 2-column setting (with 4 boundary constraints):

[Graphs: prover time (seconds) and verifier time (milliseconds) against the number of steps (32 to 256), 3-column versus 2-column setting.]
[Graphs: proof and Merkle tree size (kilobytes) and RAM consumption (megabytes) against the number of steps, 3-column versus 2-column setting.]

2.3.2 State width

The state width w defines the number of Pedersen commitments that are verified by the STARK. Now, we analyse the impact of this parameter, regarding the total number of Pedersen commitments and the number of execution trace columns per commitment (3-column versus 2-column setting). The verifier complexity is linear in the state width. Experiments reveal that the genSTARK prover runs in linear time in the number of states, but with a larger slope in comparison to the verifier's. The proof length and total memory consumption also grow linearly in the state width. Conversely, the Merkle tree length is independent of the number of columns of the execution trace. Plus, it shows a parallel graph, meaning that w only affects |π| additively considering these two scenarios. The graphs below show the verification and proving times for 2^0 to 2^10 256-bit-exponent Pedersen commitments in a 32-bit order field, with 80 queries, and the 3-column setting (with 3 boundary constraints) versus the 2-column setting (with 4 boundary constraints):

[Graphs: prover time (seconds), verifier time (milliseconds), proof and Merkle tree size (kilobytes) and RAM (megabytes) against the state width, a = 3 versus a = 4.]

2.3.3 Number of Pedersen commitments

The graphs below show the effect of the number of Pedersen commitments on the STARK. The numbers beneath are the same as in the case of the state width, except that here we divide the X-axis by 3 in the 3-column setting, and by 2 in the 2-column setting. This means that both prover and verifier run linearly in the number of Pedersen commitments, and both proof length and memory consumption grow linearly in this parameter as well. Conversely, the Merkle tree size is independent of the number of commitments verified. Note that the verifier time barely changes from the 3-column to the 2-column setting. The graphs below show these experiments for 2^0 to 2^10 256-bit-exponent Pedersen commitments in a 32-bit order field, with 80 queries, and the 3-column setting (with 3 boundary constraints) versus the 2-column setting (with 4 boundary constraints):

[Graphs: prover time (seconds), verifier time (milliseconds), proof and Merkle tree size (kilobytes) and RAM (megabytes) against the number of commitments (up to 1024), a = 3 versus a = 4.]

2.3.4 Number of boundary constraints

Now we compare two versions of the Pedersen commitment AIR using 3 columns: one with the naive number of boundary constraints a = 6, and another one with a = 3 where the only constraints are set on the first step of the G and H columns, and the third one refers to the final step of the multiplication column. The effect on soundness is yet to be analysed, but we wanted to test first whether it made any sense at all to put effort into reducing these numbers. The conclusion is that the proof length, Merkle tree size and RAM are independent of the number of assertions. The verifier time only depends slightly on a, whereas the proving time receives most of the impact in the step that builds and evaluates the boundary constraints, linearly in the number of assertions. Here we show these results for 2^0 to 2^10 256-bit-exponent Pedersen commitments in a 32-bit order field, with 80 queries, in the 3-column setting (with 3 versus 6 assertions):

[Graphs: prover time (seconds), verifier time (milliseconds), proof and Merkle tree size (kilobytes) and RAM (megabytes) against the number of commitments, a = 3 versus a = 6.]

2.3.5 Number of queries

The number of queries q affects some other parameters, such as the number of augmented positions on which the evaluation tree is checked (≈ 2q), the number of queries to the low-degree Merkle tree in each step of the FRI protocol (q′ ∼ q/2) and the number of queries to the linear combination tree (4q′ ∼ 2q). The total Merkle tree size is constant for different values of q. Both verification time and proof length grow linearly in the number of queries, whereas the time for the prover remains virtually unaltered regardless of the number of queries. Also, the RAM barely changes with q. Notice how clearly we can check that both the prover and memory complexity are proportional to the evaluation domain size E. The graphs below show the verification and proving times, memory requirements and communication complexity for 1024 256-bit-exponent Pedersen commitments in a 32-bit order field, in the 3-column setting (with 3 boundary constraints) versus the 2-column setting (with 4 boundary constraints). The same goes for the RAM consumption, except that this one refers to 32 Pedersen commitments (due to the very costly evaluation). Note that performing a small number of queries is not a good idea for soundness, but we made this experiment solely for testing reasons. Similarly, it barely makes sense to perform so many queries in the 32-commitment example, as it samples half of the evaluation domain.

[Graphs: prover and verifier time (seconds), proof and Merkle tree size (kilobytes), and RAM for 32 commitments (megabytes) against the number of queries (16 to 128), 3-column versus 2-column setting.]

2.3.6 Field size

The field size |F| affects the size of each element in the execution trace, and computations within larger fields take longer to process. This has an impact on running time and memory, as we will see now. Both prover and verifier run in roughly quasilinear time in the number of bits needed to represent the field. The Merkle trees have constant size, as they take the same amount of space regardless of the length of the content of the leaves. The proof length grows perfectly linearly in the number of bits needed, so O(log |F|). The RAM consumption also grows linearly in the number of bits needed. The following graphs show the verification and proving times, memory requirements and communication complexity, where the X-axis shows the logarithm of the field size. We benchmark 32 256-bit-exponent Pedersen commitments, in the 3-column setting with 6 boundary constraints versus the 2-column setting with 4 boundary constraints, with fields of 32, 64, 128 and 256 bit order size.

[Graphs: prover and verifier time (seconds), proof and Merkle tree size (kilobytes) and RAM (megabytes) against the field order bits (32 to 256), a = 6 versus a = 4.]

2.3.7 Constraint degree

The maximum constraint degree determines the extension factor e = 2 · 2^⌈log₂ µ⌉, and thus the evaluation domain size E = T · e, in a similar way as the number of steps does. For this reason, Merkle trees in the 2-column setting (with µ = 3 and e = 8) take half the space of those in the 3-column setting (with µ = 6 and e = 16).

2.4 Discussion

In this section we present some ideas that could be used to optimize implementations of STARKs, together with some research-oriented questions whose answers are left for future work. We also give our motivation to benchmark Pedersen commitments and present some of the other use cases we considered as well.

49

Page 52: STARK WARS - software.imdea.organais.querol/STARK_WARS.pdf · this document with the great series of blog posts by StarkWare[16] and the entries by Vitalik Buterin [6, 7, 5] for a

2.4.1 Trade-off column VS row

Now we address an interesting question: whether it is better to add more columns of T steps, or to have fewer columns with more rows. Informally, having more rows requires a larger evaluation domain, so all polynomial interpolations and multipoint evaluations take considerably longer; on the other hand, having more columns is less convenient for the verifier. Here, we compare two exponentiations (the 2-column setting with 4 assertions) with one single exponentiation of twice the exponent length (the 1-column setting with 2 assertions). Note that we modified the 2-column setting to use only public inputs, so the comparison is as fair as possible. Recalling the square-and-multiply algorithm, performing one exponentiation with n bits in the exponent requires the same number of operations as two exponentiations with n/2 exponent bits each. However, as we are about to check, the same does not apply to STARKs. The graphs below show that having fewer columns with a larger number of rows is inconvenient for the prover, whereas for the verifier, having more columns is less desirable. The RAM needed for a single column is larger than that of the multiple-column scenario, as is the proof length. Since the evaluation domain is twice as large in the single-column setting, the Merkle tree size is twice as large as well. Here, we consider only one iteration, 80 queries, a field size of 256 bits, and a total number of exponent bits ranging from 64 to 512. In general, it is not recommended to increase the number of steps; it is better to add more columns instead.
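To make the square-and-multiply argument concrete, here is a minimal sketch (our own, not taken from genSTARK) that counts field multiplications; it shows why one n-bit exponentiation costs about as much as two exponentiations with n/2 bits each:

// Left-to-right square-and-multiply over F_p using BigInts.
function squareAndMultiply(base, exponent, p) {
  let result = 1n;
  let operations = 0;
  for (const bit of exponent.toString(2)) {
    result = (result * result) % p;   // one squaring per exponent bit
    operations++;
    if (bit === '1') {
      result = (result * base) % p;   // one extra multiplication per set bit
      operations++;
    }
  }
  return { result, operations };
}

// A 2n-bit exponent performs about as many operations as two n-bit exponents,
// yet in a STARK the longer trace is noticeably more expensive, as shown below.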

[Figure: prover time (milliseconds), verifier time (milliseconds), proof size and Merkle tree size (kilobytes), and RAM usage (megabytes), each plotted against the total number of exponent bits (64 to 512); the legend distinguishes the 1-column and 2-column settings.]

2.4.2 STARK vs SNARK

We tried to have the closest apples-to-apples comparison between STARKs and SNARKs by creating Pedersen commitments over the Jubjub elliptic curve (see footnote 2) that QEDIT is currently using as part of Sapling. Nonetheless, the order of the multiplicative subgroup of this field is not a large power of two, nor is it in BLS12-381 (see footnote 3). Note that the latter curve has a subgroup whose order is a multiple of a large power of two (see footnote 4), but with a much smaller order than the field beneath. We think that it would be an interesting improvement to make STARKs work with multiplicative subgroups other than F∗, to make them compatible with plenty of constructions in the (more studied) SNARK world.

Now we address the question of whether a STARK is faster than a SNARK. Here we build schemes for exponentiations with 256-bit field elements. We use the 1-column setting and compare against an instantiation of a SNARK that runs over Groth'16 [8] on the JubJub elliptic curve. Please take into account that we could not make a perfect apples-to-apples comparison.

2. The subgroup in Jubjub has order 6554484396890773809930967563523245729705921265872317281365359162392183254199, with |G_Jubjub| − 1 = 2 · 3 · 12281 · 1710050753150114629 · 203928654140967434528233 · 255074062430788457494141376149.

3. The finite field in BLS12-381 has order 4002409555221667393417789825735904156556882819939007885332058136124031650490837864442687629129015664037894272559787, and |F_BLS12-381| − 1 = 2 · 3² · 11 · 23 · 47 · 10177 · 859267 · 52437899 · 2584487767265781317813 · 15778400344354997994418419698270088123916926905054652752758194827714659.

4. Its order is 2³² · 3 · 11 · 19 · 10177 · 125527 · 859267 · 906349² · 2508409 · 2529403 · 52437899 · 254760293².


[Figure: prover time (seconds) plotted against the number of commitments (up to 1,000), comparing the SNARK and STARK provers.]

2.4.3 Optimizations

This section gives some hints on optimization techniques to make genSTARK, or any other implementation, more efficient. Take into account that the library we have been using to benchmark our examples is written in JavaScript; implementing the same functionality in a lower-level language such as C would immediately yield a faster protocol.

First, we want to remark that a major source of direct optimization is the AIR definition itself. Even in the simple example of Pedersen commitments we had to decide whether to add a third column per commitment to represent the final multiplication of the two preceding columns, or to increase the constraint degree by adding a flag that detects the final step and performs the multiplication there. Depending on the kind of computation, we could even create more complex boundary constraints to reduce their number, combining multiple columns together.

At the beginning of the computation, and as part of the transparency of this protocol, both prover and verifier obtain the roots of unity to be used across computations; these have the same value as long as the field order and the T and e parameters are preserved. We could therefore think of including these values as part of the public statement (which may contradict the transparency property of STARKs) or of the proof (which would increase its length by a few more bytes), to avoid recomputing them every time the computation is regular enough.
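Within a single party, a simpler alternative is to memoize these values across runs. The sketch below uses our own helper names; field.getRootOfUnity and field.getPowerCycle are the calls listed in Appendix A, but their exact signatures here are assumed for illustration:

// Sketch: cache the evaluation-domain power cycle keyed by field order and E = T * e,
// so repeated runs over the same parameters reuse it instead of recomputing it.
const powerCycleCache = new Map();

function cachedPowerCycle(field, evaluationDomainSize) {
  const key = `${field.modulus}:${evaluationDomainSize}`;
  if (!powerCycleCache.has(key)) {
    const rootOfUnity = field.getRootOfUnity(evaluationDomainSize); // assumed signature
    powerCycleCache.set(key, field.getPowerCycle(rootOfUnity));     // assumed signature
  }
  return powerCycleCache.get(key);
}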

Regarding proof length, and this is something that genSTARK already implements, multiple authentication paths to the same Merkle tree can be given without repeated nodes. Naively giving q paths in an E-leaf tree takes 32 · q · log E bytes. With this optimization, the number of nodes needed is reduced to q log E − q log q + 2 · 2^⌊log q⌋ − 1 = q log(E/q) + 2q − 1 (for q a power of two). That is, on average, each level of the tree contributes q different nodes until the width of a level becomes smaller than q; from that point on, every node of the level is provided exactly once, meaning that the peak of the tree, with 2 · 2^⌊log q⌋ − 1 nodes, is given entirely. The difference between the rough estimate and the optimized form is 32(q log q − 2 · 2^⌊log q⌋) fewer bytes.
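The following sketch (our own helpers, not genSTARK code) turns the two formulas above into byte estimates, assuming 32-byte hashes:

// Naive estimate: q independent authentication paths of log2(E) nodes each.
function naiveProofBytes(q, E) {
  return 32 * q * Math.log2(E);
}

// Deduplicated estimate: q * log2(E/q) nodes below the peak,
// plus the whole peak of 2 * 2^floor(log2 q) - 1 nodes.
function dedupedProofBytes(q, E) {
  const peak = 2 * 2 ** Math.floor(Math.log2(q)) - 1;
  return 32 * (q * Math.log2(E / q) + peak);
}

// Example: 80 queries on a tree with E = 2^16 leaves
// naiveProofBytes(80, 2 ** 16)   -> 40960 bytes
// dedupedProofBytes(80, 2 ** 16) -> roughly 30% smaller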

The current implementation of the low-degree proof in genSTARK halts the loop when the number of remaining columns is small enough, and provides these coefficients directly in the proof. This is a trade-off between computation and proof length. However, these elements are stored in buffers of 32 bytes each, regardless of the number of bytes actually needed to represent elements of the field. In that sense, even small examples produce large proofs that can easily be shrunk when the field is not that large.
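A direct way to shrink these buffers, sketched below with our own helper names (not the genSTARK API), is to serialize each element into exactly ⌈log₂|F| / 8⌉ bytes:

// Number of bytes actually needed to encode any element of F.
function bytesPerElement(fieldModulus) {
  return Math.ceil(fieldModulus.toString(2).length / 8);
}

// Big-endian encoding of a BigInt field element into exactly `byteLength` bytes.
function serializeElement(value, byteLength) {
  const bytes = Buffer.alloc(byteLength);
  let v = value;
  for (let i = byteLength - 1; i >= 0; i--) {
    bytes[i] = Number(v & 0xffn);
    v >>= 8n;
  }
  return bytes;
}

// A 64-bit field needs 8 bytes per element instead of a fixed 32,
// shrinking the corresponding proof buffers by a factor of 4.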

We wonder whether there could be different ways of defining the linear combination, other than upgrading each component by the incremental degree. This would reduce the number of operations required to compute this object, which takes a considerable fraction of the prover time.

Another important source of optimization, on the research side, could be to think of other data structures to base STARKs upon, other than the Merkle trees that take up most of the memory consumption.

Moreover, the FRI complexity is, surprisingly, not the costliest part of this protocol at all. Instead, most quasilinear factors for the prover come from the fact that it must compute plenty of multipoint evaluations and polynomial interpolations. These two algorithms have been thoroughly studied in the literature. What we could do is try to think of more efficient ways to work with execution traces so that there is no need to perform so many of these computations.
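For reference, multipoint evaluation over a domain of roots of unity is a number-theoretic transform; the recursive sketch below (ours, using BigInt arithmetic, not genSTARK's internal routine) evaluates a polynomial of degree less than n at all n-th roots of unity in O(n log n) field operations, which is exactly the quasilinear cost the prover pays per register:

// Evaluate a polynomial, given by its coefficient array, at all n-th roots of
// unity in F_p. Here n = coefficients.length must be a power of two and
// `omega` must be a primitive n-th root of unity modulo p (all BigInts).
function evaluateAtRootsOfUnity(coefficients, omega, p) {
  const n = coefficients.length;
  if (n === 1) return [coefficients[0] % p];
  const even = evaluateAtRootsOfUnity(coefficients.filter((_, i) => i % 2 === 0), (omega * omega) % p, p);
  const odd = evaluateAtRootsOfUnity(coefficients.filter((_, i) => i % 2 === 1), (omega * omega) % p, p);
  const out = new Array(n);
  let w = 1n;
  for (let i = 0; i < n / 2; i++) {
    const t = (w * odd[i]) % p;
    out[i] = (even[i] + t) % p;            // P(omega^i)       = E(omega^{2i}) + omega^i * O(omega^{2i})
    out[i + n / 2] = (even[i] - t + p) % p; // P(omega^{i+n/2}) = E(omega^{2i}) - omega^i * O(omega^{2i})
    w = (w * omega) % p;
  }
  return out;
}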

We wanted to test STARKs for Pedersen commitments as they are a key tool to verify correctness in the blockchain. All transactions are part of a large Merkle tree, and membership proofs are given as authentication paths, just like we did for STARKs themselves. In practice, these trees have a depth of about 30 levels, meaning that a whole proof consists of 30 authentication paths. The way to implement this with STARKs is by building a width-30 execution trace for Pedersen commitments, where each column refers to one node of the authentication path. Using unoptimized tests (standard JavaScript genSTARK, and the naive 6 boundary conditions with a redundant third column), and for 32 Pedersen commitments in a 256-bit field, we obtained a STARK proving time of about 15 seconds, 1.8 seconds for the verifier, a 705 KB proof, a 597 KB Merkle tree, and 179 MB of RAM.


Appendix A

Theoretical complexity


1. Setup evaluation context
   1.1. air.createContext
        1.1.1. validateInputs  O(T · p)
        1.1.2. field.getRootOfUnity  O(log |F|)
        1.1.3. field.getPowerCycle  O(E)
        1.1.4. buildInputRegisters  O(T · (p + k))
   1.2. Create objects
        1.2.1. Boundary constraints  O(a log E)
        1.2.2. Zero polynomial  O(log(E − e))
        1.2.3. Linear combination  O(c)

2. Random positions for evaluation
   2.1. Get positions  O(q)
   2.2. Get augmented positions  O(q)

3. Decode evaluation spot checks
   3.1. Hash evaluations  O(2q · (w + s))

4. Verify evaluation Merkle proof
   4.1. verifyBatch  O(2q log E)

5. Verify low-degree proof
   5.1. Verify recursive components  O(q log² E + (9/4) q log E)
        5.1.1. Pseudo-random indices  O(q/2)
        5.1.2. verifyBatch  O((q/2) log(E/4^i))
        5.1.3. Decode positions in polynomial  O(2q)
        5.1.4. Verify linear combination tree  O(2q log E)
        5.1.5. Recover claimed Xs and Ys  O(q/2)
        5.1.6. interpolateQuarticBatch  O(q/2)
   5.2. Final checks  O(1)

6. Verify execution trace  O(q · (6.1 + 6.2 + 6.3))
   6.1. Verify transition  O(k + p + w · µ + w)
   6.2. Verify boundary  O(a)
   6.3. Linear combination  O(14w + 3s)

7. Verify linear combination proof
   7.1. verifyBatch  O(q log E)

Table A.1: Theoretical complexity of genSTARK verifier


1. Setup evaluation context
   1.1. air.createContext
        1.1.1. validateInputs  O(T · (p + s))
        1.1.2. field.getRootOfUnity  O(log |F|)
        1.1.3. field.getPowerCycle  O(E)
        1.1.4. Execution domain  O(T)
        1.1.5. buildInputRegisters  O(T · R)

2. Execution trace
   2.1. Validate InitValues  O(2 · w)
   2.2. Transition function  O(T · (R + numN + µ))

3. Low-degree extend P(X)
   3.1. evaluate  O(w · (3.1.1 + 3.1.2))
        3.1.1. interpolateRoots  O(T log T)
        3.1.2. evalPolyAtRoots  O(E log E)

4. Constraint polynomials
   4.1. Validate extended trace  O(w)
   4.2. Initialize evaluation arrays  O(c)
   4.3. Evaluate constraints  O(E · (w + R + µ + c))

5. Compute Z(X)
   5.1. evaluateAll  O(E)

6. Compute D(X)
   6.1. Invert numerators of Z(X)  O(2E)
   6.2. Multiply num⁻¹ · den · Q  O(w(E + 1))

7. Boundary constraints
   7.1. Constructor  O(a · (log E+))
   7.2. evaluateAll  O(2 · a · E · (log E + 1))

8. Merkle tree of evaluations
   8.1. Serialize evaluations of P(X) and S(X)  O(2 · E · (w + s))
   8.2. Build Merkle tree  O(E)

9. Check tree at random points
   9.1. Get pseudorandom query positions  O(q)
   9.2. getAugmentedPositions  O(q)
   9.3. Get eValues  O(2q)
   9.4. Generate authentication paths  O(2q log E)

10. Random linear combination of evaluations
    10.1. Construct object  O(c)
    10.2. Compute linear combination  O(7wE + sE + E + 8w + 2s)

11. Low-degree proof
    11.1. Create tree of linear combination and paths  O(E + q log E)
    11.2. FRI protocol  O((2/3)(E − 1) + (9/8) q log² E − (1/4) q log² E)
        11.2.1. interpolateQuarticBatch  O(D)
        11.2.2. Generate proof  O(D + 2q log E + (q/2) log D)

Table A.2: Theoretical complexity of genSTARK prover


Bibliography

[1] Ben-Sasson, E., Bentov, I., Horesh, Y., and Riabzev, M. Fast Reed-Solomon Interactive Oracle Proofs of Proximity. Electronic Colloquium on Computational Complexity (ECCC) 24 (2017), 134.

[2] Ben-Sasson, E., Bentov, I., Horesh, Y., and Riabzev, M. Scalable, transparent, and post-quantum secure computational integrity. Cryptology ePrint Archive, Report 2018/046, 2018. https://eprint.iacr.org/2018/046.

[3] Ben-Sasson, E., Chiesa, A., Gabizon, A., Riabzev, M., and Spooner, N. Interactive oracle proofs with constant rate and query complexity. Cryptology ePrint Archive, Report 2016/324, 2016. https://eprint.iacr.org/2016/324.

[4] Ben-Sasson, E., Chiesa, A., and Spooner, N. Interactive Oracle Proofs. In Theory of Cryptography (Berlin, Heidelberg, 2016), M. Hirt and A. Smith, Eds., Springer Berlin Heidelberg, pp. 31–60.

[5] Buterin, V. STARKs, Part 3: Into the Weeds. https://vitalik.ca/general/2018/07/21/starks_part_3.html.

[6] Buterin, V. STARKs, Part I: Proofs and Polynomials. https://vitalik.ca/general/2017/11/09/starks_part_1.html.

[7] Buterin, V. STARKs, Part II: Thank Goodness It's FRI-day. https://vitalik.ca/general/2017/11/22/starks_part_2.html.

[8] Groth, J. On the size of pairing-based non-interactive arguments. In Advances in Cryptology – EUROCRYPT 2016 (Berlin, Heidelberg, 2016), M. Fischlin and J.-S. Coron, Eds., Springer Berlin Heidelberg, pp. 305–326.

[9] Hopwood, D., Bowe, S., Hornby, T., and Wilcox, N. Zcash Protocol Specification. https://raw.githubusercontent.com/zcash/zips/master/protocol/protocol.pdf.

[10] Khaburzaniya, I. AirScript, GuildOfWeavers GitHub. https://github.com/GuildOfWeavers/AirScript.

[11] Khaburzaniya, I. genSTARK, GuildOfWeavers GitHub. https://github.com/GuildOfWeavers/genSTARK.


[12] Merkle, R. C. A digital signature based on a conventional encryption function. In Advances in Cryptology — CRYPTO '87 (Berlin, Heidelberg, 1988), C. Pomerance, Ed., Springer Berlin Heidelberg, pp. 369–378.

[13] Node.js. Download Node.js. https://nodejs.org/en/download/.

[14] Perl, I. Deep Dive into the FRI Protocol. https://medium.com/orbs-network/deep-dive-into-the-fri-protocol-b830dfc88569.

[15] Schwartz, J. T. Fast Probabilistic Algorithms for Verification of Polynomial Identities. J. ACM 27, 4 (Oct. 1980), 701–717.

[16] StarkWare. Stark Math. https://medium.com/starkware/tagged/stark-math.

[17] Zippel, R. Probabilistic algorithms for sparse polynomials. In Symbolic and Algebraic Computation (Berlin, Heidelberg, 1979), E. W. Ng, Ed., Springer Berlin Heidelberg, pp. 216–226.
