Cybernetics and Systems Analysis, Vol. 35, No. 4, 1999

CYBERNETICS

FAST DIRECT COMPUTATION OF MODULAR REDUCTION

A. V. Anisimov UDC 519.8

A new general algorithm is proposed for computation of the multiprecision modular reduction x mod m. This algorithm has better time estimates than the well-known Montgomery method and much better application characteristics. The algorithm does not require changes in the initial and final values of the arguments. The precomputation time is no more than the time of two multiplications.

Keywords: modular reduction, Montgomery method, temporal complexity of algorithms, temporal estimates of algorithms, information protection.

INTRODUCTION

Fast computation of the modular reduction operation x mod m for large x and m is among the central problems in widespread modern systems of information protection, which are based on public-key methods such as RSA-based encryption schemes [1] or the Diffie-Hellman exchange based on discrete logarithms [2, 3]. There exist two directions in the methods of modular reduction of large numbers. The general methods of modular reduction use one argument of definite length in a chosen number system as the input. The second direction is related to the search for specific algorithms of modular reduction that take into account a particular form of arithmetical operations or operands such as binary representation of numbers, multiplication of two arguments, exponentiation, and other variants [4-9]. The latter direction is mainly stimulated by attempts at fast computation of modular exponentiation, which is the most laborious operation in public-key cryptosystems. This paper deals with the former direction of general methods.

Some well-known and sufficiently effective general methods are used to compute the operation of modular reduction for large numbers, for example, the classical method [10], the Barrett method [11], and the Montgomery method [12]. The execution time of all the above-mentioned methods after a definite data preprocessing, if any, is somewhat more than that of one multiplication of large numbers but less than the time of two multiplications of such numbers. Of interest also is the method of computing the modular reduction without multiplication. This method is based on the properties of Fibonacci numbers [13]. The Montgomery method is the fastest and most frequently used. The efficiency of this method becomes obvious during iterative execution of modular multiplication of two large numbers, for example, in computing the modular exponentiation. In executing the modular reduction z mod m of the result of multiplication $z = x \cdot y$ of two numbers x and y that are less than m for the k-digit modulus m, the Montgomery method requires one to execute $k^2 + k$ one-digit multiplications. In this case, certain pre- and postcomputations related to the system tuning should be performed, and a transformation of the initial and final values of the arguments should also be carried out for each new collection of values of input variables. A drawback of this method is the rather large volume of recomputations of the initial values of the variables and its insensitivity to the argument length for small values of the arguments. The operating time of the Montgomery algorithm after execution of all necessary precomputations and argument transformations is equal to $k^2 + k$, even if the argument x is already a modulo m residue.

Taras Shevchenko University, Kiev, Ukraine. Translated from Kibernetika i Sistemnyi Analiz, No. 4, pp. 3-12, July-August, 1999. Original article submitted April 26, 1999.

1060-0396/99/3504-0507$22.00 © Kluwer Academic/Plenum Publishers

In this work, two new general algorithms are proposed whose time estimate is no larger than that of the Montgomery method, while their application characteristics are much better. The time complexity of the second algorithm is essentially no larger than that of the Montgomery method. The algorithms proposed do not require changes in the initial and final values of the arguments, and the time of precomputations during the tuning of the system is approximately equal to the time of two multiplications of k-digit numbers. Thus, the algorithms proposed considerably improve the well-known general methods of accelerated modular reduction.

MONTGOMERY ALGORITHM

Since the algorithm being proposed uses computing expedients of the Montgomery method, for the sake of

completeness of the presentation, we will outline the basics of the Montgomery method.

Let $b$ be a positive integer that is assumed to be the radix of a number system, and let $m$ and $x$ be positive integers that are given in the number system with the radix $b$:

$$m = \sum_{i=0}^{k-1} m_i b^i, \quad 0 < m_{k-1} < b, \quad 0 \le m_i < b \ \text{ for } i = 0, 1, 2, \dots, k-2;$$

$$x = \sum_{i=0}^{l-1} x_i b^i, \quad 0 < x_{l-1} < b, \quad 0 \le x_i < b \ \text{ for } i = 0, 1, 2, \dots, l-2.$$

The length of the digital representation of the number $x$ is equal to $l$. The length of the number $x$ will also be denoted by $\|x\|$. If $\|x\| = 1$, i.e., the inequality $0 < x < b$ is fulfilled, then $x$ is called a digit.

Our task is to compute $x \bmod m$, i.e., the remainder of division of the number $x$ by $m$.

Let $R$ be some positive integer such that $\mathrm{GCD}(R, m) = 1$, and let $R^{-1}$ be the number inverse to $R$ modulo $m$, i.e., the following relations are fulfilled: $R R^{-1} \equiv 1 \pmod{m}$ and $0 < R^{-1} < m$.

The Montgomery method is based on the following, quite obvious, fact.

Montgomery Theorem [12]. For any positive integer $x$, there exists an integer $t$ such that the following relations are fulfilled:

(i) $0 \le t < R$;

(ii) $\dfrac{tm + x}{R}$ is an integer;

(iii) $\dfrac{tm + x}{R} \equiv R^{-1} x \pmod{m}$.

If in the conditions of the Montgomery theorem $R$ is chosen sufficiently large, namely, such that the inequality $x < Rm$ is fulfilled, then the inequality $\frac{tm + x}{R} < 2m$ is also fulfilled. Hence, the computation of $R^{-1} x \bmod m$ may be reduced to the computation by the formula

$$R^{-1} x \bmod m = \frac{tm + x}{R} - \Delta, \quad \text{where } \Delta = 0 \text{ or } \Delta = m.$$
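The three conditions of the theorem can be checked numerically. The following is a minimal Python sketch, not the paper's procedure: it assumes the choice $t = -m^{-1} x \bmod R$ (one value that satisfies condition (ii)), and the function name `montgomery_reduce` and the sample values of `m`, `R`, and `x` are illustrative only.

```python
def montgomery_reduce(x, m, R):
    """Return R^{-1} * x mod m, assuming gcd(R, m) == 1 and 0 <= x < R * m."""
    m_inv = pow(m, -1, R)            # m^{-1} mod R (needs Python 3.8+)
    t = (-m_inv * x) % R             # condition (i): 0 <= t < R
    quotient, remainder = divmod(t * m + x, R)
    assert remainder == 0            # condition (ii): (t*m + x)/R is an integer
    # Since x < R*m and t < R, quotient < 2m, so Delta is either 0 or m.
    return quotient - m if quotient >= m else quotient


m, R = 97, 100                       # illustrative values with gcd(R, m) = 1
x = 1234                             # any x with 0 <= x < R * m
result = montgomery_reduce(x, m, R)
expected = pow(R, -1, m) * x % m     # condition (iii), computed directly
```

Here `result` and `expected` agree, which is exactly relation (iii) combined with the correction by $\Delta$.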

To simplify the computing problems connected with modular operations, multiplications, and divisions, the number $R$ may be conveniently chosen equal to a power of the number $b$. The Montgomery method is commonly used for multiple computations of series of modular multiplications in iterative loops. In this case, the argument $x$ is the product of two $k$-digit numbers each of which is a modulo $m$ residue. In such a situation, $b^k$ may be taken as the number $R$, and the argument $x$ is no more than $2k$ in length. Let $m' = -m^{-1} \bmod R$. It may be shown that $t = m' x \bmod R$. If the number $t$ is computed by this formula, then the upper bound of the complexity of the operation of modular reduction will not exceed the complexity of two multiplications of $k$-digit numbers. P. Montgomery used an interesting computing expedient that makes it possible to substantially decrease the number of the multiplication operations required.

Let the number $t$ described in the Montgomery theorem be of the following form:

$$t = t_0 + t_1 b + \dots + t_{k-1} b^{k-1}, \quad 0 \le t_i < b, \quad i = 0, 1, \dots, k-1.$$

Let $m_0^{-1} = m^{-1} \bmod b$. The basic computing innovation proposed by Montgomery is that we compute not the entire number $t$, but sequentially each individual digit $t_i$, $i = 0, 1, \dots, k-1$, and add the products $t_i m b^i$ to the current sum stored in $x$. This permits one to use only the low-order digit $m_0$ of the number $m$ during the computation of each digit $t_i$. In more detail, the number $t_0$ is found from the relation $t_0 m + x \equiv 0 \pmod{b}$, i.e., $t_0 = -m_0^{-1} x \bmod b$. Then, the quotient $\frac{x + t_0 m}{b}$ is taken as the new value of $x$. Analogously, $t_1$ is found from the relation $t_1 m + x \equiv 0 \pmod{b}$, i.e., $t_1 = -m_0^{-1} x \bmod b$. Again, the current result is modified according to the formula $x = \frac{x + t_1 m}{b}$, and so on. Such a computation of coefficients is well known in the theory of numbers as the method of inversion of p-adic numbers. Let $c_0 = -m_0^{-1} \bmod b$.

Function REDC(x) /* Montgomery algorithm */

    for (i = 0; i < k; i++) do {
        t_i = (x * c_0) mod b;
        x = x + t_i m;
        x = x div b
    }
    if x ≥ m then x = x - m;
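The digit-serial loop of REDC can be written directly on Python integers. This is a minimal sketch under stated assumptions: $\mathrm{GCD}(b, m) = 1$, $m < b^k$, and $0 \le x < m b^k$; the name `redc` and the sample values are illustrative, and digits are extracted with `%` and `//` rather than stored in arrays.

```python
def redc(x, m, b, k):
    """Digit-serial Montgomery reduction: return b^{-k} * x mod m.

    Assumes gcd(b, m) == 1, m < b**k and 0 <= x < m * b**k.
    """
    c0 = (-pow(m % b, -1, b)) % b    # c_0 = -m_0^{-1} mod b, m_0 = low digit of m
    for _ in range(k):
        t_i = (x % b) * c0 % b       # only the low digit of x is needed
        x = (x + t_i * m) // b       # exact: x + t_i*m is divisible by b
    return x - m if x >= m else x    # after k steps x < 2m


m, b, k = 997, 10, 3                 # illustrative: decimal digits, 3-digit modulus
x = 123456
result = redc(x, m, b, k)
expected = pow(b ** k, -1, m) * x % m
```

Each pass costs one digit multiplication for `t_i` and one digit-by-number multiplication `t_i * m`, which is where the count of $k^2 + k$ one-digit multiplications comes from.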

The function REDC(x) quickly computes $R^{-1} x \bmod m$. To quickly execute series of multiplications, certain transformations of arguments in loops should be carried out beforehand. Let $M(x) = Rx \bmod m$; $M$ will be called the Montgomery mapping. For the Montgomery mapping, the following relations are fulfilled [12]:

$$M(x \pm y) \equiv M(x) \pm M(y) \pmod{m};$$

$$M(xy) = \mathrm{REDC}(M(x) * M(y)).$$

Therefore, if the initial value $M(x)$ of the arguments is used instead of $x$ in the body of a block computed, and the operation of multiplication $x * y$ in the block body is replaced by $\mathrm{REDC}(x * y)$, then this block will produce the resulting value of the variable $x$ equal to $M(x)$. The value of $x$ can be obtained from $M(x)$ by the formula $x = \mathrm{REDC}(M(x))$.
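As an illustration of how the mapping $M$ is used inside a loop, the sketch below performs square-and-multiply exponentiation entirely on mapped values and converts back once at the end. It is an assumption-laden example rather than the paper's code: `redc`, `to_montgomery`, and `mont_pow` are hypothetical names, and the mapping itself is computed with a plain `%` for brevity.

```python
def redc(x, m, b, k):
    """Return b^{-k} * x mod m (assumes gcd(b, m) == 1, 0 <= x < m * b**k)."""
    c0 = (-pow(m % b, -1, b)) % b
    for _ in range(k):
        x = (x + (x % b) * c0 % b * m) // b
    return x - m if x >= m else x


def to_montgomery(x, m, b, k):
    """Montgomery mapping M(x) = R*x mod m with R = b**k."""
    return x * b ** k % m


def mont_pow(x, d, m, b, k):
    """Compute x**d mod m with every product replaced by REDC(product)."""
    acc = to_montgomery(1, m, b, k)          # M(1)
    base = to_montgomery(x % m, m, b, k)     # initial transformation of the argument
    while d > 0:
        if d & 1:
            acc = redc(acc * base, m, b, k)  # M(acc*base) = REDC(M(acc) * M(base))
        base = redc(base * base, m, b, k)
        d >>= 1
    return redc(acc, m, b, k)                # final transformation: x = REDC(M(x))


value = mont_pow(5, 117, 997, 10, 3)
```

The two calls to `to_montgomery` and the final `redc` are the initial and final transformations of the arguments that the text counts as overhead of the method.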

The Montgomery method requires the following additional actions: precomputation of $m_0^{-1} = m^{-1} \bmod b$, transformations of the arguments $x = M(x)$, and also final transformations of the arguments $R^{-1} M(x) \bmod m = \mathrm{REDC}(M(x))$. Thus, the application of the Montgomery algorithm requires substantial additional computations that are mainly related to transformations of the initial values of the arguments.

We assume that a complexity time unit is the time of one radix b multiplication of two digits. The addition time and

also the time of multiplication and division by the radix b are neglected. Let us estimate the execution time of the

Montgomery algorithm.

One digit multiplication is required to compute each digit $t_i$ in the number system with the radix $b$. It is required to execute $k$ digit multiplications in order to multiply the modulus $m$ by a digit. Thus, the time of computing $R^{-1} x \bmod m$ according to the Montgomery method is equal to $k^2 + k$, irrespective of the length of the argument $x$.

This worst-case upper bound $k(k + 1)$ on the execution of the algorithm is the best known to date. To compare, we note that, for arguments of length $2k$, the estimate of the computation of $x \bmod m$ by the classical method is equal to $k(k + 2.5)$, and by the Barrett method is equal to $k(k + 4)$ [14].

AN ALTERNATIVE METHOD

The Montgomery method stated above is effective only in the computation of iterative constructions with multiple repetition of the modular reduction. This method is generally applied to the evaluation of the function of modular exponentiation $x^d \bmod m$.

The natural question arises as to whether we may propose an alternative general algorithm whose time estimates are no worse than those of the Montgomery method and, at the same time, whose application conditions are more convenient. The positive solution of this problem is given below. We propose an algorithm of modular reduction that has the following advantages:

(i) the algorithm does not require the initial and final transformations of the arguments;

(ii) the time of the precomputations for the 2k-digit argument is no greater than the time of two multiplications of k-digit numbers;

(iii) the execution time of the algorithm for a series of modular multiplications does not exceed at worst the time of

the Montgomery reduction;

(iv) the time estimate is proportionally dependent on the argument length;

(v) the algorithm does not require postcomputations.

As is easily seen, the algorithm proposed obviates all the inconveniences of the Montgomery method. It is described

below by means of stages of step-by-step improvements.

Let $x = x_0 + x_1 b + \dots + x_{2k-1} b^{2k-1}$. We denote by $x^{(i)}$ the low-order part of the number $x$ up to position $i$, $x^{(i)} = x_0 + x_1 b + \dots + x_i b^i$, $0 \le i \le 2k - 1$.

Stage 0. The algorithm proposed sequentially computes the residues $u_{2k-1} = x_{2k-1} b^{2k-1} \bmod m$, $u_{2k-2} = x_{2k-2} b^{2k-2} \bmod m$, ..., $u_{k-1} = x_{k-1} b^{k-1} \bmod m$. The residues computed are added up. The final result is corrected, if necessary, by the subtraction of the modulus $m$.

A general scheme of the algorithm proposed is as follows.

Algorithm A0

    u = 0; x = x_input; v = x^(k-2);
    for (i = 2k-1; k-2 < i < 2k; i--) do {
        u_i = x_i b^i mod m;
        u = u + u_i;
        if u ≥ m then u = u - m
    }
    u = u + v;
    if u ≥ m then u = u - m;
    x_output = u.
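The scheme A0 can be traced with a short Python sketch in which the residues $u_i$ are still obtained with the built-in `%` operator, i.e., the "fast" part is deliberately left as a placeholder. The assumptions are that $m$ has exactly $k$ radix-$b$ digits ($b^{k-1} \le m < b^k$) and $0 \le x < b^{2k}$; the function name is illustrative.

```python
def algorithm_a0(x, m, b, k):
    """Sum the residues u_i = x_i * b**i mod m for i = 2k-1 .. k-1, then add
    the untouched low part v = x^(k-2). Assumes b**(k-1) <= m < b**k and
    0 <= x < b**(2*k)."""
    v = x % b ** (k - 1)             # digits x_0 .. x_{k-2}; always v < m
    u = 0
    for i in range(2 * k - 1, k - 2, -1):
        x_i = x // b ** i % b        # digit of x at position i
        u_i = x_i * b ** i % m       # placeholder for the fast computation
        u += u_i
        if u >= m:                   # both summands are < m, one subtraction suffices
            u -= m
    u += v
    if u >= m:
        u -= m
    return u


sample = algorithm_a0(987654, 997, 10, 3)
```

Because `u` and every `u_i` stay below `m`, a single conditional subtraction after each addition keeps the running sum reduced.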

The basic problem is to quickly compute products of the form x i b i m o d m and their sums. To this end, we will carry

out some general theoretical treatment.

THEORETICAL FOUNDATIONS

Let $R > m$, $\mathrm{GCD}(R, m) = 1$, and let $R$ be a number that is convenient as a multiplier and divider. The Montgomery algorithm is based on the well-known fact of linear representation of unity by the numbers $R$ and $m$: $R R^{-1} - m m' = 1$.

For the algorithm proposed, we use another relation, namely, $R = qm + r$, where $q$ and $r$ are some positive integers, and, for the residue $r$, the following inequalities are fulfilled: $0 < r < m$. Let $x$ and $j$ be integers, $x \ne 0$. It is obvious that the following identities are valid:

$$Rx = qxm + rx;$$

$$R(x - jm) = (qx - jR)m + rx.$$

If we choose a $j$ such that $t = qx - jR = qx \bmod R$, then the following relation holds true:

$$x \equiv \frac{tm + rx}{R} \pmod{m}. \tag{1}$$

If $R$ is chosen sufficiently large, namely, such that the inequality $x < \frac{Rm}{r}$ holds true, then, as is obvious, the following inequality also holds true:

$$0 < \frac{tm + rx}{R} < 2m. \tag{2}$$

In this case, $x \bmod m$ is computed by the following simple formula:

$$x \bmod m = \frac{tm + rx}{R} - \Delta, \quad \text{where } \Delta = 0 \text{ or } \Delta = m. \tag{3}$$

Since $r < m$, in order that inequality (2) hold true, it suffices, in particular, that the inequality $x < R$ hold true.
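Relations (1)-(3) can be exercised with whole-number arithmetic before any digit-level optimization. The following Python sketch assumes $R > m$, $\mathrm{GCD}(R, m) = 1$, and $0 \le x < R$; `reduce_via_qr` is a hypothetical name, and `divmod` stands in for the decomposition $R = qm + r$.

```python
def reduce_via_qr(x, m, R):
    """Compute x mod m from R = q*m + r using relations (1)-(3).

    Assumes R > m, gcd(R, m) == 1 and 0 <= x < R.
    """
    q, r = divmod(R, m)              # R = q*m + r with 0 < r < m
    t = q * x % R                    # t = q*x - j*R for a suitable j
    numerator = t * m + r * x
    assert numerator % R == 0        # numerator = R*(x - j*m), so division is exact
    y = numerator // R               # relation (1): y ≡ x (mod m); (2): y < 2m
    return y - m if y >= m else y    # relation (3): Delta is 0 or m


value = reduce_via_qr(1234, 997, 10 ** 4)
```

With $m = 997$ and $R = 10^4$ this gives $q = 10$, $r = 30$, $t = 2340$, and $(tm + rx)/R = 237 = 1234 \bmod 997$.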

PRECOMPUTATIONS

To compute $x_i b^i \bmod m$ in the algorithm, it is necessary to know in advance the values $r_i = b^i \bmod m$. Therefore, the preliminary computation in the method consists of the series of computations $r_{2k} = b^{2k} \bmod m$, $r_{2k-1} = b^{2k-1} \bmod m$, $r_{2k-2} = b^{2k-2} \bmod m$, ..., $r_k = b^k \bmod m$. If, for example, the classical method is used for each computation in this series, then the total execution time of such a series will be approximately equal to the time of $k^3$ one-digit multiplications.

However, this series of computations can be easily executed recurrently, starting from the value $r_{2k}$. Thus, the start computation consists of the determination of $r_{2k} = b^{2k} \bmod m$. The passage from $r_{i+1}$ to $r_i$ when $k - 1 \le i < 2k$ is carried out by formulas (1) and (3), selecting the values $R = b^{i+1}$ and the argument $x = b^i$.

If $b^{i+1} = q_{i+1} m + r_{i+1}$, then, applying (1), we obtain

$$b^i \equiv \frac{(q_{i+1} b^i \bmod b^{i+1})\, m + r_{i+1} b^i}{b^{i+1}} \pmod{m}.$$

Since $q_{i+1} b^i \bmod b^{i+1} = b^i s_i$, where $s_i$ is some digit, $0 \le s_i < b$, reducing the fraction by $b^i$ yields

$$b^i \equiv \frac{s_i m + r_{i+1}}{b} \pmod{m}.$$

The digit $s_i$ is found from the relation $s_i m + r_{i+1} \equiv 0 \pmod{b}$, i.e., $s_i = (-m_0^{-1} r_{i+1}) \bmod b$. Thus, the time of the computation of $s_i$, provided that $r_{i+1}$ is given, is equal to the time of the radix $b$ multiplication of two digits, i.e., is equal to unity.

Thus, $r_i = \frac{s_i m + r_{i+1}}{b}$ if $\frac{s_i m + r_{i+1}}{b} < m$, and $r_i = \frac{s_i m + r_{i+1}}{b} - m$ if $\frac{s_i m + r_{i+1}}{b} \ge m$. The time complexity of the computation of the sequence $r_{2k-1}, \dots, r_{k-1}$, provided that the start number $r_{2k} = b^{2k} \bmod m$ is given, is equal to $k^2$.

The start computation of $r_{2k} = b^{2k} \bmod m$ may be fulfilled, for example, by the classical method.
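The downward recurrence for $r_i$ and $s_i$ is short enough to sketch in Python. This is an illustration under the assumption $\mathrm{GCD}(b, m) = 1$; the name `precompute` and the dictionary layout are choices made here, and the start value $r_{2k}$ is obtained with the built-in three-argument `pow` in place of the classical method.

```python
def precompute(m, b, k):
    """Return (r, s) with r[i] = b**i mod m for k-1 <= i <= 2k and
    s[i] = (-m_0^{-1} * r[i+1]) mod b for k-1 <= i < 2k.

    Assumes gcd(b, m) == 1.
    """
    c0 = (-pow(m % b, -1, b)) % b          # -m_0^{-1} mod b (Python 3.8+)
    r = {2 * k: pow(b, 2 * k, m)}          # start value; any method may be used
    s = {}
    for i in range(2 * k - 1, k - 2, -1):
        s[i] = c0 * (r[i + 1] % b) % b     # one digit multiplication
        y = (s[i] * m + r[i + 1]) // b     # exact: s_i*m + r_{i+1} ≡ 0 (mod b)
        r[i] = y - m if y >= m else y      # y < m + m/b, so one subtraction at most
    return r, s


r, s = precompute(997, 10, 3)
```

The step is valid because $b \cdot r_i \equiv s_i m + r_{i+1} \equiv b^{i+1} \pmod{m}$ and $b$ is invertible modulo $m$.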

ALGORITHMS

Stage 1. Let us assume that an array of numbers $r_{2k}, \dots, r_{k-1}$ is determined. We use the relation $b^{i+1} = q_{i+1} m + r_{i+1}$ to compute $u_i = x_i b^i \bmod m$. Applying formulas (1) and (3) when $R = b^{i+1}$ and $x = x_i b^i$, we obtain

$$u_i = \frac{(q_{i+1} b^i x_i \bmod b^{i+1})\, m + r_{i+1} b^i x_i}{b^{i+1}} \bmod m = \frac{t_i m + r_{i+1} x_i}{b} - \Delta,$$

where $\Delta = 0$ or $\Delta = m$, $0 \le t_i < b$. The number $t_i$ is found from the relation $t_i m + r_{i+1} x_i \equiv 0 \pmod{b}$, $t_i = (-m_0^{-1} r_{i+1}) x_i \bmod b = s_i x_i \bmod b$.

Thus, having formed the arrays $r_{2k}, r_{2k-1}, \dots, r_{k-1}$ and $s_{2k-1}, \dots, s_{k-1}$ in advance, we can compute $u_i = x_i b^i \bmod m$, $i = 2k-1, \dots, k-1$, according to the following scheme:

    for (i = 2k-1; k-2 < i < 2k; i--) do {
        t_i = s_i x_i mod b;
        u_i = (t_i m + r_{i+1} x_i) / b;
        if u_i ≥ m then u_i = u_i - m;
    }

The direct computation of each $u_i$ requires $1 + k + \|r_{i+1}\|$ one-digit multiplications. The computation of $x \bmod m$ may be performed according to the scheme given below.

Algorithm A1

    Precomputation: r_{2k}, r_{2k-1}, ..., r_k; s_{2k-1}, s_{2k-2}, ..., s_{k-1}.
    u := 0; x = x_input; v = x^(k-2);
    for (i = 2k-1; k-2 < i < 2k; i--) do {
        t_i = s_i x_i mod b;
        y = (t_i m + r_{i+1} x_i) div b;
        if y ≥ m then y = y - m;
        u = u + y;
        if u ≥ m then u = u - m
    }
    u = u + v;
    if u ≥ m then u = u - m;
    x_output = u.

It is obvious that Algorithm A1 requires the execution of $(1 + k)k + \|r_{2k}\| + \|r_{2k-1}\| + \dots + \|r_k\|$ one-digit multiplications. Since the length of the numbers $r_i$ may be about $k$, the time estimate of Algorithm A1 is at worst equal to $2k^2$, i.e., about two operations of multiplication of $k$-digit numbers. Algorithm A1 has worse time characteristics in comparison with the Montgomery algorithm.
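A direct Python rendering of scheme A1 is given here as a sketch. It assumes $\mathrm{GCD}(b, m) = 1$, $b^{k-1} \le m < b^k$, and $0 \le x < b^{2k}$; the helper `precompute` and the function name `algorithm_a1` are illustrative, and the start value $r_{2k}$ is taken from the built-in `pow`.

```python
def precompute(m, b, k):
    """Tables r[i] = b**i mod m and s[i] = (-m_0^{-1} r[i+1]) mod b."""
    c0 = (-pow(m % b, -1, b)) % b
    r, s = {2 * k: pow(b, 2 * k, m)}, {}
    for i in range(2 * k - 1, k - 2, -1):
        s[i] = c0 * (r[i + 1] % b) % b
        y = (s[i] * m + r[i + 1]) // b
        r[i] = y - m if y >= m else y
    return r, s


def algorithm_a1(x, m, b, k, r, s):
    """Reduce x modulo m digit by digit, multiplying t_i by m in every pass."""
    v = x % b ** (k - 1)                         # x^(k-2), always below m
    u = 0
    for i in range(2 * k - 1, k - 2, -1):
        x_i = x // b ** i % b
        t_i = s[i] * x_i % b                     # one digit multiplication
        y = (t_i * m + r[i + 1] * x_i) // b      # exact division, y < 2m
        if y >= m:
            y -= m
        u += y
        if u >= m:
            u -= m
    u += v
    if u >= m:
        u -= m
    return u


r, s = precompute(997, 10, 3)
sample = algorithm_a1(987654, 997, 10, 3, r, s)
```

Both products inside the loop, `t_i * m` and `r[i + 1] * x_i`, are digit-by-number multiplications, which is why the count approaches $2k^2$.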

Stage 2. The above-mentioned algorithm may be improved. Since the numbers of the form $\frac{t_i m + r_{i+1} x_i}{b}$ are summed during the computation of $x \bmod m$, and the intermediate numbers divisible by $m$ may be discarded, we propose to exclude all the multiplications of $t_i$ by $m$ and to perform beforehand the modulo $b$ addition of all $t_i$ and multiply the resulting remainder by $m$ only once. Such an innovation reduces the number of one-digit operations of multiplication of the number $m$ by an order of magnitude.

Algorithm A2

    Precomputation: r_{2k}, r_{2k-1}, ..., r_k; s_{2k-1}, s_{2k-2}, ..., s_{k-1}.
    u := 0; x = x_input; y = 0; v = x^(k-2);
    for (i = 2k-1; k-2 < i < 2k; i--) do {
        t_i = s_i x_i mod b;
        y = (y + t_i) mod b;     /* y = (t_{2k-1} + t_{2k-2} + ... + t_i) mod b */
        u = u + r_{i+1} x_i;     /* u = (r_{2k} x_{2k-1} + r_{2k-1} x_{2k-2} + ... + r_{i+1} x_i) mod bm */
        if u ≥ bm then u = u - bm
    }
    y = ym;
    u = (u + y) div b;
    if u ≥ m then u = u - m;
    u = u + v;
    if u ≥ m then u = u - m;
    x_output = u.

The execution of Algorithm A2 requires $k + 1$ digit multiplication operations to determine the numbers $t_i$, $k$ multiplication operations to multiply a digit by the number $m$, $y = ym$, and $\|r_{2k}\| + \dots + \|r_k\|$ one-digit multiplication operations to compute the products $r_{i+1} x_i$. Since $\|r_i\| \le k$, the complexity evaluation of Algorithm A2 does not exceed $k^2 + 3k + 1$.

The time estimate of the Algorithm A2 is already good enough.
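The accumulation of the digits $t_i$ modulo $b$ and the single final multiplication by $m$ can be sketched in Python as follows. The sketch assumes $\mathrm{GCD}(b, m) = 1$, $b^{k-1} \le m < b^k$, and $0 \le x < b^{2k}$; for brevity the tables are filled with the built-in `pow` instead of the recurrence, and `algorithm_a2` is an illustrative name.

```python
def algorithm_a2(x, m, b, k):
    """Reduce x modulo m with a single multiplication by m after the loop."""
    c0 = (-pow(m % b, -1, b)) % b
    r = {i: pow(b, i, m) for i in range(k, 2 * k + 1)}       # r_i = b**i mod m
    s = {i: c0 * r[i + 1] % b for i in range(k - 1, 2 * k)}  # s_i digits
    v = x % b ** (k - 1)                 # x^(k-2), always below m
    u = y = 0
    for i in range(2 * k - 1, k - 2, -1):
        x_i = x // b ** i % b
        t_i = s[i] * x_i % b
        y = (y + t_i) % b                # running sum of the t_i modulo b
        u += r[i + 1] * x_i              # running sum of r_{i+1} x_i modulo b*m
        if u >= b * m:
            u -= b * m
    y = y * m                            # the only multiplication of a digit by m
    u = (u + y) // b                     # exact: u + y*m ≡ 0 (mod b); result < 2m
    if u >= m:
        u -= m
    u += v
    if u >= m:
        u -= m
    return u


sample = algorithm_a2(987654, 997, 10, 3)
```

Subtracting `b * m` inside the loop changes `u` by a multiple of both `b` and `m`, so neither the divisibility by `b` nor the residue modulo `m` is disturbed.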

Stage 3. The values $t_i$ are computed every time in Algorithm A2. We may avoid this and compute directly the resulting value $t = (t_{2k-1} + t_{2k-2} + \dots + t_{k-1}) \bmod b$. The number $t$ must satisfy the relation $tm + u \equiv 0 \pmod{b}$, where $u = (r_{2k} x_{2k-1} + r_{2k-1} x_{2k-2} + \dots + r_k x_{k-1}) \bmod bm$.

Algorithm A3

    Precomputation: r_{2k}, r_{2k-1}, ..., r_k.
    u := 0; x = x_input; y = 0; v = x^(k-2);
    for (i = 2k-1; k-2 < i < 2k; i--) do
        if x_i ≠ 0 then {
            u = u + r_{i+1} x_i;     /* u = (r_{2k} x_{2k-1} + r_{2k-1} x_{2k-2} + ... + r_{i+1} x_i) mod bm */
            if u ≥ bm then u = u - bm
        }
    t = (-m_0^{-1} u) mod b;
    y = tm;
    u = (u + y) div b;
    if u ≥ m then u = u - m;
    u = u + v;
    if u ≥ m then u = u - m;
    x_output = u.

The execution of Algorithm A3 requires $\|r_{2k}\| + \dots + \|r_k\|$ one-digit multiplication operations to compute the products $r_{i+1} x_i$, one multiplication operation to determine the number $t$, and $k$ one-digit multiplication operations to compute the number $y = tm$. Hence, the total number of one-digit multiplication operations does not exceed $k^2 + 2k + 1$. In fact, this already approximates the time estimate of the Montgomery algorithm. This estimate may also be improved, if necessary.
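In code, the step from A2 to A3 amounts to deriving the single digit $t$ from the low digit of the accumulated sum $u$. The Python sketch below makes the same assumptions as before ($\mathrm{GCD}(b, m) = 1$, $b^{k-1} \le m < b^k$, $0 \le x < b^{2k}$); the table `r` is filled with the built-in `pow` for brevity and the function name is illustrative.

```python
def algorithm_a3(x, m, b, k):
    """Reduce x modulo m; the digit t is computed once from the final sum u."""
    c0 = (-pow(m % b, -1, b)) % b                       # -m_0^{-1} mod b
    r = {i: pow(b, i, m) for i in range(k, 2 * k + 1)}  # r_i = b**i mod m
    v = x % b ** (k - 1)                 # x^(k-2), always below m
    u = 0
    for i in range(2 * k - 1, k - 2, -1):
        x_i = x // b ** i % b
        if x_i != 0:                     # zero digits cost nothing
            u += r[i + 1] * x_i
            if u >= b * m:
                u -= b * m
    t = c0 * (u % b) % b                 # t*m + u ≡ 0 (mod b)
    y = t * m
    u = (u + y) // b                     # exact division, result < 2m
    if u >= m:
        u -= m
    u += v
    if u >= m:
        u -= m
    return u


sample = algorithm_a3(987654, 997, 10, 3)
small = algorithm_a3(42, 997, 10, 3)     # short argument: high digits are zero
```

For a short argument such as `42` the loop body is skipped except for the digit at position $k - 1$, which illustrates the dependence of the running time on the argument length.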

Stage 4. To further reduce the time of execution of Algorithm A3, we will decrease by one the range of the change in the loop index $i$ of Algorithm A3, i.e., the range of the change in the index $i$ is equal to $k - 1 < i < 2k$. In this case, the remainder $v$ of the number $x$ to which the modular reduction is not applied is equal not to $x^{(k-2)}$ as in A3 but to $x^{(k-1)}$, $v = x^{(k-1)}$. The objective of the modification of Algorithm A3 is to obtain a $k$-digit number $x_{\text{output}}$ such that $x_{\text{output}} \equiv x \pmod{m}$. In this case, there is no guarantee that $x_{\text{output}}$ is a modulo $m$ residue. It is only required that the value $x_{\text{output}}$ be comparable with the modulus $m$ by an order of magnitude.

A variant is possible where $v > m$ and in the final addition in Algorithm A3, after execution of the following sequence of instructions: $u = u + v$; if $u \ge m$ then $u = u - m$; $x_{\text{output}} = u$, a number can be obtained that has an order greater than $m$, i.e., the resulting value $u$ can have an order equal to $k$ even after the final subtraction of $m$. In this case, it is necessary to provide a reduction in the order of the resulting value $x_{\text{output}}$. To this end, it suffices to take away $b^k$ from the resulting (last) value of $u$ and to add $r_k$. We guarantee that after the execution of such a procedure, $x_{\text{output}}$ has an order that does not exceed $k - 1$. In fact, $u = u' + v - m$, where $u'$ is a modulo $m$ residue, $0 \le u' < m$. Next, if $u > b^k$, then $u - b^k + r_k \equiv u \pmod{m}$ and $u - b^k + r_k = u' + v - m - b^k + r_k = r_k - (b^k - v) - (m - u') < m$.

Therefore, Algorithm A4 is of the form given below.

Algorithm A4

    Precomputation: r_{2k}, r_{2k-1}, ..., r_{k+1}.
    u := 0; x = x_input; y = 0; v = x^(k-1);
    for (i = 2k-1; k-1 < i < 2k; i--) do
        if x_i ≠ 0 then {
            u = u + r_{i+1} x_i;
            if u ≥ bm then u = u - bm
        }
    t = (-m_0^{-1} u) mod b;
    y = tm;
    u = (u + y) div b;
    if u ≥ m then u = u - m;
    u = u + v;
    if u ≥ m then u = u - m;
    if u ≥ b^k then u = u - b^k + r_k;
    x_output = u.

Let us estimate the complexity of Algorithm A4. One iteration has been saved in this algorithm in comparison with Algorithm A3. Therefore, the number of digit multiplication operations in Algorithm A4 is equal to $\|r_{2k}\| + \|r_{2k-1}\| + \dots + \|r_{k+1}\|$ multiplications to compute the corresponding values of $r_{i+1} x_i$, $k \le i < 2k$, one multiplication operation to determine the number $t$, and $k$ multiplications to compute $y = tm$. Since $\|r_i\| \le k$, the total number of multiplications does not exceed $k^2 + k + 1$.

We note that the estimate of the Montgomery reduction REDC is always equal to $k^2 + k$. In the variant proposed, Algorithm A4 always requires the number of one-digit multiplications to be no greater than $k^2 + k$. Moreover, the estimate $\|r_i\| = k$ is rather overstated for all $i$.
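Scheme A4 differs from A3 only in the loop range, the low part $v$, and the last guard, as the following Python sketch shows. It assumes $\mathrm{GCD}(b, m) = 1$, $b^{k-1} \le m < b^k$, and $0 \le x < b^{2k}$; the table `r` is again filled with the built-in `pow`, and `algorithm_a4` is an illustrative name. The result is a $k$-digit number congruent to $x$ modulo $m$, not necessarily the residue itself.

```python
def algorithm_a4(x, m, b, k):
    """Return a k-digit value congruent to x modulo m (not always below m)."""
    c0 = (-pow(m % b, -1, b)) % b
    r = {i: pow(b, i, m) for i in range(k, 2 * k + 1)}
    v = x % b ** k                       # x^(k-1): the k low-order digits
    u = 0
    for i in range(2 * k - 1, k - 1, -1):    # i = 2k-1 .. k, one pass fewer than A3
        x_i = x // b ** i % b
        if x_i != 0:
            u += r[i + 1] * x_i
            if u >= b * m:
                u -= b * m
    t = c0 * (u % b) % b
    u = (u + t * m) // b                 # exact division, result < 2m
    if u >= m:
        u -= m
    u += v
    if u >= m:
        u -= m
    # Guard kept to mirror the printed scheme. With u < m before the addition
    # and v < b**k, the value here is already below b**k in this sketch.
    if u >= b ** k:
        u = u - b ** k + r[k]
    return u


z = algorithm_a4(987654, 997, 10, 3)
```

A value such as `z` can be fed into the next multiplication of a loop and reduced to a true residue only once, after the loop ends.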

Algorithm A4 is intended for the case where loops having a body of the form {... z = xy mod m ...} are necessary. In this case, a $k$-digit value $z$ is maintained within the loop body that is congruent to $xy$ modulo $m$. After the loop termination, it suffices to perform a simple additional operation. If $x_{\text{output}}$ is a $k$-digit number, $x_{\text{output}} = x_0 + x_1 b + \dots + x_{k-1} b^{k-1}$, then $x = x^{(k-2)} + x_{k-1} b^{k-1}$. If $x_{k-1} b^{k-1} > m$, then the simple reduction is applied to the expression $x_{k-1} b^{k-1}$ according to the formula

$$u_{k-1} = x_{k-1} b^{k-1} \bmod m = \frac{t_{k-1} m + r_k x_{k-1}}{b} - \Delta, \quad \text{where } \Delta = 0 \text{ or } \Delta = m.$$

As a result, $k$ possible one-digit multiplication operations transferred to postcomputations will be added.

The resulting value $x_{\text{output}}$ is easily computed by the following commands:

    u = u_{k-1} + x^(k-2);
    if u ≥ m then u = u - m;
    x_output = u.
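The closing step after a loop can be sketched in Python as a reduction of the single top digit followed by one addition. The assumptions are $\mathrm{GCD}(b, m) = 1$, $b^{k-1} \le m < b^k$, and a $k$-digit input $0 \le z < b^k$; `final_reduce` is a hypothetical name, and the sketch applies the formula unconditionally instead of first testing whether $x_{k-1} b^{k-1} > m$.

```python
def final_reduce(z, m, b, k):
    """Reduce a k-digit value z (0 <= z < b**k) to the residue z mod m."""
    c0 = (-pow(m % b, -1, b)) % b
    r_k = pow(b, k, m)                   # r_k = b**k mod m
    top = z // b ** (k - 1)              # digit x_{k-1}
    low = z % b ** (k - 1)               # x^(k-2), always below m
    t = c0 * (r_k % b) * top % b         # t_{k-1} = s_{k-1} x_{k-1} mod b
    u = (t * m + r_k * top) // b         # exact division, u < 2m
    if u >= m:
        u -= m                           # u = x_{k-1} * b**(k-1) mod m
    u += low
    if u >= m:
        u -= m
    return u


residue = final_reduce(998, 997, 10, 3)
```

For `z = 998` the top digit is 9, $t = 9$, and $(9 \cdot 997 + 3 \cdot 9)/10 = 900$; adding the low part 98 and subtracting $m$ once leaves the residue 1.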

Thus, the application of the algorithms A3 and A4 gives time estimates that are no worse than that of the Montgomery

function; it does not require pre- and postchanges in the arguments, and the precomputations are comparable with those for

the Montgomery method. Moreover, the Algorithms A3 and A4 are easily used and can obviously be parallelized.

CONCLUSION

The estimate of the number of one-digit multiplication operations for all the well-known general algorithms of

modular reduction is of the form k ( k + c), where c > 0.

Assumption. There exists a general sequential algorithm of modular reduction having the estimation k(k - c ) , where

c > 0 .

REFERENCES

1. R. L. Rivest, A. Shamir, and L. Adleman, "A method for obtaining digital signatures and public-key cryptosystems," Commun. ACM, 21, 120-126 (1978).

2. W. Diffie and M. E. Hellman, "New directions in cryptography," IEEE Trans. Inform. Theory, IT-22, No. 6, 644-654 (1976).

3. T. ElGamal, "A public key cryptosystem and a signature scheme based on discrete logarithms," IEEE Trans. Inform. Theory, IT-31, No. 4, 469-472 (1985).

4. S. Kawamura, K. Takabayashi, and A. Shimbo, "A fast modular exponentiation algorithm," IEICE Trans., E-74, No. 8, 2136-2142 (1991).

5. H. Morita and C. Yang, "A modular multiplication algorithm using look-ahead determination," IEICE Trans., E-76-A, No. 1, 70-77 (1993).

6. S. R. Dusse and B. S. Kaliski, "A cryptographic library for the Motorola DSP56000," in: Advances in Cryptology: Eurocrypt 90, Lecture Notes Comput. Sci., No. 473, 230-244 (1991).

7. S. M. Hong, S. Y. Oh, and H. Yoon, "New modular multiplication algorithm for fast modular exponentiation," Advances in Cryptology: Eurocrypt 96, Lecture Notes Comput. Sci., No. 1070, 166-177 (1996).

8. A. V. Anisimov, "Linear Fibonacci forms and parallel algorithms for high dimension arithmetic," Lecture Notes Comput. Sci., No. 964, 16-20 (1995).

9. Che Wun Chiou, "Parallel implementation of the RSA public-key cryptosystem," Intern. J. Comput. Math., No. 48, 153-155 (1993).

10. D. E. Knuth, The Art of Computer Programming [Russian translation], Vol. 2, Mir, Moscow (1977).

11. P. D. Barrett, "Implementing the Rivest Shamir and Adleman public-key encryption algorithm on a standard digital signal processor," Advances in Cryptology: Eurocrypt 86, Lecture Notes Comput. Sci., No. 263, 311-323 (1987).

12. P. L. Montgomery, "Modular multiplication without trial division," Math. Comput., 44, No. 170, 519-521 (1985).

13. R. Floyd and D. E. Knuth, "Addition machines," SIAM J. Comput., 19, No. 2, 329-340 (1990).

14. A. Bosselaers, R. Govaerts, and J. Vandewalle, "Comparison of three modular reduction functions," Advances in Cryptology: Eurocrypt 94, Lecture Notes Comput. Sci., No. 773, 175-186 (1994).
