CHAPTER 5
REVERSIBLE CRYPTOGRAPHIC
HARDWARE
Power analysis attacks on cryptographic hardware can be countered by using reversible logic, which ideally dissipates no heat. Accordingly, different designs for reversible crypto-processors have been proposed in the recent past. These designs have been implemented using complex gate libraries. In this chapter, we propose novel designs for the reversible Arithmetic Logic Unit (ALU) of a crypto-processor using a standard gate library. The quantum costs reported in this chapter are lower than the lower bounds reported in [18]. Further, we have calculated the delay and transistor cost of the proposed designs. We have used RevKit to verify that the circuit costs of the proposed designs are minimal. This is the first time that optimization algorithms have been used to optimize the quantum cost and delay of a reversible ALU. The work reported in this chapter is published in [103].
5.1 Introduction
The widespread use of the internet has raised questions about the fundamental security requirements, namely confidentiality, authentication, data integrity and non-repudiation. To meet these requirements, most networked services use public key cryptography. There are various public key cryptographic protocols, such as the RSA (Rivest, Shamir and Adleman) cryptosystem, Diffie-Hellman key exchange, the Digital Signature Standard and the more recent Elliptic Curve Cryptography (ECC). The core arithmetic operation in these public key cryptographic protocols is modular multiplication, which is used to calculate modular exponentiation. A faster implementation of modular multiplication is based on Montgomery's modular multiplication algorithm [124]. Moreover, although cryptographic algorithms are secure against mathematical attacks, attackers can break the encryption by measuring the energy consumption of the ALU and the microchips therein. The ALU of a crypto-processor consists of a carry save adder (CSA), multiplier, register, shift register, multiplexer and accumulator. Nayeem et al. [18] and Thapliyal and Zwolinski [96] have proposed reversible crypto-processors using Montgomery's multiplier. In this chapter we provide circuits with lower quantum cost than the lower bounds reported in [18]. We report the circuit cost using the NCT gate library and also report the delay and transistor cost (TrC) of our circuit designs. Section 5.2 presents our work, and we conclude in Section 5.3.
5.2 Reversible hardware cryptography
The ALU of a crypto-processor is the major source of power consumption in cryptographic hardware, and it therefore leaks information through that power consumption. An ALU comprises a Carry Propagate Adder (CPA), CSA, multiplier, register and multiplexer. In the literature, several designs of Montgomery's multiplier have been proposed (Zhang [125] and references therein). Recently, Nayeem et al. [18] proposed a reversible CSA architecture implementation of the Montgomery multiplier and reported lower bounds on the circuit cost (using a complex gate library), quantum cost and garbage bits. There are several techniques for handling the long carry propagation during the addition stages of the computation, such as systolic arrays and CSA. Nayeem et al. [18] focused on the CSA implementation of the Montgomery multiplier. They proposed a reversible design of the Montgomery multiplier and provided the following Theorems and Lemmas defining lower bounds on gate count (circuit cost) and quantum cost:
Theorem 1: An n-bit reversible register can be realized by at least 2n gates and n+1 garbage outputs.

Lemma 2: The quantum cost of an n-bit reversible register is at least 6n.

Theorem 3: An n-bit reversible SISO shift register using master-slave D flip-flops can be realized by at least 6n-1 gates and 2n+1 garbage outputs.

Lemma 4: The quantum cost of an n-bit reversible SISO shift register using master-slave D flip-flops is at least 12n.

Theorem 5: The characteristic function of $Q_i^+$ of the reversible PIPO shift register (using clocked D flip-flops) can be obtained by 2 gates and 4 garbage outputs with a quantum cost of 10.

Theorem 6: The n-bit reversible PIPO shift register using clocked D flip-flops can be implemented by 5n reversible gates and 3n+3 garbage outputs.

Lemma 7: The quantum cost of an n-bit reversible PIPO shift register using D flip-flops is at least 18n.

Theorem 8: The proposed n-bit reversible Montgomery multiplier can be realized by at least 22n+25 gates and 16n+28 garbage outputs with a quantum cost of 80n+91.
Here we would like to note that the theorems of [18] stated above claim minimality, that is, they claim that the reported gate counts and quantum costs are the least possible. This claim is not appropriate because no algorithm was used to generate minimal circuits (reversible or quantum). We have established this point by synthesizing circuits with lower quantum costs than the bounds reported in [18]. Even our Algorithm 2.1, however, does not ensure the minimality of the quantum cost. Consequently, the use of the phrase "at least" should be avoided in all the Theorems reported in [18], and the costs stated there should be replaced by the newly found costs. We have done so and report the Modified Theorems and Modified Lemmas in the present work. All the Modified Theorems and Modified Lemmas are listed at the end of this section.

Figure 5.1: Reversible four-to-two CSA adder.

Figure 5.2: Full adder.

Before we describe the Modified Theorems and Lemmas, we need to provide a new design for the reversible ALU. The design process of our circuits may be described in the following steps:
1. Obtain the reversible truth table. Ensure that the reversible truth table contains the minimum number of garbage bits.
2. Obtain a minimal reversible (NCT) circuit using the exact synthesis algorithm [16] in RevKit [17].
3. Obtain the TrC of the circuit.
4. Obtain an elementary quantum circuit by converting each Toffoli (CCnot) gate into elementary gates.
5. Apply the quantum cost optimization algorithm (Algorithm 2.1) to obtain the quantum cost.
6. Apply the level compaction algorithm [20] to obtain the delay.
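Steps 1 to 3 can be illustrated with a minimal Python sketch. It is not RevKit and performs no synthesis; the gate-list encoding, the helper names (`apply_gate`, `simulate`, `is_reversible`, `transistor_cost`) and the two-gate example circuit are assumptions made purely for illustration. Only the per-gate TrC values (8 for Cnot, 16 for CCnot) are taken from this chapter.

```python
from itertools import product

# Per-gate transistor cost (TrC) as quoted in this chapter.
TRC = {"CNOT": 8, "CCNOT": 16}


def apply_gate(gate, bits):
    """Apply one gate, encoded as (name, controls, target), to a list of bits."""
    _, controls, target = gate
    out = list(bits)
    if all(out[c] for c in controls):
        out[target] ^= 1  # flip the target when every control line is 1
    return out


def simulate(circuit, bits):
    """Run the gates of a circuit left to right on the given input bits."""
    for gate in circuit:
        bits = apply_gate(gate, bits)
    return tuple(bits)


def is_reversible(circuit, width):
    """The truth table of a reversible circuit must be a permutation."""
    outputs = {simulate(circuit, list(b)) for b in product((0, 1), repeat=width)}
    return len(outputs) == 2 ** width


def transistor_cost(circuit):
    """Sum the TrC of every gate in the circuit (step 3)."""
    return sum(TRC[name] for name, _, _ in circuit)


# Hypothetical example circuit: CCnot(0, 1 -> 2) followed by Cnot(0 -> 1).
example = [("CCNOT", (0, 1), 2), ("CNOT", (0,), 1)]
gate_count = len(example)
trc = transistor_cost(example)
```

For this example the gate count is 2 and the TrC is 16 + 8 = 24. Steps 4 to 6 depend on Algorithm 2.1 and on the level compaction algorithm [20], which are not reproduced in this sketch.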
We have used the above procedure to provide optimal designs of the different components of the ALU. To be precise, we have provided optimal designs of the CSA, gated D latch, register, shift register, multiplexer and Montgomery multiplier. Figure 5.1 presents a reversible four-to-two carry save adder. This design is obtained by combining two of the reversible Full Adder circuits shown in Figure 5.2. The proposed four-to-two carry save adder comprises 4 CCnot gates and 4 Cnot gates; thus its gate count is 8. The TrC of a CCnot gate is 16 and that of a Cnot gate is 8. Therefore, the TrC of the reversible four-to-two carry save adder is 4 × 16 + 4 × 8 = 96. The equivalent elementary circuit of the Full Adder is shown in Figure 5.3; its quantum cost is 6 and its delay is 5. The proposed design in Figure 5.1 has a delay of 10. Therefore, the quantum cost of the four-to-two CSA design is 12 and its delay is 20. The reversible gates TSG and MTSG have been used for the adder operation by Thapliyal and Zwolinski [96] and by Nayeem et al. [18], respectively, so the gate count is reported as 1. However, this does not provide any advantage, and we have shown in Section 3.3 that several such gates can be proposed. Further, TSG and MTSG belong to a complex gate library; they have not been realized experimentally, nor does any experimental proposal for their realization exist. On the other hand, the proposed NCT circuit can be realized experimentally. We now briefly discuss the existing reversible four-to-two CSA designs. The four-to-two CSA design proposed by Thapliyal and Zwolinski [96] requires two TSG gates, each with a quantum cost of 13, and the four-to-two reversible CSA architecture of Nayeem et al. [18] requires two MTSG gates. We decomposed the MTSG gate into a minimal NCT circuit using RevKit and, by applying Algorithm 2.1, obtained its quantum cost as 6. Table 5.1 compares the quantum cost and number of garbage bits of the proposed reversible four-to-two CSA adder with the designs proposed by Nayeem et al. [18] and Thapliyal and Zwolinski [96].

Figure 5.3: Elementary circuit of Full adder.

| | [18] | [96] | Proposed |
|---|---|---|---|
| Quantum cost | 12 | 26 | 12 |
| Garbage bits | 4 | 4 | 4 |
| Gates | MTSG | TSG | NCT |

Table 5.1: Comparison of proposed reversible four-to-two CSA adder with Nayeem et al. [18] and Thapliyal and Zwolinski [96].

Figure 5.4: Reversible D latch (a) NCT circuit (b) quantum circuit.
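The gate counts quoted for the full adder and the four-to-two CSA can be checked with a short Python sketch. The exact gate order of Figure 5.2 is not reproduced in the text, so the circuit below is an assumption: a commonly used full adder built from two CCnot and two Cnot gates with one constant-0 ancilla, which matches the stated count of 4 CCnot and 4 Cnot gates for two full adders. The function and variable names are hypothetical.

```python
from itertools import product


def full_adder(a, b, cin):
    """Assumed 2-CCnot, 2-Cnot reversible full adder with one zero ancilla."""
    carry = 0          # constant-0 ancilla line
    carry ^= a & b     # CCnot(a, b -> carry)
    b ^= a             # Cnot(a -> b): b now holds a XOR b
    carry ^= b & cin   # CCnot(b, cin -> carry): carry is now the majority
    cin ^= b           # Cnot(b -> cin): cin now holds the sum
    return a, b, cin, carry  # (garbage, garbage, sum, carry)


def check_full_adder():
    """Compare the circuit with integer addition on all 8 inputs."""
    for a, b, cin in product((0, 1), repeat=3):
        _, _, s, c = full_adder(a, b, cin)
        if 2 * c + s != a + b + cin:
            return False
    return True


# Costs quoted in the text.
TRC_CNOT, TRC_CCNOT = 8, 16
FA_QUANTUM_COST = 6  # reported for the elementary circuit of Figure 5.3

# The four-to-two CSA combines two full adders.
csa_gates = 2 * (2 + 2)
csa_trc = 2 * (2 * TRC_CCNOT + 2 * TRC_CNOT)
csa_quantum_cost = 2 * FA_QUANTUM_COST
```

Under these assumptions the sketch reproduces the gate count of 8, the TrC of 96 and the quantum cost of 12 given for the proposed CSA. Each assumed full adder leaves two garbage lines, which agrees with the 4 garbage bits listed in Table 5.1.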
Apart from the CSA, this multiplier requires sequential elements such as registers and shift registers. Figure 5.4 presents a novel gated D latch with one output. It requires a CCnot gate and two Cnot gates. Thus its circuit cost is 3, its TrC is 32, its quantum cost is 5 and its delay is 5, whereas the existing flip-flops and registers in [18,96] are built from gated D latches that use a Fredkin gate.

The n-bit reversible register is designed from n gated D latches, as shown in Figure 5.5. Each gated D latch contains three gates and two garbage outputs.

Figure 5.5: Proposed reversible n-bit register.

| | [18] | [96] | Proposed |
|---|---|---|---|
| Quantum cost | 6n | 6n | 5n |
| Garbage bits | 2n | n+1 | n+1 |
| Gates | F and C | F and C | N, C and T |

Table 5.2: Comparison of proposed reversible n-bit register with Nayeem et al. [18] and Thapliyal and Zwolinski [96].
In Theorem 1 of [18] the Fredkin gate is counted as 1. This is correct if we use a Fredkin gate library. If we consider another gate library then this count is not applicable, but Lemma 2 of [18] should hold for all reversible gate libraries, as it provides a lower bound on the quantum cost. Here it is observed that the present design has a lower quantum cost than the lower bound in Lemma 2 of [18].

Modified Theorem 1: In the NCT gate library, 3n gates and n+1 garbage outputs are sufficient to realize an n-bit reversible register.

Each gated D latch contains one CCnot gate and one Cnot gate, plus another Cnot gate for Copy (the Copy gate). The CCnot gate and Cnot gate together have a quantum cost of 4, as shown in Figure 5.4; therefore the total quantum cost of a one-bit reversible register is 4 + 1 = 5, and an n-bit reversible register has a quantum cost of 5n. Table 5.2 compares the proposed design with [18] and [96]. The delay of the reversible n-bit register is 5n and its TrC is 32n.

Modified Lemma 2: 5n elementary quantum gates are sufficient to realize an n-bit reversible register.
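The behaviour expected of a gated D latch built from one CCnot gate, one Cnot gate and one Copy gate can be sketched in Python. The gate order below is an assumption chosen to match the stated gate count; it is not claimed to be the circuit of Figure 5.4. The names `gated_d_latch`, `latch_behaves` and `register_costs` are hypothetical, and the per-latch figures (3 gates, quantum cost 5, TrC 32) are the ones quoted in the text.

```python
from itertools import product


def gated_d_latch(e, d, q):
    """Assumed NCT latch: the output follows d when e = 1 and holds q when e = 0."""
    copy = 0        # constant-0 ancilla for the Copy gate
    d ^= q          # Cnot(q -> d): d now holds d XOR q
    q ^= e & d      # CCnot(e, d -> q): q becomes the old d when e = 1
    copy ^= q       # Cnot(q -> copy): the Copy gate
    return e, d, q, copy


def latch_behaves():
    """Check the latch equation on all 8 combinations of (e, d, q)."""
    for e, d, q in product((0, 1), repeat=3):
        _, _, q_next, copy = gated_d_latch(e, d, q)
        expected = d if e else q
        if q_next != expected or copy != expected:
            return False
    return True


def register_costs(n):
    """Costs of an n-bit register made of n latches, using the quoted figures."""
    return {"gates": 3 * n, "quantum_cost": 5 * n, "trc": (16 + 2 * 8) * n}
```

For instance, `register_costs(8)` gives 24 gates, a quantum cost of 40 and a TrC of 256, consistent with the 3n, 5n and 32n expressions above.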
In a Serial-In, Serial-Out (SISO) register, data is shifted one bit to the right with every clock pulse. Figure 5.6 presents an n-bit reversible SISO shift register built from master-slave D flip-flops. A reversible master-slave D flip-flop with one output requires two gated D latches, one Not gate to invert the clock signal before it reaches the slave gated D latch, and another Not gate to restore the clock signal. Thus it requires 8 gates, including Copy gates. Therefore, an n-bit reversible SISO shift register requires n reversible master-slave D flip-flops and 8n gates. However, the last Not gate can be omitted because its output is garbage, so the total number of gates is 8n-1 and the number of garbage bits is 2n+1. In Theorem 3 of [18], as already mentioned, a Fredkin gate is used for the gated D latch and its circuit cost is counted as 1, but their design can be improved by using a single bit line for the clock and another bit line for the inverted clock pulse, as shown in Figure 5.6. The design of Theorem 3 in [18] would then require 4n+1 gates (2n Fredkin gates + 2n Cnot gates + 1 Cnot to invert the clock pulse) and 2n+2 garbage bits. The TrC of the n-bit reversible SISO shift register is 64n. The quantum cost of the D flip-flop is 8. Here we note that in [18] the quantum cost of the Not gate is given as 0, but this is not so: the quantum cost of a Not gate, or of any one-qubit gate, is 1. The quantum cost of the n-bit SISO shift register proposed here is 10n, while the quantum cost reported in [18] is 12n. Table 5.3 compares the proposed design with Nayeem et al. [18].

Figure 5.6: Proposed reversible n-bit SISO register.
Modified Theorem 3: In the NCT gate library, 8n-1 gates and 2n+1 garbage outputs are sufficient to realize an n-bit reversible SISO shift register using master-slave D flip-flops.

Modified Lemma 4: 10n elementary quantum gates are sufficient to realize an n-bit reversible SISO shift register.
| | [18] | [96] | Proposed |
|---|---|---|---|
| Quantum cost | 12n | 13n+1 | 10n |
| Garbage bits | 2n+1 | 4n+1 | 2n+1 |
| Gates | F and C | F and C | N, C and T |

Table 5.3: Comparison of proposed reversible n-bit SISO register with Nayeem et al. [18] and Thapliyal and Zwolinski [96].

A Parallel-In, Parallel-Out (PIPO) register is one in which the data inputs, for example $(I_1, I_2, \ldots, I_n)$, are loaded into the register with the next clock pulse, shifted in parallel, and finally appear at the parallel outputs $(O_1, O_2, \ldots, O_n)$. Figure 5.8 presents the PIPO shift register. It consists of a D flip-flop whose output depends on the values of the Hold and Enable bit lines. When the Hold bit line is low (0) and the Enable bit line is high (1), the input value $I_i$ is loaded into the register; if Enable is low, the input is shifted right. When the Hold bit line is high, the register retains its present value.

| Hold | Enable | Final output $Q_i^+$ |
|---|---|---|
| 0 | 0 | $Q_{i-1}$ (Right shift) |
| 0 | 1 | $I_i$ (Parallel load) |
| 1 | Don't care | $Q_i$ (No change) |

Table 5.4: Truth table for PIPO shift register.
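The characteristic function of $Q_i^+$ can be obtained from Table 5.4 as

\[ Q_i^+ = \overline{\mathrm{HOLD}} \cdot E \cdot I_i + \overline{\mathrm{HOLD}} \cdot \overline{E} \cdot Q_{i-1} + \mathrm{HOLD} \cdot Q_i . \]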
For the first stage $Q_{i-1}$ is the serial input (SI) and for the last stage $Q_i$ is the serial output (SO). Nayeem et al. [18] realized this equation with two Fredkin gates. Figure 5.7 shows our realization of the characteristic equation of the PIPO shift register as an NCT circuit.

Modified Theorem 5: In the NCT gate library, the characteristic function of $Q_i^+$ of the reversible PIPO register can be realized by 4 gates and 4 garbage bits.
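The characteristic function can be checked against Table 5.4 by brute force. The Python sketch below assumes the sum-of-products form in which HOLD is complemented in the first two terms and E is complemented in the second term, as the rows of the table require; the function names `q_next` and `table_5_4` are hypothetical.

```python
from itertools import product


def q_next(hold, enable, i_i, q_prev, q_i):
    """Sum-of-products form of the characteristic function of Q_i^+."""
    not_hold, not_enable = 1 - hold, 1 - enable
    return (
        (not_hold & enable & i_i)          # parallel load
        | (not_hold & not_enable & q_prev)  # right shift
        | (hold & q_i)                      # no change
    )


def table_5_4(hold, enable, i_i, q_prev, q_i):
    """Row-by-row reading of Table 5.4."""
    if hold:
        return q_i                       # Hold = 1: no change
    return i_i if enable else q_prev     # Enable = 1: load, else shift


# Compare both definitions on all 32 input combinations.
matches = all(
    q_next(*values) == table_5_4(*values)
    for values in product((0, 1), repeat=5)
)
```

The flag `matches` confirms that the expression and the table agree on every input, which is the property any NCT realization of the function has to preserve.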
Figure 5.7: Implementation of characteristic equation of PIPO shift register.

Figure 5.8: Proposed reversible PIPO register (a) basic cell (b) NCT circuit.

| | [18] | Proposed |
|---|---|---|
| Quantum cost | 18n | 15n |
| Garbage bits | 3n+3 | 3n+3 |
| Gates | F and HNFG | N, C and T |

Table 5.5: Comparison of proposed reversible n-bit PIPO register with Nayeem et al. [18].
The basic cell of the PIPO shift register consists of the generation of $Q_i$ and a D flip-flop. Therefore, it comprises 3 CCnot gates and 6 Cnot gates. Thus the TrC of the n-bit reversible PIPO shift register is 96n.

Modified Theorem 6: In the NCT gate library, 9n gates and 3n+3 garbage outputs are sufficient to realize an n-bit reversible PIPO shift register using clocked D flip-flops.

The reversible PIPO register shown in Figure 5.8 contains 9 gates: 3 CCnot gates and 6 Cnot gates, of which 3 Cnot gates are Copy gates.

Modified Lemma 7: 15n elementary quantum gates are sufficient to implement an n-bit reversible PIPO shift register.

The present design improves on Lemma 7 of [18]: the quantum cost of the present n-bit reversible PIPO shift register is 15n and its delay is 14n. Table 5.5 compares the proposed design with the design presented in [18].
Figure 5.9 presents a two-input n-bit reversible multiplexer. An n-bit multiplexer has 2n gates: n CCnot gates and n Cnot gates. Thus the linear TrC is 24n. The quantum cost is 4n and the delay is 3n+1. Here S is the Select bit line that selects between the two inputs $A_{1-n}$ and $B_{1-n}$; to be precise, when S = 0 then $Z_{1-n} = A_{1-n}$ and when S = 1 then $Z_{1-n} = B_{1-n}$. Table 5.6 compares the proposed design with Nayeem et al. [18].

Figure 5.9: A two-input n-bit reversible multiplexer.

| | Nayeem et al. [18] | Proposed |
|---|---|---|
| Quantum cost | 5n | 4n |
| Garbage bits | n | n |
| Gates | F and HNFG | N, C and T |

Table 5.6: Comparison of proposed reversible n-bit multiplexer with Nayeem et al. [18].

In hardware cryptography, the Montgomery multiplication algorithm is used for modular multiplication, and the modified algorithm presented in [125] is very efficient. Nayeem et al. [18] implemented this algorithm in reversible logic. We have followed the same architecture, substituted our designs for the respective units, and thereby improved its efficiency. The reversible Montgomery multiplier is presented in Figure 5.10. Here multi-bit lines are drawn as broad lines and single-bit lines as thin lines.
The reversible Montgomery multiplier contains the following components:

1. n+1 four-to-two CSAs, requiring 8(n+1) gates, 12(n+1) elementary quantum gates, 4(n+1) garbage outputs and a TrC of 96(n+1).
2. 2(n+1) Peres gates to compute the AND-1 and AND-2 blocks, requiring 4(n+1) gates, 8(n+1) elementary quantum gates, 2(n+1) garbage outputs and a TrC of 96(n+1).
3. Two (n+1)-bit reversible multiplexers, requiring 4(n+1) gates, 8(n+1) elementary quantum gates, 2(n+1) garbage outputs and a TrC of 48(n+1).
4. Two (n+1)-bit reversible PIPO shift registers, requiring 18(n+1) gates, a quantum cost of 30(n+1), 2(3n+6) garbage outputs and a TrC of 96(n+1).
5. Two (n+1)-bit registers, requiring 6(n+1) gates, 2(5n+5) elementary quantum gates, 2(n+2) garbage outputs and a TrC of 64(n+1).
6. Four gates for selections, requiring 8 elementary quantum gates, 4 garbage outputs and a TrC of 48.
7. 2(n+1) Cnot gates to construct the two COPY blocks, requiring 2(n+1) elementary quantum gates and a TrC of 16(n+1).
8. One Feynman gate to obtain another copy of SUM0, which requires one elementary quantum gate and a TrC of 8.

Therefore, the total number of gates (circuit cost) required to implement the reversible Montgomery multiplier is 8(n+1)+4(n+1)+4(n+1)+18(n+1)+6(n+1)+4+2(n+1)+1 = 42n+47, and the total linear TrC is 416n+472. The quantum cost of the reversible Montgomery multiplier is 12(n+1)+8(n+1)+8(n+1)+30(n+1)+2(5n+5)+8+2(n+1)+1 = 70n+79.

Figure 5.10: Reversible Montgomery multiplier.
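The totals for the Montgomery multiplier can be re-derived mechanically from the per-component figures listed above. The Python sketch below stores each cost as a linear expression a·n + b, represented by the pair (a, b). The helper names (`lin`, `const`, `total`) and the dictionary layout are hypothetical, and the garbage count of the COPY blocks and of the Feynman gate, which the text does not state, is assumed to be zero.

```python
def lin(k):
    """k*(n+1) written as the pair (coefficient of n, constant)."""
    return (k, k)


def const(k):
    """A cost that does not depend on n."""
    return (0, k)


# Per component: (gates, quantum cost, garbage outputs, TrC), as listed in the text.
COMPONENTS = {
    "four-to-two CSAs": (lin(8), lin(12), lin(4), lin(96)),
    "Peres gates (AND-1, AND-2)": (lin(4), lin(8), lin(2), lin(96)),
    "multiplexers": (lin(4), lin(8), lin(2), lin(48)),
    "PIPO shift registers": (lin(18), lin(30), (6, 12), lin(96)),  # garbage 2(3n+6)
    "registers": (lin(6), lin(10), (2, 4), lin(64)),               # garbage 2(n+2)
    "selection gates": (const(4), const(8), const(4), const(48)),
    "COPY blocks": (lin(2), lin(2), const(0), lin(16)),            # garbage assumed 0
    "Feynman gate": (const(1), const(1), const(0), const(8)),      # garbage assumed 0
}


def total(index):
    """Add up one cost column over all components."""
    a = sum(costs[index][0] for costs in COMPONENTS.values())
    b = sum(costs[index][1] for costs in COMPONENTS.values())
    return (a, b)


gates, quantum_cost, garbage, trc = (total(i) for i in range(4))
```

With these inputs the sums come out as 42n+47 gates, a quantum cost of 70n+79, 16n+28 garbage outputs and a TrC of 416n+472, in agreement with Modified Theorem 8.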
Table 5.7 compares the proposed design with the reversible Montgomery design of Nayeem et al. [18]; the proposed design has lower quantum cost and delay.

Modified Theorem 8: The present n-bit reversible Montgomery multiplier can be realized by 42n+47 gates, 16n+28 garbage bits and a quantum cost of 70n+79.

| | [18] | Proposed |
|---|---|---|
| Quantum cost | 80n+91 | 70n+79 |
| Garbage bits | 16n+28 | 16n+28 |
| Delay | 80n+91 | 65n+74 |
| Gates | F, C, MTSG and HNFG | NCT |

Table 5.7: Comparison of proposed reversible Montgomery multiplier with Nayeem et al. [18].
We have already provided a list of the Theorems and Lemmas presented in [18]. Here we have modified them in accordance with the results and observations reported in the present thesis. The Modified Theorems and Lemmas are listed below:
Modified Theorem 1: In the NCT gate library, 3n gates and n+1 garbage outputs are sufficient to realize an n-bit reversible register.

Modified Lemma 2: 5n elementary quantum gates are sufficient to realize an n-bit reversible register.

Modified Theorem 3: In the NCT gate library, 8n-1 gates and 2n+1 garbage outputs are sufficient to realize an n-bit reversible SISO shift register using master-slave D flip-flops.

Modified Lemma 4: 10n elementary quantum gates are sufficient to realize an n-bit reversible SISO shift register.

Modified Theorem 5: In the NCT gate library, the characteristic function of $Q_i^+$ of the reversible PIPO register can be realized by 4 gates and 4 garbage bits.

Modified Theorem 6: In the NCT gate library, 9n gates and 3n+3 garbage outputs are sufficient to realize an n-bit reversible PIPO shift register using clocked D flip-flops.

Modified Lemma 7: 15n elementary quantum gates are sufficient to realize an n-bit reversible PIPO shift register.

Modified Theorem 8: The present n-bit reversible Montgomery multiplier can be realized by 42n+47 NCT gates, 16n+28 garbage bits and a quantum cost of 70n+79.
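To make the difference between the quantum costs reported in [18] and the modified values concrete, the following Python sketch evaluates both sets of expressions for a given operand width. The dictionary and function names are hypothetical, and the width n = 8 used in the usage note is only an illustrative choice.

```python
# Quantum-cost expressions a*n + b, stored as (a, b).
REPORTED_IN_18 = {
    "register": (6, 0),
    "SISO shift register": (12, 0),
    "PIPO shift register": (18, 0),
    "Montgomery multiplier": (80, 91),
}
PROPOSED = {
    "register": (5, 0),
    "SISO shift register": (10, 0),
    "PIPO shift register": (15, 0),
    "Montgomery multiplier": (70, 79),
}


def evaluate(expr, n):
    """Evaluate a*n + b for the given n."""
    a, b = expr
    return a * n + b


def savings(n):
    """Quantum-cost reduction of the proposed designs relative to [18]."""
    return {
        name: evaluate(REPORTED_IN_18[name], n) - evaluate(PROPOSED[name], n)
        for name in PROPOSED
    }
```

For n = 8, `savings(8)` reports reductions of 8, 16, 24 and 92 elementary quantum gates for the register, SISO shift register, PIPO shift register and Montgomery multiplier, respectively. In general the saving for the multiplier is 10n+12.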
Here we would like to note that the number of elementary quantum gates mentioned in the Modified Lemmas and Theorems refers to the quantum cost. For example, Modified Lemma 2 states that a quantum circuit for the n-bit reversible register is designed here with 5n elementary quantum gates; thus its quantum cost is 5n. However, minimality of the quantum cost is not ensured, so this count is sufficient but not necessary. The same is true for the other bounds. In principle, these bounds are not tight, and the possibility of obtaining smaller bounds cannot be ruled out, as no measure has been taken here (or in earlier work) to ensure the minimality of the whole circuit. RevKit cannot handle large circuits at the moment, so we have ensured minimality only of the sub-circuits, which does not necessarily ensure minimality of the whole circuit.
5.3 Conclusions
In this chapter we have proposed reversible designs for the ALU of a crypto-processor. The proposed designs are implemented in the standard NCT gate library, which can be realized experimentally. The proposed designs achieve quantum costs below the lower bounds reported in [18]. Further, in this chapter we have introduced delay, which is an important measure for evaluating a logic design. The quantum cost and delay of the designs are optimized by Algorithm 2.1 and the level compaction algorithm [20], respectively. Therefore, we conclude that the present designs are more cost effective than the existing designs and are also physically realizable, since NCT gates have been realized experimentally. Thus the present work may open a new thread of research on the implementation of reversible cryptographic processors to thwart power analysis attacks.