1
Verification Codes
Michael Luby, Digital Fountain, Inc.
Michael Mitzenmacher
Harvard University and Digital Fountain, Inc.
2
Motivation
• LDPC (Low-Density Parity-Check) codes
– Linear time and near-optimal performance.
– For erasures: packet-level code, where the basic op is XORing packets. Very fast.
– For errors: bit-level code, where the basic op is a probabilistic calculation or XORing bits. Slow.
• Bit level unsuitable for some applications.
• Packet-level codes useful as “outer codes.”
• Can we design packet-level LDPC codes for channels with errors?
3
Design Steps
• LDPC codes for symmetric q-ary channels, large q
– Use that errors are random in a large space.
– Tree analysis.
• Adversarial errors to random errors.
– Code scrambling, independently discovered by [L94,GLD01].
• Implementation details.
– Packet numbers.
4
Setup
• View code as a one-layer bipartite graph.
• Variable nodes on left.
• Check nodes on right.
• Codewords satisfy: sum of all neighbors of a check node is 0.
• Note: sum allows general q. For bits/packets, sum is XOR.
[Figure: bipartite graph; node labels A through K.]
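The codeword condition from the Setup slide can be sketched in a few lines. This is only an illustration: packets are modeled as equal-length byte strings, the sum is XOR, and the names `xor_packets`, `satisfies_checks`, and the three-node example graph are hypothetical, not taken from the slides.

```python
from functools import reduce

def xor_packets(packets):
    """XOR a list of equal-length byte strings, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*packets))

def satisfies_checks(values, checks):
    """values: dict node name -> packet; checks: list of neighbor-name lists.
    A codeword makes every check's neighbors XOR to the all-zero packet."""
    zero = bytes(len(next(iter(values.values()))))
    return all(xor_packets([values[n] for n in nbrs]) == zero for nbrs in checks)

# Hypothetical example: one check node whose neighbors are A, B, C.
values = {"A": b"\x0f\x01", "B": b"\xf0\x02", "C": b"\xff\x03"}
checks = [["A", "B", "C"]]
ok = satisfies_checks(values, checks)

# Corrupting one packet breaks the check.
corrupted = dict(values, A=b"\x00\x01")
broken = satisfies_checks(corrupted, checks)
```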
5
Verification
• A message node is verified if it is believed to have the right value, with high probability.
• For symmetric q-ary channel, a check node verifies its neighbors if they sum to 0.
[Figure: check node H with neighbors A, B, C.]
If A + B + C = 0, then H verifies A, B, C.
6
Verification Argument
• Claim: If errors are random, the probability of a false verification at each step is at most 1/(q-1).
• Proof: Consider the last erroneous message node. It must take on the precise value to cause a false verification.
• For large q, verifications are correct with high probability.
[Figure: check node H with neighbors A, B, C.]
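The 1/(q-1) claim can be checked exactly for a small alphabet by brute force. The sketch below assumes the sum is addition mod q and that each erroneous neighbor carries an independent, uniformly random nonzero error offset; the function name and the choice q = 7 are illustrative only.

```python
from fractions import Fraction
from itertools import product

def false_verification_probability(q, num_errors):
    """Exact probability that num_errors random nonzero offsets sum to 0 mod q,
    i.e. that a check with that many erroneous neighbors still sums to 0."""
    offsets = range(1, q)
    hits = sum(1 for errs in product(offsets, repeat=num_errors)
               if sum(errs) % q == 0)
    return Fraction(hits, (q - 1) ** num_errors)

q = 7
probs = {k: false_verification_probability(q, k) for k in (1, 2, 3)}
bound = Fraction(1, q - 1)
```

With one error the check can never be fooled; with two, the second error must exactly cancel the first, which happens with probability 1/(q-1); with three, the probability is slightly smaller.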
7
Correction Mechanism
• If B and C are verified, H can correct and verify value for A.
• Corrections possible when all but one neighbor of a check node have been verified.
• If verifications are correct with high probability, so are corrections.
[Figure: check node H with neighbors A, B, C.]
8
Message Passing Algorithm 1
• Message nodes may be verified or unverified.
– Initially all unverified.
• Message nodes have a value.
– Initially received value.
• In parallel, message nodes send value, state.
• If all neighbors of a check node sum to 0, the check node verifies all neighbors.
• If all but one neighbor of a check node is verified, the check node can correct and verify the value of the remaining node.
• In parallel, check nodes send appropriate messages.
• Repeat.
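A minimal sketch of the rounds described on this slide, assuming symbols are integers, the sum is XOR, and the graph is given as a list of check-node neighbor lists. The function name `decode`, the `max_rounds` cap, and the example graph are assumptions made for illustration.

```python
def decode(received, checks, max_rounds=20):
    """received: list of int symbols; checks: list of lists of message indices.
    Returns (values, verified flags) after running verification rounds."""
    value = list(received)
    verified = [False] * len(value)
    for _ in range(max_rounds):
        new_value, new_verified = list(value), list(verified)
        for nbrs in checks:
            total = 0
            for i in nbrs:
                total ^= value[i]
            unverified = [i for i in nbrs if not verified[i]]
            if total == 0:
                # Check is satisfied: verify every neighbor.
                for i in nbrs:
                    new_verified[i] = True
            elif len(unverified) == 1:
                # All but one neighbor verified: correct the remaining one
                # to the XOR of the others, then mark it verified.
                i = unverified[0]
                new_value[i] = total ^ value[i]
                new_verified[i] = True
        if new_value == value and new_verified == verified:
            break  # no progress in this round
        value, verified = new_value, new_verified
    return value, verified

# Hypothetical codeword satisfying checks {0,1,2}, {2,3,4}, {0,3}.
codeword = [5, 9, 12, 5, 9]
checks = [[0, 1, 2], [2, 3, 4], [0, 3]]
received = list(codeword)
received[4] = 1000  # one corrupted symbol
decoded, flags = decode(received, checks)
```

In this example the first round verifies nodes 0 to 3 through the two satisfied checks, and the second round corrects node 4 from its verified neighbors.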
9
Analysis
• Error probability from incorrect verification bounded by (#verification steps)/(q-1). Small for large q.
– Ignored hereon in the analysis.
• End goal: all message nodes are verified.
• LDPC tree analysis yields equations.
10
Degree Sequence Functions
• Left side
– $\lambda_i$: fraction of edges of degree i on the left.
– $\lambda(x) := \sum_i \lambda_i x^{i-1}$
• Right side
– $\rho_i$: fraction of edges of degree i on the right.
– $\rho(x) := \sum_i \rho_i x^{i-1}$
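The degree sequence functions are ordinary polynomials in x with coefficients given by the edge fractions, so they are easy to evaluate. The sketch below stores each sequence as a dict from degree to edge fraction; the helper names and the regular (3,6) example are assumptions. It also computes the standard design rate implied by the edge fractions, 1 - (sum of rho_i/i) / (sum of lambda_i/i).

```python
def edge_poly(coeffs, x):
    """coeffs: dict degree i -> fraction of edges attached to nodes of degree i.
    Returns sum_i coeffs[i] * x**(i - 1)."""
    return sum(frac * x ** (i - 1) for i, frac in coeffs.items())

def design_rate(lam, rho):
    """Rate implied by the edge fractions, assuming independent checks."""
    left = sum(frac / i for i, frac in lam.items())    # proportional to # left nodes
    right = sum(frac / i for i, frac in rho.items())   # proportional to # right nodes
    return 1 - right / left

# Hypothetical regular graph: every left node has degree 3, every right node degree 6.
lam = {3: 1.0}
rho = {6: 1.0}
lam_half = edge_poly(lam, 0.5)   # 0.5 ** 2
rho_half = edge_poly(rho, 0.5)   # 0.5 ** 5
rate = design_rate(lam, rho)
```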
11
Tree Based Analysis
[Figure: decoding tree with levels Left, Right, Left.]
Assume verified = correct.
After j rounds:
• $a_j$ = Pr[not verified, correct]
• $b_j$ = Pr[not verified, incorrect]
• $1 - a_j - b_j$ = Pr[verified]
12
Tree Based Analysis
• Defining equations.
$a_{j+1} = a_0 \lambda(1 - \rho(1 - b_j))$
• English: to be correct and unverified after round j+1, must be:
– Correct initially.
– Each check node under you must have at least one incorrect node under it after round j.
• Probability $\lambda(1 - z)$, where z = Pr[a check node has no incorrect neighbor after round j].
• $z = \rho(1 - b_j)$.
13
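Tree Based Analysis
• Defining equations.
$a_{j+1} = a_0 \lambda(1 - \rho(1 - b_j))$
$b_{j+1} = b_0 \lambda(1 - \rho(1 - a_j - b_j))$
• To be incorrect after round j+1, must be:
– Incorrect initially.
– Each check node under you must have at least one unverified node under it after round j.
• Equivalently:
$b_{j+1} = b_0 \lambda(1 - \rho(1 - a_{j+1} - b_j))$
$b_{j+1} = b_0 \lambda(1 - \rho(1 - (1 - b_0) \lambda(1 - \rho(1 - b_j)) - b_j))$
• Want $b_j \to 0$.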
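The recursion can be iterated numerically to see whether the fraction of incorrect nodes dies out. The sketch below uses the two defining equations, a_{j+1} = a_0 λ(1 - ρ(1 - b_j)) and b_{j+1} = b_0 λ(1 - ρ(1 - a_j - b_j)), with a_0 = 1 - b_0. The regular degree sequences λ(x) = x^2 and ρ(x) = x^5 are an assumption chosen for simplicity, not the sequences used in the talk.

```python
def lam(x):
    return x ** 2   # assumed: all left nodes have degree 3

def rho(x):
    return x ** 5   # assumed: all right nodes have degree 6

def iterate(b0, rounds):
    """Run the recursion for the given number of rounds; returns (a_j, b_j)."""
    a0 = 1.0 - b0
    a, b = a0, b0
    for _ in range(rounds):
        # Both updates use the values from the previous round.
        a, b = (a0 * lam(1 - rho(1 - b)),
                b0 * lam(1 - rho(1 - a - b)))
    return a, b

a_small, b_small = iterate(0.01, 50)   # low initial error fraction
a_large, b_large = iterate(0.5, 50)    # high initial error fraction
```

For this assumed graph, a 1% initial error fraction is driven to zero, while a 50% error fraction stalls because almost no check node ever sees all of its neighbors verified.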
14
Bounds
• Finding good polynomials $\lambda$ and $\rho$ is a difficult non-linear optimization problem.
• From previous work on LDPCs, for code rate R we have $\lambda$ and $\rho$ such that
$\lambda(1 - \rho(1 - x)) \le x/(1 - R)$
• Plugging in this $\lambda$ and $\rho$ yields codes that will handle a fraction of errors up to:
$1 - \frac{R}{2} - \frac{\sqrt{4R - 3R^2}}{2}$
• Good for low rate codes.
15
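Proof
$b_{j+1} = b_0 \lambda(1 - \rho(1 - (1 - b_0) \lambda(1 - \rho(1 - b_j)) - b_j))$
Let x = b_j. Then for $b_j \to 0$ we need
$b_0 \lambda(1 - \rho(1 - (1 - b_0) \lambda(1 - \rho(1 - x)) - x)) < x$
Now use $\lambda(1 - \rho(1 - x)) \le x/(1 - R)$:
$b_0 \lambda(1 - \rho(1 - (1 - b_0) \lambda(1 - \rho(1 - x)) - x)) \le b_0((1 - b_0) \lambda(1 - \rho(1 - x)) + x)/(1 - R)$
$\le b_0((1 - b_0) x/(1 - R) + x)/(1 - R)$
Suffices that $b_0((1 - b_0)/(1 - R) + 1)/(1 - R) < 1$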
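The final condition is a quadratic inequality in b_0, and its smaller root is 1 - R/2 - sqrt(4R - 3R^2)/2. The sketch below checks this numerically for a few rates; the function names and the sample rates are illustrative assumptions.

```python
import math

def error_bound(rate):
    """Smaller root of b0 * ((1 - b0)/(1 - R) + 1) / (1 - R) = 1."""
    return 1 - rate / 2 - math.sqrt(4 * rate - 3 * rate ** 2) / 2

def condition_lhs(b0, rate):
    """Left-hand side of the sufficient condition from the proof."""
    return b0 * ((1 - b0) / (1 - rate) + 1) / (1 - rate)

rates = [0.1, 0.25, 0.5]
at_bound = [condition_lhs(error_bound(r), r) for r in rates]            # each close to 1
below_bound = [condition_lhs(0.9 * error_bound(r), r) for r in rates]   # each below 1
```

For example, at R = 1/2 the bound evaluates to about 0.191.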
16
Improved algorithm
[Figure: graph with node labels A, B, C, D, E, F, H, I; annotations: Incorrect, Verified, Verified.]
17
Improved Algorithm
• Each check node also sends a proposed value to each message node.
– Equal to 0 - sum of other neighbors.
• If a message node receives two of the same proposed values, it assumes that is the correct value and sets itself to verified.
– Important: no cycles of length 4.
• Again, small probability of incorrect verification.
• Similar limiting equations derivable.
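One round of the proposed-value rule can be sketched as follows, assuming integer symbols with XOR as the sum (so "0 - sum of other neighbors" is just the XOR of the other neighbors). The function names and the small graph, which has no cycle of length 4, are hypothetical.

```python
from collections import Counter

def proposed_values(value, checks):
    """For each message node, list the value proposed by each neighboring check:
    the XOR of that check's other neighbors."""
    proposals = {i: [] for i in range(len(value))}
    for nbrs in checks:
        total = 0
        for i in nbrs:
            total ^= value[i]
        for i in nbrs:
            proposals[i].append(total ^ value[i])
    return proposals

def verify_by_agreement(value, checks):
    """Set a node to a proposed value and verify it when two checks agree on it."""
    new_value = list(value)
    verified = [False] * len(value)
    for i, props in proposed_values(value, checks).items():
        if props:
            candidate, count = Counter(props).most_common(1)[0]
            if count >= 2:
                new_value[i] = candidate
                verified[i] = True
    return new_value, verified

# Hypothetical codeword for checks {0,1,2} and {0,3,4}; node 0 is corrupted.
codeword = [5, 9, 12, 3, 6]
checks = [[0, 1, 2], [0, 3, 4]]
received = [77, 9, 12, 3, 6]
fixed, flags = verify_by_agreement(received, checks)
```

Node 0 is corrected even though neither of its checks has a verified neighbor, because both checks independently propose the same value for it.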
18
Reed-Solomon + Verification
[Figure: nodes A, B, C, D connected to Reed-Solomon check nodes RS1 and RS2.]
Tradeoff: RS check nodes yield more redundancy, for more powerful check nodes.
19
Reed-Solomon + Verification
• Check nodes consist of a Reed-Solomon code over the message nodes.
• Example: each check node consists of two points.
– So a check node of degree d defines a degree d - 1 polynomial.
• Now check nodes can
– Correct any single error among neighbors.
– Correct two errors if all other neighbors are verified.
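To illustrate how two check symbols let a check node locate and fix a single error among its neighbors, here is a small syndrome-style sketch over a prime field. It is not necessarily the construction used in the talk: the field size 257, the weights 1..d, and the assumption that the two check symbols themselves are error-free are all choices made for this example.

```python
P = 257  # assumed prime field size

def rs_checks(values):
    """Two check symbols: the plain sum and a position-weighted sum, mod P."""
    s0 = sum(values) % P
    s1 = sum((i + 1) * v for i, v in enumerate(values)) % P
    return s0, s1

def correct_single_error(received, checks):
    """Fix at most one erroneous neighbor, given trusted check symbols."""
    s0, s1 = rs_checks(received)
    d0 = (s0 - checks[0]) % P   # equals the error magnitude e
    d1 = (s1 - checks[1]) % P   # equals (k + 1) * e for error position k
    if d0 == 0 and d1 == 0:
        return list(received)
    if d0 == 0 or d1 == 0:
        raise ValueError("not a single-error pattern")
    k = d1 * pow(d0, -1, P) % P - 1
    if not 0 <= k < len(received):
        raise ValueError("not a single-error pattern")
    fixed = list(received)
    fixed[k] = (fixed[k] - d0) % P
    return fixed

values = [10, 20, 30, 40]
checks = rs_checks(values)
received = [10, 20, 200, 40]   # one corrupted neighbor
repaired = correct_single_error(received, checks)
```

The ratio of the two syndrome differences reveals the error position, and the first difference gives the error magnitude. The modular inverse uses `pow(d0, -1, P)`, which requires Python 3.8 or later.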
20
Bounds
• Equations in terms of $\lambda$ and $\rho$ derivable.
• Finding good polynomials $\lambda$ and $\rho$ is a difficult non-linear optimization problem.
• Consider the case where check nodes can correct if at most 1 neighbor is in error.
• LDPC erasure codes correct if at most 1 neighbor is erased.
• Equivalency implies can correct a fraction $(1 - R)/2$ of errors.
• Better using additional power of check nodes.
21
Comparison Point
• Standard Reed-Solomon codes can handle error rate of up to (1-R)/2.
– Different model: worst case errors, not random.
– Also additional techniques: List Decoding.
• Verification codes can provably handle higher error rates, with high probability.
– Low rates using simple verification.
– High rates using verification + 2 RS nodes.
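To make "low rates using simple verification" concrete, the sketch below compares the simple-verification bound from the earlier Bounds slide, 1 - R/2 - sqrt(4R - 3R^2)/2, with the Reed-Solomon figure (1-R)/2. The function names and the sample rates are assumptions; note also that the two numbers refer to different error models (random versus worst case).

```python
import math

def simple_verification_bound(rate):
    """Error fraction handled by simple verification, per the Bounds slide."""
    return 1 - rate / 2 - math.sqrt(4 * rate - 3 * rate ** 2) / 2

def reed_solomon_bound(rate):
    """Error fraction handled by standard Reed-Solomon decoding."""
    return (1 - rate) / 2

rates = [0.1, 0.2, 0.3, 0.4, 0.5]
verification_wins = [r for r in rates
                     if simple_verification_bound(r) > reed_solomon_bound(r)]
```

Setting the two expressions equal reduces to 3R^2 - 4R + 1 = 0, so under these formulas simple verification tolerates the higher error fraction exactly when R < 1/3.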
22
Code Scrambling
• Using shared (pseudo)-randomness, errors can appear random instead of worst case.
– Assume a suitably oblivious adversary.
– Permute which data goes in which message packets, so the adversary does not know where errors are introduced.
– Replace the jth packet data x with $c_j x + d_j$ for random values $c_j$ and $d_j$; inversion causes an adversarial error to look random.
• Symmetric q-ary code sufficient against a suitably oblivious channel.
23
Code Scrambling Example
• Consider modulo 61.
• If the adversary does not know c and d, the inverted values look random.
Data  c   d   Sent  Error  Inverted
7     11  23  39    38     18
11    55  5   0     13     8
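The scrambling map and its inverse are simple modular arithmetic. The sketch below works modulo 61 as in the example and reproduces the first row (data 7, c = 11, d = 23); the function names are hypothetical, and the "Error" column is read here as the corrupted value that arrives in place of the sent one.

```python
P = 61  # modulus used in the example

def scramble(x, c, d):
    """Sender side: replace data x with c*x + d (mod P)."""
    return (c * x + d) % P

def unscramble(y, c, d):
    """Receiver side: invert the map using the modular inverse of c."""
    return ((y - d) * pow(c, -1, P)) % P

sent = scramble(7, 11, 23)            # 39
recovered = unscramble(sent, 11, 23)  # 7, the original data
inverted = unscramble(38, 11, 23)     # 18, what a corrupted value of 38 decodes to
sent_row2 = scramble(11, 55, 5)       # 0
```

Because the receiver divides by a secret c after subtracting a secret d, the adversary cannot steer which value the corrupted packet decodes to. The call `pow(c, -1, P)` requires Python 3.8 or later.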
24
Implementation Details
• Packets must have ID bits.
• If data can be corrupted, so can the ID bits.
• These errors can be absorbed in data errors.
– A bad ID# will either be an erased packet (equivalent to an error) or will cause another data packet to be in error.
25
Additional Work
• Verification codes based on LT (Luby Transform) codes.
• Extended verification codes: unverified nodes send speculative list of values.
• To appear in the full paper.
26
Related Work
• Gallager mentions LDPC codes for symmetric q-ary channels back in the 1960s.
– But did not see the possibilities of verification for large q.
• LDPC codes over GF(q), Davey and MacKay.
– Meant for small q.
• Code scrambling.
– Applied primarily to Reed-Solomon codes.
27
Conclusions
• Verification paradigm leads to simple codes for random errors over large alphabets.
• Potentially useful as an outer code.
– Inner code: say, an LDPC code on a packet of bits.
– Faulty decoding leads to arbitrary packet errors.
• Useful when avoiding bit-level calculations.
– Codes in software vs. hardware.