
Computer Mathematics

Week 6: Coding systems and error detection

Department of Mechanical and Electrical System Engineering

last week

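[Figure: block diagram of a computer. A Central Processing Unit (CU, ALU, and registers PC, IR, PSR, DR, AR, with 'operation select' and 'increment PC' signals) is connected by the address bus and data bus to Random Access Memory (addresses 0 to 28) and to an Input/Output Controller (Universal Serial Bus, PCI Bus) serving mouse, keyboard, GPU, audio, HDD, SSD and network.]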

coding theory
• source coding

information theory concept
• information content

binary codes
• numbers
• text

variable-length codes
• UTF-8

compression
• Huffman’s algorithm

this week



[Figure: block diagram of a computer (CPU, address and data buses, RAM, I/O devices)]

coding theory
• channel coding
• Hamming distance

error detection
• motivation
• parity
• cyclic redundancy checks

error correction
• motivation
• block parity
• Hamming codes

channel coding — motivations

channel coding protects against data corruption

while being transmitted
• drop outs, dead zones
• electromagnetic interference

[Figure: a transmitter sends ... 1 0 ... and the receiver gets ... 0 1 ...]

while being stored
• device failure
• ionising radiation

[Figure: stored 0 and 1 bits]

channel coding — approach

add information to a message, allowing corruption to be detected

recover uncorrupted data by
• retransmission
  – receiver asks for the same data again
• redundancy
  – repetition: multiple copies of the same data are transmitted/stored, or
  – encoding: additional bits allow small errors to be identified and corrected

the medium can affect the choice of mechanism, for example
• networks: retransmission is always possible
• DVD/Blu-ray: retransmission is never possible

error detection using repetition

send each bit once: 0 or 1                   0 → (channel) → 1
• no error detection or correction

send each bit twice: 00 or 11                00 → (channel) → 01 / 10
• single bit error detection
• no error correction

send each bit three times: 000 or 111        000 → (channel) → 001 / 010 / 100
• double bit error detection (if no correction is attempted)
• single bit error correction
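the triple-repetition idea can be sketched in Python as below. This is only an illustration: the function names `encode_repetition` and `decode_repetition` are made up for this example, and bits are held in strings of '0' and '1' for readability.

```python
def encode_repetition(bits, n=3):
    """Repeat every bit n times, e.g. '10' -> '111000' for n = 3."""
    return "".join(b * n for b in bits)


def decode_repetition(received, n=3):
    """Decode by majority vote within each group of n bits.

    Returns the decoded bits and a flag that is True when some group
    contained disagreeing copies (an error was detected).
    """
    decoded = []
    error_seen = False
    for i in range(0, len(received), n):
        group = received[i:i + n]
        ones = group.count("1")
        if 0 < ones < len(group):
            error_seen = True
        decoded.append("1" if 2 * ones > len(group) else "0")
    return "".join(decoded), error_seen


sent = encode_repetition("10")            # '111000'
damaged = "110000"                        # third bit flipped by the channel
print(sent, decode_repetition(damaged))   # 111000 ('10', True)
```

with n = 2 the same decoder can still report that an error happened, but a tie gives it no way to choose the right value.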


Hamming distance

• the number of bits of difference between two valid codes

valid codes     Hamming distance   error detection   error correction
0, 1            1                  0 bits            0 bits
00, 11          2                  1                 0
000, 111        3                  2                 1
0000, 1111      4                  3                 1
00000, 11111    5                  4                 2
(in general)    $d$                $d - 1$           $\lfloor (d - 1)/2 \rfloor$
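the table can be reproduced with a few lines of Python. The helper names `hamming_distance` and `minimum_distance` are assumptions made for this sketch, and codes are written as strings of '0' and '1'.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(codes):
    """Smallest Hamming distance between any two distinct valid codes."""
    return min(hamming_distance(a, b) for a, b in combinations(codes, 2))


for codes in (["0", "1"], ["00", "11"], ["000", "111"], ["0000", "1111"]):
    d = minimum_distance(codes)
    # d - 1 errors can be detected, (d - 1) // 2 errors can be corrected
    print(codes, "d =", d, "detects", d - 1, "corrects", (d - 1) // 2)
```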

sending each bit twice (as 00 or 11)
• repeats the data verbatim (obviously), but also
• guarantees an even number of 1s is transmitted per bit of information

we can generalise this last idea...

parity

parity

mathematics: the property of an integer with respect to being odd or even

computing: the state of being odd or even, used to detect errors in binary-coded data

the parity of a block of bits is
• even, if there is an even number of 1s
• odd, if there is an odd number of 1s

0000  even parity
0001  odd parity
0010  odd parity
0011  even parity
0100  odd parity

consider a single bit error in a block of bits
• either a 0 changes to a 1
• or a 1 changes to a 0

01010101  even parity
01011101  odd parity

in either case, the number of 1s in the block changes by 1
• the parity changes (from odd to even, or from even to odd)
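a small Python sketch of this observation follows. The names `parity` and `flip` are chosen for the example only, and a block is a string of '0' and '1'.

```python
def parity(bits):
    """Return 'even' or 'odd' according to the number of 1s in a bit string."""
    return "even" if bits.count("1") % 2 == 0 else "odd"


def flip(bits, index):
    """Return a copy of bits with the bit at the given index inverted."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


block = "01010101"
print(parity(block))            # even
print(flip(block, 4))           # 01011101
print(parity(flip(block, 4)))   # odd
```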

parity bits

idea: add a redundant bit to each block of data that forces it to have even parity; e.g.,
• if the block of data has even parity, add a 0 to the end
• if the block of data has odd parity, add a 1 to the end

(or you could choose to force odd parity, it doesn’t matter)

00000  even parity
00011  even parity
00101  even parity
00110  even parity
01001  even parity
    ↑ parity bit

recall: changing a single bit changes the parity
• the Hamming distance between valid codes is 2

⇒ parity can detect 1-bit errors ($d - 1 = 1$), but cannot correct errors ($\lfloor (d - 1)/2 \rfloor = 0$)
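as a rough sketch, adding and checking an even parity bit might look like this in Python. The names `add_even_parity` and `parity_ok` are invented for the example. The last case shows why the detection limit is one bit: two flipped bits restore even parity and go unnoticed.

```python
def add_even_parity(data):
    """Append one bit so that the whole block has an even number of 1s."""
    return data + str(data.count("1") % 2)


def parity_ok(block):
    """True if the block (data plus parity bit) has even parity."""
    return block.count("1") % 2 == 0


block = add_even_parity("0011")   # '00110'
one_error = "00100"               # one bit of the block flipped
two_errors = "00101"              # two bits of the block flipped

print(block, parity_ok(block))    # 00110 True
print(parity_ok(one_error))       # False: the error is detected
print(parity_ok(two_errors))      # True: the double error is missed
```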

advantages
• trivial to implement (add the bits, modulo 2)
• can trade between block size and reliability

disadvantages
• longer blocks have a higher chance of undetectable double bit errors
• error recovery requires retransmission

let’s fix both of those...

error detection for larger blocks — checksums

a checksum is a number calculated from the content of some data
• if the content changes, the checksum (almost certainly) also changes

they can be used to ‘sign’ data, showing the content is authentic (or uncorrupted)
• on reception, verify that the checksum is correct
• also called ‘digests’ (and, loosely, ‘digital signatures’)

popular checksums for digitally signing large blocks of data:
• md5 (e.g., the ‘md5sum’ program on Linux, ‘md5’ on Mac)
• sha256 (e.g., the ‘shasum -a 256’ command on Linux and Mac)

popular checksum for detecting changes in smaller blocks of data:
• CRC-32 (e.g., the ‘cksum’ program on Linux and Mac)
• CRC = Cyclic Redundancy Check
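the same checksums are available from Python's standard library (`hashlib.md5`, `hashlib.sha256`, `zlib.crc32`). The sketch below uses two made-up messages that differ in their last letter, to show that every checksum changes. Note that `zlib.crc32` implements one particular CRC-32 variant, so its values need not match the output of a command-line tool.

```python
import hashlib
import zlib

original = b"Week 6: coding systems and error detection"
damaged = b"Week 6: coding systems and error detectiom"   # last letter changed

for label, data in (("original", original), ("damaged", damaged)):
    print(label)
    print("  md5    ", hashlib.md5(data).hexdigest())
    print("  sha256 ", hashlib.sha256(data).hexdigest())
    print("  crc32  ", format(zlib.crc32(data), "08x"))
```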

creating a cyclic redundancy check code

an n-bit CRC is the remainder after dividing the data by a large number
• an (n+1)-bit divisor is known to both sender and receiver
• the dividend consists of the original message data bits
• n additional bits, added to the right of the data, are initially all 0
  – this is where the remainder will appear

then a long division is performed; repeatedly:
• align the leftmost 1 of the divisor with the leftmost 1 of the dividend (data)
• ‘subtract’ the divisor from the dividend, using modulo-2 arithmetic on individual bits
  (this will always change the leftmost 1 in the dividend to a 0)

until the data part of the dividend contains only 0s

the n additional bits to the right of the dividend now contain the ‘remainder’
• this is your n-bit CRC
• transmit it along with the message
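the long division described above can be sketched in Python as follows. The names `mod2_divide` and `make_crc` are assumptions for this example, bits are kept in strings, and the divisor is assumed to start with a 1. Scanning the data from left to right and subtracting wherever a 1 is found is the same as repeatedly aligning the divisor with the leftmost remaining 1.

```python
def mod2_divide(bits, divisor):
    """Long division with modulo-2 (XOR) subtraction.

    bits is the dividend: the data followed by n check bits.
    Returns the final n bits, which hold the remainder.
    """
    n = len(divisor) - 1
    work = [int(b) for b in bits]
    div = [int(b) for b in divisor]
    # Only positions in the data part are aligned with the divisor's leading 1.
    for i in range(len(work) - n):
        if work[i] == 1:
            for j, d in enumerate(div):
                work[i + j] ^= d          # subtraction without borrow is XOR
    return "".join(str(b) for b in work[-n:])


def make_crc(data, divisor):
    """Append n zero bits to the data, divide, and return the n-bit CRC."""
    n = len(divisor) - 1
    return mod2_divide(data + "0" * n, divisor)


print(make_crc("10101110", "101"))   # 01
```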

verifying a cyclic redundancy check code

when the message is received

• repeat the CRC process described above, but
• initialise the n additional bits with the received CRC code
• when the algorithm finishes, if the data is undamaged, the n CRC bits will all be 0

using a 3-bit divisor 101, a 2-bit CRC for 10101110 is

transmitter
10101110 00
101
00001110 00
    101
00000100 00
     101
00000001 00
       1 01
00000000 01

→ the CRC 01 is sent with the data

receiver
10101110 01
101
00001110 01
    101
00000100 01
     101
00000001 01
       1 01
00000000 00   ok

receiver (corrupted)
10100110 01
101
00000110 01
     101
00000011 01
      10 1
00000001 11
       1 01
00000000 10   bad!
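the three divisions in this example can be checked with a short Python sketch. The function name `crc_remainder` is made up for the illustration; it divides the data followed by the given check bits and returns what is left in the check-bit positions.

```python
def crc_remainder(data, check_bits, divisor):
    """Divide data followed by check_bits by divisor, using XOR subtraction."""
    n = len(divisor) - 1
    work = [int(b) for b in data + check_bits]
    for i in range(len(data)):
        if work[i] == 1:
            for j, d in enumerate(divisor):
                work[i + j] ^= int(d)
    return "".join(str(b) for b in work[-n:])


DIVISOR = "101"
crc = crc_remainder("10101110", "00", DIVISOR)    # transmitter computes '01'
print(crc)
print(crc_remainder("10101110", crc, DIVISOR))    # 00: data is undamaged
print(crc_remainder("10100110", crc, DIVISOR))    # 10: not all 0, data is bad
```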

cyclic redundancy checks in communications

CRC-32 is a 32-bit cyclic redundancy check code

• the data validation code is a 32-bit checksum
• the checksum is based on a cyclic algorithm
  (a kind of long division, of the data by a 33-bit divisor)
• it is added to the data redundantly
  (it expands the message without adding new information)

CRCs are popular because they are
• simple to implement in hardware, including for serial streams of bits
• very good at detecting errors caused by electromagnetic noise

CRC-32 is used in Ethernet networks to detect corrupted packets

error correction — block parity

parity tells us if a word of data has an error

in a sequence of words

even parity bit ↓
       01010000 0
       01100001 1
       01110010 0
       01101001 0
? →    01100100 0
       01111001 1

• we know which word is corrupted, but
• we do not know which bit in the word is corrupted

thinking two-dimensionally
• we know the row
• we do not know the column

to identify the column, use a parity word

            even parity bit ↓
                   01010000 0
                   01100001 1
                   01110010 0
                   01101001 0
? →                01100100 0
                   01111001 1
even parity word → 00100111 0
                      ↑ ?

• a group of parity bits working on columns, not rows
• identifies which column contains the corrupted bit

the parity bits provide lateral parity

the parity words provide longitudinal parity, for a few data words at a time
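one way to sketch block parity in Python is shown below. The function names are invented for this example. It assumes at most one flipped bit, and that the flipped bit is in the data rather than in the parity bits or parity word. The six data words in the example are the ASCII codes of the letters in "Parity", with the fifth word damaged.

```python
def even_parity_bit(bits):
    """Parity bit that gives bits plus the parity bit an even number of 1s."""
    return bits.count("1") % 2


def parity_word(words):
    """Column-wise (longitudinal) even parity over equal-length words."""
    return "".join(str(even_parity_bit("".join(col))) for col in zip(*words))


def correct_single_error(words, row_bits, column_word):
    """Repair one flipped data bit in place, using row and column parity."""
    bad_rows = [r for r, w in enumerate(words)
                if even_parity_bit(w) != row_bits[r]]
    actual = parity_word(words)
    bad_cols = [c for c in range(len(column_word))
                if actual[c] != column_word[c]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        r, c = bad_rows[0], bad_cols[0]
        w = words[r]
        words[r] = w[:c] + ("1" if w[c] == "0" else "0") + w[c + 1:]
    return words


sent = [format(ord(ch), "08b") for ch in "Parity"]
row_bits = [even_parity_bit(w) for w in sent]     # lateral parity bits
column_word = parity_word(sent)                   # '00100111'

received = list(sent)
received[4] = "01100100"                          # 't' damaged into 'd'
fixed = correct_single_error(received, row_bits, column_word)
print("".join(chr(int(w, 2)) for w in fixed))     # Parity
```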

Hamming codes

block parity considers data as a 2-dimensional array of bits with (x, y) coordinates

if the bit at (x, y) is corrupted, we can identify it because

• its row’s lateral parity bit will be wrong, telling us y
• its column’s longitudinal parity bit will be wrong, telling us x

instead of (x, y) addressing within an array of bits, we could number them
• then use parity bits to identify the number of a bit that is corrupted

Hamming codes

number the bits, starting from 1

any bit whose number is a power of 2 is a parity bit, $p_n$ (numbered $2^n$)

each parity bit $p_n$ checks all bits whose binary numbering includes $2^n$
• for example, bit 9 is numbered $1001_2$ in binary, and so
• it is included in the parity calculations of $p_3$ and $p_0$ (because $2^3 + 2^0 = 9$)

message:     p0 p1  0 p2  1  1  1 p3  0  1  0  0
bit number:   1  2  3  4  5  6  7  8  9 10 11 12
in binary:    1  0  1  0  1  0  1  0  1  0  1  0   ← 1 means this bit is included in p0
              0  1  1  0  0  1  1  0  0  1  1  0   ← 1 means this bit is included in p1
              0  0  0  1  1  1  1  0  0  0  0  1   ← 1 means this bit is included in p2
              0  0  0  0  0  0  0  1  1  1  1  1   ← 1 means this bit is included in p3
encoded:      0  1  0  1  1  1  1  1  0  1  0  0

using even parity:
p0 = 001100 (bits 1, 3, 5, 7, 9, 11)
p1 = 101110 (bits 2, 3, 6, 7, 10, 11)
p2 = 11110  (bits 4, 5, 6, 7, 12)
p3 = 10100  (bits 8, 9, 10, 11, 12)

(the pattern can be extended to the right, for as many data and parity bits as needed)
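the numbering scheme can be turned into a short Python sketch. The function name `hamming_encode` is an assumption for this example, and bits are held as a list of 0/1 integers. A position is a power of 2 exactly when `pos & (pos - 1)` is 0, and `pos & p` tests whether the binary numbering of `pos` includes `p`.

```python
def hamming_encode(data_bits):
    """Place data bits at non-power-of-2 positions and fill in even parity.

    Positions are numbered from 1; the returned list is indexed from 0,
    so position k is stored at index k - 1.
    """
    # Work out how many positions are needed in total.
    total, placed = 0, 0
    while placed < len(data_bits):
        total += 1
        if total & (total - 1) != 0:      # not a power of 2: a data position
            placed += 1

    code = [0] * (total + 1)              # index 0 is unused
    data = iter(data_bits)
    for pos in range(1, total + 1):
        if pos & (pos - 1) != 0:
            code[pos] = next(data)

    # The parity bit at position p covers every position whose number includes p.
    p = 1
    while p <= total:
        code[p] = sum(code[pos] for pos in range(1, total + 1) if pos & p) % 2
        p *= 2
    return code[1:]


print(hamming_encode([0, 1, 1, 1, 0, 1, 0, 0]))
# [0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0]
```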

Hamming codes

example:

bit number:       1  2  3  4  5  6  7  8  9 10 11 12
stored data:      0  1  0  1  1  1  1  1  0  1  0  0
retrieved data:   0  1  0  1  1  0  1  1  0  1  0  0

check for correct (even) parities:

p0 = 001100 (bits 1, 3, 5, 7, 9, 11)     ok
p1 = 100110 (bits 2, 3, 6, 7, 10, 11)    wrong
p2 = 11010  (bits 4, 5, 6, 7, 12)        wrong
p3 = 10100  (bits 8, 9, 10, 11, 12)      ok

the incorrect parities are p2 and p1, so the corrupted bit is number $2^2 + 2^1 = 4 + 2 = 6$
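the check in this example can be sketched in Python as below. The names `hamming_syndrome` and `hamming_correct` are made up for the illustration, the code word is a list of 0/1 integers, and at most one flipped bit is assumed.

```python
def hamming_syndrome(code):
    """Sum of the numbers of the parity bits whose even-parity check fails.

    Position k (numbered from 1) is stored at code[k - 1].
    A result of 0 means every check passed.
    """
    syndrome = 0
    p = 1
    while p <= len(code):
        covered = [code[pos - 1] for pos in range(1, len(code) + 1) if pos & p]
        if sum(covered) % 2 != 0:
            syndrome += p
        p *= 2
    return syndrome


def hamming_correct(code):
    """Return a repaired copy of code and the number of the bit that was fixed."""
    fixed = list(code)
    bad = hamming_syndrome(fixed)
    if 0 < bad <= len(fixed):
        fixed[bad - 1] ^= 1               # flip the corrupted bit back
    return fixed, bad


stored = [0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0]
retrieved = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0]
fixed, bad = hamming_correct(retrieved)
print(bad, fixed == stored)               # 6 True
```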

channel coding in action

redundant arrays of independent disks (RAID) for server data storage
• repetition: RAID 1
• parity: RAID 3, RAID 5

ECC (error correcting code) RAM for server memory
• uses Hamming codes plus one additional parity bit
• reliably performs single error correction, double error detection (SECDED)
• essential, if you care at all about losing 1 bit per hour per gigabyte of memory!
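the "Hamming code plus one additional parity bit" idea can be illustrated with the 12-bit code word from the earlier example. This is a sketch only, not a description of how any particular memory controller works. All function names are invented, and at most two flipped bits per word are assumed.

```python
def syndrome(code):
    """Sum of the parity-bit numbers (1, 2, 4, ...) whose even check fails."""
    s, p = 0, 1
    while p <= len(code):
        if sum(code[k - 1] for k in range(1, len(code) + 1) if k & p) % 2:
            s += p
        p *= 2
    return s


def secded_encode(hamming_code):
    """Append one overall even-parity bit to an existing Hamming code word."""
    return hamming_code + [sum(hamming_code) % 2]


def secded_check(word):
    """Classify a received word as 'ok', 'corrected' or 'double error'."""
    code = list(word[:-1])
    overall_ok = sum(word) % 2 == 0
    s = syndrome(code)
    if s == 0 and overall_ok:
        return "ok", code
    if not overall_ok:                    # odd number of flips: assume one
        if 0 < s <= len(code):
            code[s - 1] ^= 1              # s == 0: only the extra bit flipped
        return "corrected", code
    return "double error", code           # checks fail, overall parity even


hamming = [0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0]
word = secded_encode(hamming)

one = list(word)
one[5] ^= 1                               # flip bit number 6
two = list(one)
two[8] ^= 1                               # also flip bit number 9

print(secded_check(word)[0])              # ok
print(secded_check(one)[0])               # corrected
print(secded_check(two)[0])               # double error
```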

next week



[Figure: block diagram of a computer (CPU, address and data buses, RAM, I/O devices)]

the mathematics of logic circuits
• the foundation of all digital design

Boolean logic
• when 0 and 1 represent false and true

Boolean algebra
• Boolean functions
• canonical forms

simplification of Boolean expressions
• de Morgan’s laws

homework

practice generating and checking parity bits
• write a Python program to generate parity bits
  – simulate some errors in the data
  – detect the errors by checking the parity

practice block parity
• extend your Python program to correct single-bit errors

practice generating small CRC codes
• write a Python program to generate a CRC for strings
• using the same divisor, compare your result to somebody else’s result

ask about anything you do not understand
• from any of the classes so far this semester (or the lecture notes)
• it will be too late for you to try to catch up later!
• I am always happy to explain things differently and practice examples with you

glossary

channel coding — a form of encoding that adds redundant data to a message to allow detection and/or correction of message corruption.

checksum — a number that characterises the content of a block of data. The checksum is included with the data during transmission, and verified during reception.

corruption — damage to a message during storage or communication, causing the values of bits to change.

Cyclic Redundancy Check — a checksum based on modular division, especially good at detecting long sequences of corrupted bits as can occur in network communication.

error correction — identifying the specific bit that was corrupted, allowing it to be repaired.

error detection — identifying the presence of a corrupted bit, allowing retransmission of the message to be requested.

even parity — a property of a block of data in which the number of 1s is even.

even parity bit — a bit added to a block of data that guarantees the data will have even parity.

Hamming code — a parity-based encoding of data that allows a corrupted bit to be identified and corrected.

Hamming distance — the number of bits that must be changed to convert a valid message into another valid message.

lateral parity — parity that works ‘horizontally’ across a row of bits.

longitudinal parity — parity that works ‘vertically’ down a column of bits.

medium — anything that can carry a message. Media include copper wire for electrical messages, glass fibre for optical messages, space for electromagnetic radio messages, etc.

odd parity — a property of a block of data in which the number of 1s is odd.

odd parity bit — a bit added to a block of data that guarantees the data will have odd parity.

packet — in a computer network, a single unit of transmission (and retransmission, in the case of error).

parity — the quality of being even or odd.

redundant — data that is added to a message that does not increase the information content of the message. Redundant data adds the ability to detect or correct errors within the message content.

repetition — transmission (or storage) of the same information multiple times.

retransmission — transmission of a message that has already been transmitted, but was found to contain errors when received.
