OCR Number (OCR-nummer)
• A reference number links your payment to the invoice
• What if the OCR number is typed incorrectly? Is the error detected?
• E24.se reported (how is this possible?): business owner Thomas Hultberg entered the wrong OCR number when paying his tax. He is now forced to pay the 229,000 kronor a second time.
Outline of the presentation
• About me
• Errors and Error Effect
• Error Control Coding
• Personnummer, OCR
• Hamming code
• My Research and Results
About Me
• Education
– 1978.7-1982.7 Electronics Engineering, Harbin C. U.
– 1982.7-1986.4 Computer Science, SWJTU
– 1987.2-1991.4 Electrical Engineering, SWJTU
• Work
– 1986.1-1993.4 Lecturer, Associate Prof., SWJTU
– 1993.4-1994.6 Guest researcher, Linköping Uni.
– 1994.7-present HKr
Information Transmission System
• Source encoding (remove redundancy)
• Encoder (add redundancy)
• Decoder (error detection/correction)
• Source decoding
[Figure: Information Source → (k digits) → Transmitter (Encoder) → (n digits) → Communication Channel, with Noise → (n digits) → Receiver (Decoder) → (k digits) → Information Sink]
Introduction to Coding Theory
• Also called channel coding
• The study of methods for efficient and accurate transfer of information
– Detecting and correcting transmission errors
Errors and Error Effect
• Errors
– 0 → 1 or 1 → 0
– Bits can be lost
• Error effect
– Programs downloaded from the Internet?
– CD music?
– Internet banking services?
• Errors must be detected/corrected!
Channel model: BSC
• Binary symmetric channel
– p: the bit error probability
– 0 → 0 and 1 → 1 with probability 1 − p
– 0 → 1 and 1 → 0 with probability p
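A binary symmetric channel can be sketched in a few lines of Python. This is only an illustration: the function name `bsc` and the optional `rng` argument are assumptions made for this example, not part of the slides.

```python
import random

def bsc(bits, p, rng=None):
    """Pass a list of 0/1 bits through a binary symmetric channel.

    Each bit is flipped independently with probability p and is left
    unchanged with probability 1 - p.
    """
    rng = rng or random.Random()
    return [b ^ 1 if rng.random() < p else b for b in bits]

sent = [0, 1, 0, 1, 1, 1, 0]
received = bsc(sent, p=0.1, rng=random.Random(1))  # seeded for repeatability
print(sent)
print(received)
```

With p = 0 the channel is error free and with p = 1 every bit is inverted, which gives two easy sanity checks.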
Why Error Control Coding?
• Bit error rate p
– p = 1/100000 = 10^-5 for optical disks
– p = 10^-11 for a fiber link
• Some calculations
– p = 10^-6: downloading a file of length 10^7 bits gives about 10 bit errors
– p = 10^-6 at a data rate of 10 Mbps: about 10 bit errors every second
– p = 10^-11 at a data rate of 10 Gbps: about 1 bit error every 10 seconds
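The arithmetic behind these figures is simply "expected errors = p × number of bits". The sketch below works through it, assuming independent bit errors; the helper names are made up for this example.

```python
def expected_bit_errors(p, n_bits):
    """Expected number of bit errors when n_bits are sent with error rate p."""
    return p * n_bits

def seconds_per_error(p, bits_per_second):
    """Average time between bit errors at a given data rate."""
    return 1.0 / (p * bits_per_second)

print(expected_bit_errors(1e-6, 10**7))        # file of 10^7 bits, p = 10^-6
print(seconds_per_error(1e-6, 10 * 10**6))     # 10 Mbps, p = 10^-6
print(seconds_per_error(1e-11, 10 * 10**9))    # 10 Gbps, p = 10^-11
```

Under these assumptions the three results are roughly 10 errors per file, one error every 0.1 seconds, and one error every 10 seconds.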
Error Control Coding – Principle
• Add additional information, or redundancy, to data
• Added by sender, checked by receiver
• k data digits are encoded to a codeword of n digits
• Code rate r = k / n
[Figure: k data digits → encoded as → n-digit codeword]
Application Example: Swedish personal ID
• 640823-3234 ?
• Format: yymmdd-nnnP
– yy mm dd: year, month, day
– nnn: serial number, odd for male, even for female
– P: parity check digit, used for error detection
• OCR number uses the same technique
Personal ID Encoding Method
position       1  2  3  4  5  6  7  8  9  10
digit          6  4  0  8  2  3  3  2  3  ?
2×odd         12  4  0  8  4  3  6  2  6
add 2-digits   3  4  0  8  4  3  6  2  6
• sum = 3 + 4 + 0 + 8 + 4 + 3 + 6 + 2 + 6 = 36
• Take the last digit of the sum: 6
• Parity check digit = 10 – 6 = 4
• Result: 640823-3234
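The encoding steps above can be sketched as a short function. The name `check_digit` is an assumption for this example, and the final `% 10` (so that a sum ending in 0 gives check digit 0) follows the usual convention for this kind of check rather than anything stated on the slide.

```python
def check_digit(first_nine):
    """Return the check digit for a 9-digit string such as '640823323'."""
    total = 0
    for position, ch in enumerate(first_nine, start=1):
        d = int(ch)
        if position % 2 == 1:   # digits in odd positions are doubled
            d *= 2
            if d > 9:           # add the two digits, e.g. 12 -> 1 + 2 = 3
                d -= 9
        total += d
    # 10 minus the last digit of the sum; a last digit of 0 gives 0 (assumed)
    return (10 - total % 10) % 10

print(check_digit("640823323"))  # prints 4
```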
Personal ID Error Detection
• 640823-3234 mistyped as 460823-3234 ?
position       1  2  3  4  5  6  7  8  9  10
digit          4  6  0  8  2  3  3  2  3  ?
2×odd          8  6  0  8  4  3  6  2  6
add 2-digits   8  6  0  8  4  3  6  2  6
• sum = 8 + 6 + 0 + 8 + 4 + 3 + 6 + 2 + 6 = 43
• Take the last digit of the sum: 3
• Parity check digit = 10 – 3 = 7
• It is not equal to 4: error in the number!
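The same check can be run by a receiver. The sketch below validates a full 10-digit number and then tries every single-digit substitution of 6408233234 to see whether any slips through; `luhn_valid` is a name chosen for this example.

```python
def luhn_valid(number):
    """Check a 10-digit string whose last digit is the check digit."""
    total = 0
    for position, ch in enumerate(number[:9], start=1):
        d = int(ch)
        if position % 2 == 1:
            # double odd positions and add the digits of two-digit results
            d = 2 * d if d < 5 else 2 * d - 9
        total += d
    return (10 - total % 10) % 10 == int(number[9])

original = "6408233234"
print(luhn_valid(original))       # True
print(luhn_valid("4608233234"))   # False: the swapped digits are detected

# Replace each digit in turn by every other value and collect the
# corrupted numbers that would still pass the check.
undetected = [
    original[:i] + str(d) + original[i + 1:]
    for i in range(10)
    for d in range(10)
    if str(d) != original[i]
    and luhn_valid(original[:i] + str(d) + original[i + 1:])
]
print(undetected)                 # []
```

For this number no single-digit typing error goes unnoticed. The check is detection only: it says that something is wrong, not which digit.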
Parity Check Applications
• The same coding methods have been used for
– OCR reference number
– Bankgiro number (Bankgironummer)
– Organisation number (Organisationsnummer)
• Reference
– http://www.lur.nu/OCR/generera.php
Error Correcting Code – Hamming Code
• Binary Hamming [7, 4] code: k = 4, n = 7
– Encode 4 data bits by adding 3 parity bits
– Can correct any single error
• Encoding
– a b c d → a b c d x y z
– where a, b, c, d are information bits and x, y, z are parity check bits
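Hamming Code
• Given a, b, c, d. How to get x, y, z?
– Place a, b, c, d in the intersections of three circles
– Label the circles x, y, z
– Parity checking rule: the sum of each circle is 0 (mod 2)
– x = a + b + c, y = a + c + d, z = b + c + d
[Figure: three overlapping circles labelled x, y, z, with a in x∩y, b in x∩z, d in y∩z and c in the centre]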
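Hamming Code Example
• Given a, b, c, d. How to get x, y, z?
– 0101 → 0101 xyz
– x = 0 + 1 + 0 = 1, y = 0 + 0 + 1 = 1, z = 1 + 0 + 1 = 0 (mod 2)
– so the codeword is 0101 110
[Figure: the three-circle diagram with a, b, c, d = 0, 1, 0, 1 and x, y, z = 1, 1, 0 filled in]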
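The parity rules x = a + b + c, y = a + c + d, z = b + c + d (all mod 2) translate directly into code. A minimal sketch, with `hamming_encode` as an illustrative name:

```python
def hamming_encode(a, b, c, d):
    """Encode four data bits into the 7-bit codeword a b c d x y z."""
    x = (a + b + c) % 2
    y = (a + c + d) % 2
    z = (b + c + d) % 2
    return [a, b, c, d, x, y, z]

print(hamming_encode(0, 1, 0, 1))  # [0, 1, 0, 1, 1, 1, 0]
```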
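Hamming Code for Error Correction
• 0101 110 sent → 0100 110 received
– Re-encode the received data bits: 0100 → 0100 101
– Compare the recomputed 101 with the received 110: 101 ⊕ 110 = 011, so there is an error
– Bit d must be in error, because it is the only bit that affects exactly y and z
– Correction: 0100 → 0101
[Figure: three-circle diagrams for the received and the reconstructed words]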
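The re-encode-and-compare procedure can be sketched as follows. The lookup table is derived from the parity rules (for example, a appears in x and y only, so a flipped a shows up as the pattern 110); the names are assumptions for this example.

```python
# Which parity bits disagree -> index of the bit to flip in a b c d x y z
PATTERN_TO_POSITION = {
    (1, 1, 0): 0,  # a affects x and y
    (1, 0, 1): 1,  # b affects x and z
    (1, 1, 1): 2,  # c affects x, y and z
    (0, 1, 1): 3,  # d affects y and z
    (1, 0, 0): 4,  # x itself
    (0, 1, 0): 5,  # y itself
    (0, 0, 1): 6,  # z itself
}

def parity_bits(a, b, c, d):
    return ((a + b + c) % 2, (a + c + d) % 2, (b + c + d) % 2)

def correct_single_error(word):
    """Return a copy of the 7-bit word with at most one bit corrected."""
    a, b, c, d, x, y, z = word
    recomputed = parity_bits(a, b, c, d)
    diff = tuple(r ^ s for r, s in zip(recomputed, (x, y, z)))
    fixed = list(word)
    if diff != (0, 0, 0):
        fixed[PATTERN_TO_POSITION[diff]] ^= 1
    return fixed

print(correct_single_error([0, 1, 0, 0, 1, 1, 0]))  # [0, 1, 0, 1, 1, 1, 0]
```

This only works when at most one bit is wrong; two or more errors would be "corrected" to a different codeword.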
Error Detecting Codes
• Only detect errors
– Using a protocol to correct errors:
– ACK: positive acknowledgement (I got it)
– NAK: negative acknowledgement (sorry)
• Simple, reliable, high code rate
• Used in data communications
[Figure: sender → codeword → receiver; receiver → ACK/NAK → sender]
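A toy version of the ACK/NAK loop is sketched below. It assumes a single even-parity bit as the detecting code and replaces the real channel with a scripted list of error positions, so both `send_with_arq` and `error_patterns` are hypothetical constructs for illustration only.

```python
def add_parity(bits):
    """Append one even-parity bit to the data bits."""
    return bits + [sum(bits) % 2]

def parity_ok(word):
    return sum(word) % 2 == 0

def send_with_arq(data, error_patterns, max_tries=5):
    """Retransmit until the receiver answers ACK.

    error_patterns is a stand-in for the channel: its i-th entry lists
    the bit positions flipped during the i-th transmission.
    """
    codeword = add_parity(data)
    for attempt in range(max_tries):
        flips = error_patterns[attempt] if attempt < len(error_patterns) else []
        received = [b ^ 1 if i in flips else b for i, b in enumerate(codeword)]
        if parity_ok(received):      # receiver replies ACK
            return received[:-1], attempt + 1
        # receiver replies NAK, so the sender transmits again
    raise RuntimeError("no ACK after max_tries transmissions")

# First transmission has bit 2 flipped, the second arrives clean.
print(send_with_arq([1, 0, 1, 1], error_patterns=[[2], []]))
# ([1, 0, 1, 1], 2)
```

A single parity bit misses any even number of bit errors, which is why practical systems use stronger detecting codes with the same retransmission idea.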
Error Correcting Codes
• Detect and correct errors
• No feedback channel required
• Complicated, lower code rate (k/n)
• Used in storage systems (computer storage, CD, DVD) and space communications
[Figure: sender → codeword → receiver]
Generator Matrix and Encoding
• Generator matrix G
– Example: Hamming [7, 4] code, with x = a + b + c, y = a + c + d, z = b + c + d

G = 1 0 0 0 1 1 0
    0 1 0 0 1 0 1
    0 0 1 0 1 1 1
    0 0 0 1 0 1 1

– Encoding: (a, b, c, d) G = (a, b, c, d, x, y, z), the codeword
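Encoding with a generator matrix is a vector-matrix product over GF(2). The sketch below assumes G is written in systematic form [I | P], with the parity columns taken from x = a + b + c, y = a + c + d, z = b + c + d; the function name `encode` is illustrative.

```python
# Rows correspond to a, b, c, d; columns to a b c d x y z.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1, 1],
]

def encode(message, generator=G):
    """Multiply a length-k message by the k x n generator matrix, mod 2."""
    n = len(generator[0])
    return [
        sum(m * row[j] for m, row in zip(message, generator)) % 2
        for j in range(n)
    ]

print(encode([0, 1, 0, 1]))  # [0, 1, 0, 1, 1, 1, 0]
```

The result is the sum of the rows of G selected by the 1 bits of the message, here rows b and d.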
Parity Check Matrix and Decoding
• Parity check matrix H
– H G^T = 0
– Example: Hamming [7, 4] code, with x = a + b + c, y = a + c + d, z = b + c + d

H = 1 1 1 0 1 0 0
    1 0 1 1 0 1 0
    0 1 1 1 0 0 1

– Syndrome s: a column vector of length n − k
– Decoding
  • Received word y: H y^T = s, the syndrome of y
  • If s = 0, no error detected
  • Otherwise, there must be errors
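Syndrome computation can be sketched in the same style. The matrix H below is an assumption built from the parity rules (one row per parity bit, columns ordered a b c d x y z), and the correction step relies on the Hamming code's single-error property: a nonzero syndrome equals the column of H at the error position.

```python
H = [
    [1, 1, 1, 0, 1, 0, 0],  # x = a + b + c
    [1, 0, 1, 1, 0, 1, 0],  # y = a + c + d
    [0, 1, 1, 1, 0, 0, 1],  # z = b + c + d
]

def syndrome(word, check=H):
    """Compute s = H y^T over GF(2) for a received word y."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in check]

def decode(word, check=H):
    """Correct at most one bit error using the syndrome."""
    s = syndrome(word, check)
    fixed = list(word)
    if any(s):
        columns = [[row[j] for row in check] for j in range(len(word))]
        fixed[columns.index(s)] ^= 1   # flip the bit whose column matches s
    return fixed

print(syndrome([0, 1, 0, 1, 1, 1, 0]))  # [0, 0, 0]
print(syndrome([0, 1, 0, 0, 1, 1, 0]))  # [0, 1, 1]
print(decode([0, 1, 0, 0, 1, 1, 0]))    # [0, 1, 0, 1, 1, 1, 0]
```

The syndrome 011 matches the fourth column of H, which points at bit d, the same conclusion reached by comparing parity bits by hand.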
Optimal codes
• Distance or d-optimal code
– A linear [n, k, d]_q code is d-optimal if there does not exist an [n, k, d+1]_q code.
• Length or n-optimal code
– A linear [n, k, d]_q code is n-optimal if there does not exist an [n-1, k, d]_q code.
• k-optimal code
– A linear [n, k, d]_q code is k-optimal if there does not exist an [n, k+1, d]_q code.
My Research
• Difference triangle sets
– A generalization of Golomb rulers
– 1989-1995
• Majority-logic decodable codes
– 1988-1992
• Quasi-Twisted codes (e.g. a [112, 13, 48] code)
– 1989-present
• Two-weight codes and graphs
– 2006-present
Difference Triangle Set
• Golomb ruler
– R = {0, 1, 4, 6}
– Difference triangle:

   1 4 6
    3 5
     2

• Difference triangle sets
– A generalization of Golomb rulers
– A set of t Golomb rulers
– R1 = {0, 6, 11, 13}, R2 = {0, 8, 17, 18}, R3 = {0, 3, 15, 19}
– Difference triangles:

   6 11 13     8 17 18     3 15 19
    5  7        9 10       12 16
     2           1           4
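Whether a set of marks is a Golomb ruler, or a set of rulers forms a difference triangle set, comes down to checking that all pairwise differences are distinct. A minimal sketch, with function names chosen for this example:

```python
from itertools import combinations

def differences(ruler):
    """All positive differences between pairs of marks on one ruler."""
    return [b - a for a, b in combinations(sorted(ruler), 2)]

def is_golomb_ruler(ruler):
    diffs = differences(ruler)
    return len(diffs) == len(set(diffs))

def is_difference_triangle_set(rulers):
    """True if the differences are distinct across all rulers together."""
    diffs = [d for ruler in rulers for d in differences(ruler)]
    return len(diffs) == len(set(diffs))

print(sorted(differences([0, 1, 4, 6])))   # [1, 2, 3, 4, 5, 6]
print(is_golomb_ruler([0, 1, 4, 6]))       # True
print(is_difference_triangle_set(
    [[0, 6, 11, 13], [0, 8, 17, 18], [0, 3, 15, 19]]))  # True
```

The three rulers together produce 18 distinct differences between 1 and 19, with only 14 unused.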
Current Research Interests
• Quasi-Twisted Codes
– Many QT codes are good or optimal
– Computer constructions
• 2-weight codes
– Non-zero codewords of weights w1 or w2
– Related to strongly regular graphs
Computer search for QT Codes
• Given a cyclic weight matrix D of order s

D = | d_0      d_1      d_2   ...  d_{s-1} |
    | d_{s-1}  d_0      d_1   ...  d_{s-2} |
    | d_{s-2}  d_{s-1}  d_0   ...  d_{s-3} |
    | ...      ...      ...   ...  ...     |
    | d_1      d_2      d_3   ...  d_0     |

• How to select p columns such that
– the minimum row sum of the p columns is maximized
– the row sums take only the values w1 or w2
Computer search for QT Codes
• Given a cyclic weight matrix of order s (here s = 7)

D = 2 3 2 3 3 1 2
    2 2 3 2 3 3 1
    1 2 2 3 2 3 3
    3 1 2 2 3 2 3
    3 3 1 2 2 3 2
    2 3 3 1 2 2 3
    3 2 3 3 1 2 2

– Columns 0, 1, 3 produce a QT code with minimum distance 6
– The row sums take the values 6 and 8: a two-weight code is found
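The column-selection step can be sketched as a brute-force search. The first row 2 3 2 3 3 1 2 used here is an assumption: it is the orientation of the slide's matrix for which columns 0, 1, 3 reproduce the quoted row sums 6 and 8. An exhaustive search like this is only practical for small s; it is not presented as the search method used in the research.

```python
from itertools import combinations

first_row = [2, 3, 2, 3, 3, 1, 2]   # assumed first row of D
s = len(first_row)
# Row i of a cyclic matrix is the first row shifted right i times.
D = [[first_row[(j - i) % s] for j in range(s)] for i in range(s)]

def row_sums(matrix, columns):
    """Sum the selected columns within each row."""
    return [sum(row[j] for j in columns) for row in matrix]

print(row_sums(D, (0, 1, 3)))        # [8, 6, 6, 6, 8, 6, 8]

# Try every choice of 3 columns and keep one with the largest minimum row sum.
best = max(combinations(range(s), 3), key=lambda cols: min(row_sums(D, cols)))
print(best, min(row_sums(D, best)))
```

Each column of this matrix sums to 16, so three columns give a total of 48 over 7 rows and the minimum row sum cannot exceed 6, which columns 0, 1, 3 attain.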
Publications
• Co-authored books
– One textbook (VAX-11 assembly language programming)
– One book (combinatorial coding theory and applications)
• IEEE Trans. Information Theory
– 8 papers
• IEE Electronics Letters
– 6 papers
• Designs, Codes and Cryptography
– 1 paper (DDD: disjoint distinct difference sets)
Online Database on Codes
• A web database of binary quasi-cyclic codes
– http://moodle.tec.hkr.se/~chen/research/codes/searchqc2.htm
– see also: codetables, http://www.codetables.de
• A web database of two-weight codes
– http://moodle.tec.hkr.se/~chen/research/2-weight-codes/search.php