Numbers in Codes

GCNU 1025

Numbers Save the Day

Coding

• Converting information into another form of representation (codes) based on a specific rule
• Encoding: information to symbols
• Decoding: symbols back to information

Binary Codes

• Two symbols are used to represent data
• Examples: Morse code, ASCII code

Morse Codes

• On-off tones, lights, clicks, dots and dashes, etc.

Binary codes

• Language of computers: 0 and 1 (binary system)
• Codeword: a string of 0’s and 1’s representing a character
• ASCII code: American Standard Code for Information Interchange
  • 128 characters, of which 33 are control characters
  • Enables the use of the same codewords on different machines
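The idea of ASCII codewords can be tried out with a short sketch. It assumes 7-digit codewords (enough for 128 characters, since 2^7 = 128); the function names `text_to_codewords` and `codewords_to_text` are made up for this illustration.

```python
def text_to_codewords(text: str) -> list[str]:
    """Encode each character as a 7-digit ASCII codeword (assumed width)."""
    codewords = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        codewords.append(format(code, "07b"))  # zero-padded binary string
    return codewords


def codewords_to_text(codewords: list[str]) -> str:
    """Decode a list of binary codewords back into text."""
    return "".join(chr(int(cw, 2)) for cw in codewords)


print(text_to_codewords("A"))  # ['1000001']
print(codewords_to_text(["1000010", "1001001", "1010100"]))  # BIT
```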

Binary Codes: Play

http://www.binaryhexconverter.com/binary-to-ascii-text-converter

http://www.binaryhexconverter.com/ascii-text-to-binary-converter

Coding of Chinese characters (optional)

• Example: Chinese telegraph code (non-binary)
  • 4-digit: 0000-9999
  • Decoding (relatively easy to document): 3413 → 法
  • Encoding (more difficult to document): 法 → 3413
• Four-corner method: a documentation (lookup) method for encoding

Announcement

• In-class Assignment #1 on Sep 19 (Friday)
  • 10% of final score
  • Coverage: up to Section 2.2
  • Books, notes, other materials and discussions all allowed
  • Help from instructor and teaching assistant
  • Assignments submitted after class subject to penalty

Error-detection for binary codes

• Rule: every valid codeword has a special property
• Parity check: a validity check concerning the parity (i.e. being odd or even) of the number of 1’s in a codeword
• Example: 1001 sent as 1011
  • Original number of 1’s: 2 (even)
  • Number of 1’s in 1011: 3 (odd)
  • 1 error leads to a change of parity of the number of 1’s
  • Error detected if all valid codewords consist of an even number of 1’s

Simple parity check

• Rule: the last digit of a codeword is a check digit appended to the original message (to be sent) so that the total number of 1’s in the codeword is even!
• Example: sending a message 1000001
  • Check digit to be appended: 0
  • Codeword for the message: 10000010 (total number of 1’s: 2)
• Example: sending a message 1001001
  • Check digit to be appended: 1
  • Codeword for the message: 10010011 (total number of 1’s: 4)
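The simple parity check can be sketched in a few lines. The function names `add_check_digit` and `is_valid` are hypothetical; the rule is the even-parity rule stated above.

```python
def add_check_digit(message: str) -> str:
    """Append a check digit so the total number of 1's is even."""
    ones = message.count("1")
    check = "1" if ones % 2 == 1 else "0"
    return message + check


def is_valid(codeword: str) -> bool:
    """A codeword passes the check if it has an even number of 1's."""
    return codeword.count("1") % 2 == 0


print(add_check_digit("1000001"))  # 10000010
print(add_check_digit("1001001"))  # 10010011
print(is_valid("1011"))            # False: one digit of 1001 was flipped
```

A single flipped digit always changes the parity, so it is detected. Two flipped digits restore the parity and go unnoticed, which is why this check detects errors but cannot locate them.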

Error-correction in codes

• Is it possible to detect AND correct an error (without retransmission/data re-entry)?
• Error-correction by multiple entries
  • Sending all messages 3 times regardless of existence/absence of errors
  • Example: 1100 sent as 1100 1100 1100
  • Error-correction power: a received message of 1100 1100 1000 can be automatically corrected to 1100 1100 1100 without further data re-entry
  • High resource demand: tripling message length
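Correction by triple entry amounts to a majority vote on each digit position. The sketch below assumes at most one of the three copies is wrong at any position; the function names are illustrative only.

```python
def encode_triple(message: str) -> str:
    """Send the message three times in a row."""
    return message * 3


def decode_triple(received: str) -> str:
    """Recover the message by majority vote over the three copies."""
    if len(received) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    n = len(received) // 3
    copies = [received[i * n:(i + 1) * n] for i in range(3)]
    # For each position, keep the digit that appears at least twice.
    return "".join(
        "1" if digits.count("1") >= 2 else "0" for digits in zip(*copies)
    )


print(encode_triple("1100"))          # 110011001100
print(decode_triple("110011001000"))  # 1100
```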

• Error-correction by multiple parity check digits

Error-correction in codes

• Example: transmit 1001 by multiple parity check digits; codeword sent: 1001101
• 1-error correction: if the message received is 1000101, how can we detect/correct the error? (assume at most 1 error)
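One way to see how several check digits can locate an error is the sketch below. The slides do not state which message digits each check digit covers, so the grouping in `CHECKS` is an assumption, chosen only because it reproduces 1001 → 1001101. Each message digit is covered by a different set of checks, so the set of failed checks identifies the wrong digit.

```python
# ASSUMED layout (not given in the slides): the message digits, by 0-based
# position, covered by each of the three check digits.
CHECKS = [(0, 1, 2), (0, 2, 3), (1, 2, 3)]


def encode(message: str) -> str:
    """Append three even-parity check digits to a 4-digit message."""
    bits = [int(b) for b in message]
    parity = [sum(bits[i] for i in group) % 2 for group in CHECKS]
    return message + "".join(str(p) for p in parity)


def correct(received: str) -> str:
    """Fix at most one wrong digit in a received 7-digit codeword."""
    bits = [int(b) for b in received]
    data, parity = bits[:4], bits[4:]
    failed = tuple(
        k for k, group in enumerate(CHECKS)
        if sum(data[i] for i in group) % 2 != parity[k]
    )
    if len(failed) == 1:
        # Only one check fails: that check digit itself is wrong.
        bits[4 + failed[0]] ^= 1
    elif len(failed) > 1:
        # The wrong message digit is the one covered by exactly the failed checks.
        for i in range(4):
            covering = tuple(k for k, g in enumerate(CHECKS) if i in g)
            if covering == failed:
                bits[i] ^= 1
    return "".join(str(b) for b in bits)


print(encode("1001"))      # 1001101
print(correct("1000101"))  # 1001101
```

Under this assumed layout, the received word 1000101 fails the second and third checks, and the only message digit covered by exactly those two checks is the fourth, so it is flipped back to 1.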

Lengths of codes

• Basic question: how many digits do we need?
  • How many digits are needed to encode 2 characters (e.g. A, B)?
  • How many digits are needed to encode 4 characters?
  • How many digits are needed to encode 26 characters (A-Z)?
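These questions can be checked with a small sketch, assuming a fixed-length binary code: n digits give 2^n distinct codewords, so we need the smallest n with 2^n at least the number of characters. The function name `digits_needed` is made up for this illustration.

```python
def digits_needed(n_characters: int) -> int:
    """Smallest number of binary digits giving at least n_characters codewords."""
    if n_characters < 1:
        raise ValueError("need at least one character")
    # (n - 1).bit_length() is the smallest k with 2**k >= n, for n >= 2.
    return max(1, (n_characters - 1).bit_length())


for n in (2, 4, 26, 128):
    print(n, digits_needed(n))  # 2 -> 1, 4 -> 2, 26 -> 5, 128 -> 7
```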

Efficiency of data transmission

• Run-length encoding (RLE): reduce number of characters transmitted (data compression)
  • Example: black-and-white documents
  • Reduce length of duplicated characters
  • Common for faxed documents and files containing runs
  • Size increased if runs are absent
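A minimal sketch of run-length encoding follows. It stores each run as a (symbol, count) pair, which is one of several possible output formats and is an assumption here; "W" and "B" stand for white and black pixels.

```python
from itertools import groupby


def rle_encode(data: str) -> list[tuple[str, int]]:
    """Replace each run of identical symbols by a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


print(rle_encode("WWWWWWBBBWWWW"))  # [('W', 6), ('B', 3), ('W', 4)]
print(rle_encode("WBWB"))           # four pairs for four symbols: no saving
```

The second call illustrates the last bullet: without runs, every symbol needs its own pair, so the encoded form is longer than the input.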

Efficiency of data transmission

• Example: two different ways of encoding
  • Type 1: fixed number of digits used
  • Type 2: different numbers of digits used

Efficiency of data transmission

• Different coding methods
  • Fixed length code: fixed number of digits used
  • Variable length code: different numbers of digits used

Efficiency of data transmission

• Variable length code
  • Shorter codewords for frequently used characters: efficiency enhanced
  • Is there anything wrong with the following code?
  • Is there anything wrong in encoding BIT? No!
  • Is there anything wrong in decoding 0000001101? Yes! Multiple interpretations are possible (BIT or FET)!
  • Prefix property: no codeword can be a prefix of another codeword
  • Uniquely decipherable code: a code satisfying the prefix property can always be decoded in only one way
  • Example: the code is not uniquely decipherable, as the codeword of B is a prefix of the codeword of F (this code does not satisfy the prefix property)
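The prefix property is easy to test mechanically. The sketch below uses two small hypothetical codes (not the code table from the slides, which is not reproduced here) and also lists every way a string of digits can be decoded, to show the ambiguity that arises when the property fails.

```python
def has_prefix_property(code: dict[str, str]) -> bool:
    """True if no codeword is a prefix of another codeword."""
    words = list(code.values())
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )


def decodings(code: dict[str, str], digits: str) -> list[str]:
    """List every message that encodes to exactly these digits."""
    if not digits:
        return [""]
    results = []
    for ch, word in code.items():
        if digits.startswith(word):
            results += [ch + rest for rest in decodings(code, digits[len(word):])]
    return results


good = {"A": "0", "B": "10", "C": "110", "D": "111"}  # hypothetical code
bad = {"A": "0", "B": "01", "C": "10"}                # "0" is a prefix of "01"

print(has_prefix_property(good), decodings(good, "010"))  # True ['AB']
print(has_prefix_property(bad), decodings(bad, "010"))    # False ['AC', 'BA']
```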

Efficiency of data transmission

• Variable length code
  • Example: do these two codes satisfy the prefix property?

Efficiency of data transmission

• Variable length code
  • Which of the two uniquely decipherable codes is more efficient?

Efficiency of data transmission

• Variable length code
  • Which of the two uniquely decipherable codes has a shorter average length? Scheme 1!
  • Example: DELETE THE FILE
    • Scheme 1: 52 digits in total
    • Scheme 2: 49 digits in total
  • Scheme 2 is shorter for this message even though Scheme 1 has the shorter average length
  • E is a very common (heavy) character
  • Frequencies (weights) are also important!
  • A weighted average should be considered instead

Efficiency of data transmission

• Variable length code
  • Weighted average code length
  • Example:
    • Message: DELETE THE FILE
    • Weighted average code length:

Efficiency of data transmission

• Variable length code
  • Weighted average code length
  • Choice of frequency tables:
    • Choice #1: frequency table from specific message
    • Choice #2: general frequency table for typical English passages
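A weighted average code length can be computed as the sum of (frequency × codeword length) divided by the total frequency, which is the usual definition and is assumed here. The sketch uses Choice #1, a frequency table taken from the message itself, ignoring spaces (an assumption). Both codes below are hypothetical and are not Scheme 1 or Scheme 2 from the slides.

```python
from collections import Counter


def weighted_average_length(code: dict[str, str], freq: dict[str, int]) -> float:
    """Sum of frequency x codeword length, divided by total frequency."""
    total = sum(freq.values())
    return sum(freq[ch] * len(code[ch]) for ch in freq) / total


message = "DELETE THE FILE"
freq = Counter(message.replace(" ", ""))  # ASSUMPTION: spaces are not encoded

# HYPOTHETICAL codes for illustration only.
fixed = {ch: format(i, "03b") for i, ch in enumerate("DEFHILT")}
variable = {"E": "0", "T": "100", "L": "101",
            "D": "1100", "H": "1101", "F": "1110", "I": "1111"}

print(weighted_average_length(fixed, freq))     # 3.0
print(weighted_average_length(variable, freq))  # 33/13, about 2.54
```

Giving the heavy character E a one-digit codeword lowers the weighted average below the fixed-length value, even though most codewords in the variable code are longer than 3 digits.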

Efficiency of data transmission

• Variable length code
  • Weighted average code length
  • (Partial) example: Morse code

Classwork: Calculate the weighted average code length for the Morse codes, using the general frequency table

Answer: 2.544

Huffman code

• Aim: produce a code with the smallest weighted average code length for a given frequency table
• Basic principle: shorter codewords for more frequent characters
• Tool: a tree built from bottom to top with characters being the “leaves”

Huffman code

• Example: a code for 4 characters
  • Step 1: combine the 2 characters with the lowest probabilities (“L” and “T”, giving “LT”)
  • Step 2: combine the 2 among “D”, “E” and “LT” with the lowest probabilities (“D” and “LT”, giving “LTD”)
  • Step 3: combine the remaining 2, “E” and “LTD”
  • Step 4: assign “0” to the branch with the bigger probability and “1” to the branch with the smaller probability
  • Step 5: read out the codewords from the top of the tree
• Does the code constructed this way always satisfy the prefix property?
  • If “11” is a codeword for D, is it possible for other codewords to begin with “11”? No (as the branch for D stops at “11”)!
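The construction can be sketched with a priority queue. The frequency table below is hypothetical (the table from the slides is not reproduced here), and ties are broken by insertion order, which is an arbitrary choice. The branch labelling follows Step 4: "0" for the heavier branch, "1" for the lighter one.

```python
import heapq
from itertools import count


def huffman_code(frequencies: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code by repeatedly merging the two lightest subtrees."""
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(w, next(tiebreak), {ch: ""}) for ch, w in frequencies.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {ch: "0" for ch in frequencies}
    while len(heap) > 1:
        w_small, _, small = heapq.heappop(heap)  # lightest subtree
        w_big, _, big = heapq.heappop(heap)      # second lightest subtree
        # Prepend the branch digit: "0" for the heavier, "1" for the lighter.
        merged = {ch: "0" + cw for ch, cw in big.items()}
        merged.update({ch: "1" + cw for ch, cw in small.items()})
        heapq.heappush(heap, (w_small + w_big, next(tiebreak), merged))
    return heap[0][2]


# HYPOTHETICAL frequency table for the four characters.
code = huffman_code({"E": 4, "D": 3, "L": 2, "T": 1})
for ch in "EDLT":
    print(ch, code[ch])
```

With these weights the merges happen in the same order as in the steps above (L with T, then D with LT, then E with LTD), and the most frequent character E ends up with the shortest codeword.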

Classwork: Constructing Huffman code

Constructing Huffman code

Huffman code: remarks

• Multiple possible Huffman codes for same frequency table
  • Different number of layers possible
• Are the weighted average code lengths the same?
  • Different Huffman codes for same frequency table have same weighted average code length
  • Smallest weighted average code length guaranteed (proof out of scope)

Huffman codes: comparison

Arithmetic coding

• No one-to-one correspondence between characters and codewords (unlike Huffman code)
• Encode the whole message into one number
• Example: “DELETE” encoded as 0.11633 (decimal number)
  • Step 1: Divide the interval (0, 1) into portions
  • Step 2: Choose (zoom into) the portion of the first character “D” and divide the portion according to the probabilities (as in Step 1)
  • Step 3: Choose (zoom into) the portion of the second character “E” and divide the portion according to the probabilities
  • Step 4: Keep choosing (zooming into) portions in the correct order and dividing the chosen portion according to the probabilities
  • Step 5: Choose the portion of “END” when the message ends
  • Step 6: Choose any number within the range of “END” as the codeword for the message (e.g. 0.11633)

Arithmetic coding

• Example: decode 0.11633 with the frequency table
  • Step 1: Divide into portions
  • Step 2: Where is 0.11633? Zoom in!
    • 0.11633 is in Section D: first character of message is “D”
  • Step 3: Repeat Steps 1 and 2. Stop when it hits “END”!
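The zoom-in procedure for encoding and decoding can be sketched with exact fractions. The probability table `PROBS` is hypothetical (the table from the slides is not reproduced here), so the resulting number is not expected to equal 0.11633. The names `portions`, `encode` and `decode` are made up for this illustration.

```python
from fractions import Fraction

END = "END"
# HYPOTHETICAL probabilities; they must add up to 1.
PROBS = {"D": Fraction(1, 7), "E": Fraction(3, 7), "L": Fraction(1, 7),
         "T": Fraction(1, 7), END: Fraction(1, 7)}


def portions(low: Fraction, high: Fraction) -> dict:
    """Divide the interval from low to high in proportion to PROBS."""
    result, start = {}, low
    for symbol, p in PROBS.items():
        width = p * (high - low)
        result[symbol] = (start, start + width)
        start += width
    return result


def encode(message: str) -> tuple[Fraction, Fraction]:
    """Zoom into one portion per character, then into the END portion."""
    low, high = Fraction(0), Fraction(1)
    for symbol in list(message) + [END]:
        low, high = portions(low, high)[symbol]
    return low, high  # any number from low up to (not including) high works


def decode(number: Fraction, max_symbols: int = 1000) -> str:
    """Find the portion containing the number, zoom in, stop at END."""
    if not 0 <= number < 1:
        raise ValueError("codeword must lie between 0 and 1")
    low, high = Fraction(0), Fraction(1)
    decoded = []
    for _ in range(max_symbols):
        for symbol, (lo, hi) in portions(low, high).items():
            if lo <= number < hi:
                break
        if symbol == END:
            return "".join(decoded)
        decoded.append(symbol)
        low, high = lo, hi
    raise ValueError("no END symbol found")


low, high = encode("DELETE")
codeword = (low + high) / 2  # one choice of number inside the final portion
print(float(codeword), decode(codeword))
```

Exact fractions are used instead of floating-point numbers so that the portions always fit together without rounding gaps. The `max_symbols` guard only protects against numbers that never reach the END portion.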

Units in daily life

• Examples of prefixes:
  • Mega-pixel
  • Nano-meter
  • Giga-watt

SI prefixes

• International System of Units (SI)
• Examples: km, mm, cm, mL
• Some common SI prefixes:

Units in data transmission

• SI prefixes commonly used for transmission speed
• Example: 100 Mbps
  • Mbps: Mega-bit per second
  • Mega (SI prefix): 1,000,000 (1000 x 1000)
  • Bit: binary digit
• Example: 56 kbps
  • kbps: kilo-bit per second
  • kilo (SI prefix): 1000
  • Bit: binary digit

Binary prefixes

• Different from SI prefixes: same letter, different meaning
• 1024 used instead of 1000
• Comparison:

Units in computer systems (file size)

• Binary prefixes used for file size/actual capacity in computer systems
• Example: file of size 10 MB
  • MB: Mega-byte
  • Mega (binary prefix): 1,048,576 (1024 x 1024)
  • Byte: 8 bits

Units in telecommunication

• Example: How long does it take to send a 100 MB file at a speed of 100 Mbps?
  • 100 MB = 100 x 1024 x 1024 x 8 bits
  • 100 Mbps = 100 x 1000 x 1000 bits per second
  • (Minimum) Time needed: (100 x 1024 x 1024 x 8) / (100 x 1000 x 1000) ≈ 8.39 seconds

Units in telecommunication

• Example: How long does it take to download a 4 MB song via a 56k modem?
  • 4 MB: 4 x 1024 x 1024 x 8 bits
  • 56k modem: 56 kbps transfer rate
  • 56 kbps: 56 x 1000 bits per second
  • (Minimum) Time needed for downloading: ~600 seconds
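Both calculations follow the same pattern: file size in bits (binary prefix) divided by speed in bits per second (SI prefix). A small sketch, with the hypothetical helper `transfer_seconds`:

```python
BITS_PER_BYTE = 8


def transfer_seconds(size_mb: float, speed_bits_per_second: float) -> float:
    """Minimum time to send a file whose size is given in MB (binary prefix)."""
    size_bits = size_mb * 1024 * 1024 * BITS_PER_BYTE
    return size_bits / speed_bits_per_second


# 100 MB file over 100 Mbps (SI prefix: 100 x 1000 x 1000 bits per second)
print(transfer_seconds(100, 100 * 1000 * 1000))  # 8.388608

# 4 MB song over a 56 kbps modem (56 x 1000 bits per second)
print(round(transfer_seconds(4, 56 * 1000)))     # 599
```

These are minimum times: they ignore protocol overhead and assume the link runs at its full rated speed throughout.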

Classwork 10: telecommunication

Units in hard disk packaging

• Confusion in units:
  • SI prefixes used in packaging of hard disks/flash drives
    • Example: true capacity of a 4 GB flash drive
    • 4 GB flash drive: 4 x 1000 x 1000 x 1000 = 4,000,000,000 bytes
  • True capacity of disk/computer memory (e.g. RAM)/file size expressed by binary prefixes
    • Example: size of a 4 GB file
    • 4 GB file: 4 x 1024 x 1024 x 1024 = 4,294,967,296 bytes (a 4 GB flash drive does not have enough space for a 4 GB file!)
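The gap between the two meanings of "GB" can be checked directly. The function names below are made up for this illustration.

```python
def si_gigabytes_to_bytes(n: int) -> int:
    """Giga as an SI prefix: 1000 x 1000 x 1000."""
    return n * 1000 ** 3


def binary_gigabytes_to_bytes(n: int) -> int:
    """Giga as a binary prefix: 1024 x 1024 x 1024."""
    return n * 1024 ** 3


drive = si_gigabytes_to_bytes(4)      # capacity printed on the packaging
file = binary_gigabytes_to_bytes(4)   # size reported by the computer

print(drive)         # 4000000000
print(file)          # 4294967296
print(file - drive)  # 294967296 bytes that do not fit
```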

Numbers in Codes - End -