Numbers in Codes
GCNU 1025
Numbers Save the Day
Coding
• Converting information into another form of representation (codes) based on a specific rule
• Encoding: information to symbols
• Decoding: symbols back to information
Binary Codes
• Two symbols are used to represent data
• Examples: Morse code, ASCII code
https://www.youtube.com/watch?v=u493fX2hYgU
Morse Codes
• On-off tones, lights, clicks, dots and dashes, etc.
Binary codes
• Language of computers: 0 and 1 (binary system)
• Codeword: a string of 0’s and 1’s representing a character
• ASCII code: American Standard Code for Information Interchange
• 128 characters, of which 33 are control characters
• Enables the use of the same codewords in different machines
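As a small illustration of codewords, the sketch below turns text into 7-digit binary ASCII codewords and back. The function names are hypothetical and chosen only for this example; 7 digits suffice because ASCII has 128 characters.

```python
def text_to_codewords(text):
    """Encode each character as a 7-digit binary ASCII codeword."""
    return [format(ord(ch), "07b") for ch in text]


def codewords_to_text(codewords):
    """Decode 7-digit binary codewords back to characters."""
    return "".join(chr(int(cw, 2)) for cw in codewords)


codewords = text_to_codewords("BIT")
print(codewords)                     # ['1000010', '1001001', '1010100']
print(codewords_to_text(codewords))  # BIT
```

The same codewords come out on any machine that uses ASCII, which is the point of having a standard code.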
Binary Codes: Play
http://www.binaryhexconverter.com/binary-to-ascii-text-converter
http://www.binaryhexconverter.com/ascii-text-to-binary-converter
Coding of Chinese characters (optional)
• Example: Chinese telegraph code (non-binary)
• 4-digit: 0000-9999
• Decoding (relatively easy for documentation): 3413 → 法
• Encoding (more difficult for documentation): 法 → 3413
• Four-corner method: method for documentation for encoding
Announcement
• In-class Assignment #1 on Sep 19 (Friday)
• 10% of final score
• Coverage: up to Section 2.2
• Books, notes, other materials and discussions all allowed
• Help from instructor and teaching assistant
• Assignments submitted after class are subject to a penalty
Error-detection for binary codes
• Rule: every valid codeword has a special property
• Parity check: a validity check concerning the parity (i.e. being odd or even) of the number of 1’s in a codeword
• Example: 1001 sent as 1011
• Original number of 1’s: 2 (even)
• Number of 1’s in 1011: 3 (odd)
• 1 error leads to a change of parity of the number of 1’s
• Error detected if all valid codewords consist of an even number of 1’s
Simple parity check
• Rule: the last digit of a codeword is a check digit appended to the original message (to be sent) so that the total number of 1’s in the codeword is even!
• Example: sending a message 1000001
• Check digit to be appended: 0
• Codeword for the message: 10000010 (total number of 1’s: 2)
• Example: sending a message 1001001
• Check digit to be appended: 1
• Codeword for the message: 10010011 (total number of 1’s: 4)
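The simple parity check can be sketched in a few lines. The function names are hypothetical; the rule is the even-parity rule stated above.

```python
def add_check_digit(message):
    """Append a check digit so that the total number of 1's is even."""
    check = message.count("1") % 2
    return message + str(check)


def is_valid(codeword):
    """A codeword is valid when it contains an even number of 1's."""
    return codeword.count("1") % 2 == 0


print(add_check_digit("1000001"))  # 10000010
print(add_check_digit("1001001"))  # 10010011
print(is_valid("1001"))            # True  (two 1's)
print(is_valid("1011"))            # False (three 1's: an error is detected)
```

A single flipped digit always changes the parity, so it is detected. Two flipped digits restore the parity and go unnoticed, and the check cannot tell which digit is wrong.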
Error-correction in codes
• Is it possible to detect AND correct an error (without re-transmission/data re-entry)?
• Error-correction by multiple entries
• Sending all messages 3 times regardless of existence/absence of errors
• Example: 1100 sent as 1100 1100 1100
• Error-correction power: a received message of 1100 1100 1000 can be automatically corrected to 1100 1100 1100 without further data re-entry
• High resource demand: tripling message length
• Error-correction by multiple parity check digits
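Error-correction by multiple entries amounts to a majority vote on each digit position. A minimal sketch, with hypothetical function names, assuming at most one of the three copies is wrong at any position:

```python
def send_three_times(message):
    """Repeat the message three times."""
    return [message] * 3


def correct_by_majority(copies):
    """For each digit position, keep the digit that appears most often."""
    corrected = []
    for digits in zip(*copies):
        corrected.append(max("01", key=digits.count))
    return "".join(corrected)


print(send_three_times("1100"))                       # ['1100', '1100', '1100']
print(correct_by_majority(["1100", "1100", "1000"]))  # 1100
```

The cost is visible in `send_three_times`: every message becomes three times as long.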
Error-correction in codes
• Example: transmit 1001 by multiple parity check digits
• Codeword sent: 1001101
• 1-error correction: if the message received is 1000101, how can we detect/correct the error? (assume at most 1 error)
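The slides do not state which message digits each check digit covers, so the sketch below rests on an assumption: the three check digits are even-parity checks on message digits (1, 2, 3), (1, 3, 4) and (2, 3, 4). This choice is made only because it reproduces the codeword 1001101 for the message 1001. All names are hypothetical.

```python
# ASSUMPTION (not given in the slides): which message digits each check digit covers.
# Indices are 0-based positions within the 4-digit message.
GROUPS = [(0, 1, 2), (0, 2, 3), (1, 2, 3)]


def encode(message):
    """Append three even-parity check digits to a 4-digit message."""
    bits = [int(b) for b in message]
    checks = [sum(bits[i] for i in group) % 2 for group in GROUPS]
    return message + "".join(str(c) for c in checks)


def correct(word):
    """Correct at most one wrong digit in a received 7-digit word."""
    bits = [int(b) for b in word]
    # 1 marks a failed parity check, 0 a passed one.
    failed = tuple(
        (sum(bits[i] for i in group) + bits[4 + k]) % 2
        for k, group in enumerate(GROUPS)
    )
    if any(failed):
        for pos in range(7):
            if pos < 4:   # a message digit upsets every check that covers it
                pattern = tuple(int(pos in group) for group in GROUPS)
            else:         # a check digit upsets only its own check
                pattern = tuple(int(k == pos - 4) for k in range(3))
            if pattern == failed:
                bits[pos] ^= 1
                break
    return "".join(str(b) for b in bits)


print(encode("1001"))      # 1001101
print(correct("1000101"))  # 1001101
```

Under this assumed layout, the received word 1000101 passes the first check and fails the other two. Only the fourth message digit is covered by exactly those two checks, so that digit is flipped back.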
Lengths of codes
• Basic question: how many digits do we need?
• How many digits are needed to encode 2 characters (e.g. A, B)?
• How many digits are needed to encode 4 characters?
• How many digits are needed to encode 26 characters (A-Z)?
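With k binary digits there are 2^k different codewords, so a fixed-length code needs the smallest k with 2^k at least the number of characters. A minimal sketch (the function name is hypothetical):

```python
def digits_needed(n_characters):
    """Smallest k such that 2**k >= n_characters (for n_characters >= 1)."""
    return (n_characters - 1).bit_length()


for n in (2, 4, 26):
    print(n, "characters:", digits_needed(n), "digits")
# 2 characters: 1 digits
# 4 characters: 2 digits
# 26 characters: 5 digits
```

Five digits give 32 codewords, so 6 of them are unused when encoding A-Z.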
Efficiency of data transmission
• Run-length encoding (RLE): reduce number of characters transmitted (data compression)
• Example: black-and-white documents
• Reduce length of duplicated characters
• Common for faxed documents and files containing runs
• Size increased if runs are absent
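A minimal sketch of run-length encoding, assuming a row of a black-and-white document is written as a string of "W" and "B" characters (this representation and the function names are assumptions for illustration):

```python
from itertools import groupby


def rle_encode(pixels):
    """Replace each run of identical symbols by a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(pixels)]


def rle_decode(runs):
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(symbol * length for symbol, length in runs)


row = "WWWWWWBBBWWWW"
print(rle_encode(row))      # [('W', 6), ('B', 3), ('W', 4)]
print(rle_encode("WBWB"))   # [('W', 1), ('B', 1), ('W', 1), ('B', 1)]
```

The second call shows the drawback: without runs, four symbols turn into four pairs, so the encoded form is longer than the original.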
Efficiency of data transmission
• Example: two different ways of encoding
• Type 1: fixed number of digits used
• Type 2: different numbers of digits used
Efficiency of data transmission
• Different coding methods
• Fixed length code: fixed number of digits used
• Variable length code: different numbers of digits used
Efficiency of data transmission
• Variable length code
• Shorter code for frequently used characters: efficiency enhanced
• Is there anything wrong with the following code?
• Is there anything wrong in encoding BIT?
• Is there anything wrong in decoding 0000001101?
Efficiency of data transmission
• Variable length code
• Is there anything wrong with the following code?
• Is there anything wrong in encoding BIT? No!
• Is there anything wrong in decoding 0000001101? Yes! Multiple interpretations are possible (BIT or FET)!
• Prefix property: no codeword can be a prefix of another codeword
• Uniquely decipherable code: a code in which every string of digits can be decoded in only one way; any code satisfying the prefix property is uniquely decipherable
Efficiency of data transmission
• Variable length code
• Example: the code is not uniquely decipherable as the codeword of B is a prefix of the codeword of F (this code does not satisfy the prefix property)
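The prefix property can be checked mechanically. The slide's own code table is not reproduced in this text, so the two small tables below are hypothetical, as are the function names. The second function lists every way a string of digits can be decoded, which makes an ambiguity visible.

```python
def has_prefix_property(code):
    """True if no codeword is a prefix of another codeword."""
    words = list(code.values())
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )


def all_decodings(digits, code):
    """List every message whose encoding is exactly `digits` (codewords must be non-empty)."""
    if not digits:
        return [""]
    results = []
    for ch, cw in code.items():
        if digits.startswith(cw):
            results += [ch + rest for rest in all_decodings(digits[len(cw):], code)]
    return results


# Hypothetical tables, invented for illustration only.
ambiguous = {"A": "0", "B": "01", "C": "10"}   # "0" is a prefix of "01"
prefix_free = {"A": "0", "B": "10", "C": "11"}

print(has_prefix_property(ambiguous))      # False
print(all_decodings("010", ambiguous))     # ['AC', 'BA']
print(has_prefix_property(prefix_free))    # True
print(all_decodings("010", prefix_free))   # ['AB']
```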
Efficiency of data transmission
• Variable length code
• Example: do these two codes satisfy the prefix property?
Efficiency of data transmission
• Variable length code
• Which of the two uniquely decipherable codes is more efficient?
Efficiency of data transmission
• Variable length code
• Which of the two uniquely decipherable codes has a shorter average length? Scheme 1!
• Example: DELETE THE FILE
• Scheme 1: 52 digits in total
• Scheme 2: 49 digits in total
• E is a very common (heavy) character
• Frequencies (weights) also important!
• Weighted average should be considered instead
Efficiency of data transmission
• Variable length code
• Weighted average code length
• Example:
• Message: DELETE THE FILE
• Weighted average code length:
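A weighted average code length multiplies each codeword length by how often its character occurs, adds the products, and divides by the total number of characters. The sketch below does this for the message DELETE THE FILE. The code table is hypothetical (it is not Scheme 1 or Scheme 2 from the slides), and ignoring the spaces is also an assumption.

```python
from collections import Counter

# Hypothetical prefix code, invented for illustration (not one of the slide's schemes).
CODE = {"E": "0", "L": "100", "T": "101",
        "D": "1100", "H": "1101", "F": "1110", "I": "1111"}


def weighted_average_length(freqs, code):
    """Sum of (frequency x codeword length) divided by the total frequency."""
    total = sum(freqs.values())
    return sum(freqs[ch] * len(code[ch]) for ch in freqs) / total


message = "DELETE THE FILE"
freqs = Counter(message.replace(" ", ""))   # ASSUMPTION: spaces are not encoded
total_digits = sum(freqs[ch] * len(CODE[ch]) for ch in freqs)
average = weighted_average_length(freqs, CODE)

print(dict(freqs))        # {'D': 1, 'E': 5, 'L': 2, 'T': 2, 'H': 1, 'F': 1, 'I': 1}
print(total_digits)       # 33
print(round(average, 3))  # 2.538
```

Because E occurs 5 times out of 13, giving it a 1-digit codeword pulls the weighted average well below the plain average of the seven codeword lengths (22/7, about 3.14).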
Efficiency of data transmission
• Variable length code
• Weighted average code length
• Choice of frequency tables:
• Choice #1: frequency table from specific message
• Choice #2: general frequency table for typical English passages
Efficiency of data transmission
• Variable length code
• Weighted average code length
• (Partial) example: Morse code
Classwork: Calculate the weighted average code length for the Morse codes, using the general frequency table
Answer: 2.544
Huffman code
• Aim: produce a code with the smallest weighted average code length for a given frequency table
• Basic principle: shorter codewords for more frequent characters
• Tool: a tree built from bottom to top with characters being the “leaves”
Huffman code
• Example: a code for 4 characters
• Step 1: combine the 2 with lowest probabilities
• Step 2: combine the 2 among “D”, “E” and “LT” with lowest probabilities
• Step 3: combine the 2 among “E” and “LTD” with lowest probabilities
• Step 4: assign “0” to the branch with the bigger probability and “1” to the branch with the smaller probability
• Step 5: read out the codewords from the top of the tree
Huffman code
• Example: a code for 4 characters
• Does the code constructed this way always satisfy the prefix property?
• If “11” is a codeword for D, is it possible for other codewords to begin with “11”? No (as the branch for D stops at “11”)!
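The construction steps can be sketched with a priority queue. The slide's probability table is not reproduced in this text, so the frequencies below are an assumption: they are simply the letter counts in the word DELETE, which uses the same four characters D, E, L and T. The function name is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freqs):
    """Build a Huffman code by repeatedly combining the two lowest weights."""
    tie = count()  # tie-breaker so that equal weights never compare the dicts
    heap = [(weight, next(tie), {ch: ""}) for ch, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_small, _, small = heapq.heappop(heap)   # lowest weight
        w_big, _, big = heapq.heappop(heap)       # second lowest weight
        # "0" goes to the branch with the bigger weight, "1" to the smaller one.
        merged = {ch: "0" + cw for ch, cw in big.items()}
        merged.update({ch: "1" + cw for ch, cw in small.items()})
        heapq.heappush(heap, (w_small + w_big, next(tie), merged))
    return heap[0][2]


freqs = Counter("DELETE")   # ASSUMED frequencies: D=1, E=3, L=1, T=1
code = huffman_code(freqs)
lengths = {ch: len(cw) for ch, cw in code.items()}
total_digits = sum(freqs[ch] * lengths[ch] for ch in freqs)
print(lengths)
print(total_digits)   # 11
```

When weights tie, as D, L and T do here, different choices produce different trees, so the individual codewords may differ from a tree drawn by hand. The total of 11 digits for DELETE does not depend on how ties are broken.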
Classwork: Constructing Huffman code
Constructing Huffman code
Huffman code: remarks
• Multiple possible Huffman codes for same frequency table
• Different number of layers possible
• Are the weighted average code lengths the same?
• Different Huffman codes for same frequency table have same weighted average code length
• Smallest weighted average code length guaranteed (proof out of scope)
Huffman codes: comparison
Arithmetic coding
• No one-to-one correspondence between characters and codewords (unlike Huffman code)
• Encode whole message into one number
• Example: “DELETE” encoded as 0.11633 (decimal number)
Arithmetic coding
• Example: “DELETE” encoded as 0.11633 (decimal number)
• Step 1: Divide the interval (0, 1) into portions
• Step 2: Choose (zoom into) portion of first character “D” and divide the portion according to the probabilities (as in Step 1)
• Step 3: Choose (zoom into) portion of second character “E” and divide the portion according to the probabilities
• Step 4: Keep choosing (zooming into) portions in correct order and dividing the chosen portion according to the probabilities
• Step 5: Choose the portion of “END” when the message ends
• Step 6: Choose any number within the range of “END” as the codeword for the message (e.g. 0.11633)
Arithmetic coding
• Example: decoding 0.11633 with the frequency table
• Step 1: Divide into portions
• Step 2: Where is 0.11633? Zoom in!
• 0.11633 is in Section D: first character of message is “D”
• Step 3: Repeat Steps 1 and 2. Stop when it hits “END”!
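The zoom-in procedure for encoding and decoding can be sketched as follows. The slide's frequency table is not reproduced in this text, so the probabilities and the order of the portions below are hypothetical, and the number produced for DELETE is therefore not expected to be 0.11633. Exact fractions are used to avoid rounding problems.

```python
from fractions import Fraction

# HYPOTHETICAL table (symbol, probability); the slide's actual table is not available.
TABLE = [("D", Fraction(1, 5)), ("E", Fraction(3, 10)), ("L", Fraction(1, 10)),
         ("T", Fraction(1, 5)), ("END", Fraction(1, 5))]


def portions(low, high):
    """Divide the interval from low to high according to the probabilities."""
    width = high - low
    start = low
    for symbol, p in TABLE:
        yield symbol, start, start + p * width
        start += p * width


def encode(message):
    """Zoom into the portion of each character in turn, then into END."""
    low, high = Fraction(0), Fraction(1)
    for target in list(message) + ["END"]:
        for symbol, lo, hi in portions(low, high):
            if symbol == target:
                low, high = lo, hi
                break
    return low, high   # any number in this range encodes the message


def decode(number):
    """Find the portion containing the number, zoom in, stop at END."""
    low, high = Fraction(0), Fraction(1)
    message = []
    while True:
        for symbol, lo, hi in portions(low, high):
            if lo <= number < hi:
                break
        if symbol == "END":
            return "".join(message)
        message.append(symbol)
        low, high = lo, hi


low, high = encode("DELETE")
codeword = (low + high) / 2    # one possible choice within the final range
print(float(codeword))
print(decode(codeword))        # DELETE
```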
Units in daily life
• Examples of prefixes:
• Mega-pixel
• Nano-meter
• Giga-watt
SI prefixes
• International System of Units (SI)
• Examples: km, mm, cm, mL
• Some common SI prefixes:
Units in data transmission
• SI prefixes commonly used for transmission speed
• Example: 100Mbps
• Mbps: Mega-bit per second
• Mega (SI prefix): 1000 x 1000 = 1,000,000
• Bit: binary digit
Units in data transmission
• SI prefixes commonly used for transmission speed
• Example: 56kbps
• kbps: kilo-bit per second
• kilo (SI prefix): 1000
• Bit: binary digit
Binary prefixes
• Different from SI prefixes: same letter, different meaning
• 1024 used instead of 1000
• Comparison:
Units in computer systems (file size)
• Binary prefixes used for file size/actual capacity in computer systems
• Example: file of size 10MB
• MB: Mega-byte
• Mega (binary prefix): 1024 x 1024 = 1,048,576
• Byte: 8 bits
Units in telecommunication
• Example: How long does it take to send a 100 MB file with the speed of 100 Mbps?
• 100 MB = 100 x 1024 x 1024 x 8 bits
• 100 Mbps = 100 x 1000 x 1000 bits per second
• (Minimum) Time needed: (100 x 1024 x 1024 x 8) / (100 x 1000 x 1000) ≈ 8.39 seconds
Units in telecommunication
• Example: How long does it take to download a 4 MB song via a 56k modem?
• 4 MB: 4 x 1024 x 1024 x 8 bits
• 56k modem: 56 kbps transfer rate
• 56 kbps: 56 x 1000 bits per second
• (Minimum) Time needed for downloading: ~600 seconds
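Both examples follow the same arithmetic: file size in bits (binary prefix) divided by speed in bits per second (SI prefix). A minimal sketch with a hypothetical function name:

```python
def transfer_seconds(size_mb, speed_bits_per_second):
    """Minimum time to send a file of size_mb megabytes (binary prefix)."""
    bits = size_mb * 1024 * 1024 * 8
    return bits / speed_bits_per_second


print(transfer_seconds(100, 100 * 1000 * 1000))   # 8.388608
print(round(transfer_seconds(4, 56 * 1000)))      # 599
```

These are minimum times: they assume the full rated speed is available for the whole transfer.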
Classwork 10: telecommunication
Units in hard disk packaging
• Confusion in units:
• SI prefixes used in packaging of hard disks/flash drives
• Example: true capacity of a 4 GB flash drive
• 4 GB flash drive: 4 x 1000 x 1000 x 1000 = 4,000,000,000 bytes
• True capacity of disk/computer memory (e.g. RAM)/file size expressed by binary prefixes
• Example: size of a 4 GB file
• 4 GB file: 4 x 1024 x 1024 x 1024 = 4,294,967,296 bytes (a 4 GB flash drive does not have enough space for a 4 GB file!)
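The mismatch between the two meanings of "giga" can be checked directly. The variable names are chosen for this example only.

```python
SI_GIGA = 1000 ** 3        # used on hard disk / flash drive packaging
BINARY_GIGA = 1024 ** 3    # used for file sizes in computer systems

drive_bytes = 4 * SI_GIGA
file_bytes = 4 * BINARY_GIGA

print(drive_bytes)                 # 4000000000
print(file_bytes)                  # 4294967296
print(file_bytes <= drive_bytes)   # False: the file does not fit
print(file_bytes - drive_bytes)    # 294967296 bytes short
```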
Numbers in Codes
- End -