Data Representation
CS280 – 09/13/05
Binary (from a Hacker’s dictionary)
A base-2 numbering system with only two digits, 0 and 1, which is perfectly suited for electronic operations since it can be expressed by power states (on/off), voltage levels (high/low) or charge (positive/negative), but is less than ideal for humans, who find it awkward to say things like “It’s a Catch – 10110 situation,” “He’s the 11011010-pound gorilla,” and “That’s the 10110010011011001-dollar question.”
“There are only 10 kinds of people in the world…those that understand binary and those that do not.” (from the ACM CS t-shirt).
Looking at more data
Character representations
Data and Data Representation
So how is this data that we operate on stored in the computer?
Let’s start with numbers
Binary codes are Base 2.
We “think” and operate in Base 10.
What does this mean?
Counting
Base 10 has 10 digits to represent different numbers of things
Base 2 has only 2 digits available.
Counting
Base 10
0 1 2 3 4 5 6 7 8 9
then we run out of unique digits.
So we move to a positional system.
10 – means we have ten things – a 1 in the 10’s position and no more things in the 1’s position.
Binary counting
0 1 then we run out of digits
10
This represents the number 2. A 1 in the 2’s position and a 0 in the 1’s position.
Positional notation
10’s positions represented
1 * 10^0 = 1
1 * 10^1 = 10
1 * 10^2 = 100
1 * 10^3 = 1,000
1 * 10^4 = 10,000
1 * 10^5 = 100,000
1 * 10^6 = 1,000,000
2’s positions represented
1 * 2^0 = 1
1 * 2^1 = 2
1 * 2^2 = 4
1 * 2^3 = 8
1 * 2^4 = 16
1 * 2^5 = 32
1 * 2^6 = 64
How does the computer then store numbers?
Let’s say we want to represent the number 53 in binary.
53 (base 10) = 110101 (base 2)
Why?
See chart next page.
Converting from decimal to binary
Use chart
1 * 2^0 = 1
1 * 2^1 = 2
1 * 2^2 = 4
1 * 2^3 = 8
1 * 2^4 = 16
1 * 2^5 = 32
1 * 2^6 = 64
1 * 2^7 = 128
1 * 2^8 = 256
75 decimal
Must use 7 bits
xxxxxxx
75 - 64 = 11
1xxxxxx
32 and 16 are not used
100xxxx
11 - 8 = 3
1001xxx
4 is not used
10010xx
3 - 2 = 1
100101x
last digit is 1
1001011
What is the general subtraction algorithm to convert from a decimal number to binary?
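One way to answer the question above is the following minimal sketch of the subtraction method. The function name `to_binary_by_subtraction` and its `bits` parameter are illustrative choices, not something taken from the slides.

```python
def to_binary_by_subtraction(n, bits):
    """Convert a non-negative integer to binary by subtracting powers of 2."""
    if n < 0 or n >= 2 ** bits:
        raise ValueError("value does not fit in the requested number of bits")
    digits = []
    remaining = n
    # Walk the positions from the largest power of 2 down to 2^0.
    for position in range(bits - 1, -1, -1):
        power = 2 ** position
        if power <= remaining:
            digits.append("1")   # this power is used, so subtract it
            remaining -= power
        else:
            digits.append("0")   # this power is not used
    return "".join(digits)


print(to_binary_by_subtraction(75, 7))  # 1001011
print(to_binary_by_subtraction(53, 6))  # 110101
```

Each pass through the loop corresponds to one row of the powers-of-2 chart: if the power fits in what remains, write a 1 and subtract it; otherwise write a 0.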
Converting binary to decimal
274 – decimal
4 * 10^0 = 4
7 * 10^1 = 70
2 * 10^2 = 200
total 274
1011 – binary
1 * 2^0 = 1
1 * 2^1 = 2
0 * 2^2 = 0
1 * 2^3 = 8
total 11
What is the general algorithm for taking a number in base X and converting it to its base 10 equivalent?
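The positional expansion used in the 274 and 1011 examples can be sketched for any base as follows. The function name `to_decimal` is a hypothetical helper, and the sketch assumes digits are written with 0-9 followed by letters, which limits it to bases up to 36.

```python
def to_decimal(digits, base):
    """Sum digit * base^position, with position 0 at the rightmost digit."""
    total = 0
    for position, symbol in enumerate(reversed(digits)):
        value = int(symbol, 36)  # maps '0'-'9' to 0-9 and 'a'-'z' to 10-35
        if value >= base:
            raise ValueError(f"digit {symbol!r} is not valid in base {base}")
        total += value * base ** position
    return total


print(to_decimal("274", 10))  # 274
print(to_decimal("1011", 2))  # 11
```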
Number representation
Numbers are represented by their corresponding binary representation
We are disregarding sign
We are disregarding floating point
What about other kinds of data?
Think about the binary values as a kind of code.
The binary values represent codes
How many different values can be stored in 1 bit?
How many in 2 bits?
How many in 4 bits?
How many in a byte?
General form encoding
If you have x possible unique symbols, and y positions for any one of those symbols, then the general number of unique codes is
x^y
Example: you have 2 dice, each of which has 6 different face values, so there are 36 or 6^2 possible unique codes.
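The counting rule can be checked with a short sketch. The helper name `count_codes` is an assumption for illustration; the dice example is also enumerated explicitly to confirm the count.

```python
from itertools import product


def count_codes(symbols, positions):
    """Number of unique codes: symbols raised to the power of positions."""
    return symbols ** positions


# Bits have 2 symbols (0 and 1); try 1, 2, 4 and 8 positions.
for bits in (1, 2, 4, 8):
    print(bits, "bits:", count_codes(2, bits), "values")

# Two dice with six faces each: list every combination and count them.
dice_rolls = list(product(range(1, 7), repeat=2))
print(len(dice_rolls))  # 36
```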
ASCII codes represent characters of data
Use 1 byte or 8 bits
Unicode extends the ASCII codes by another byte.
ASCII can form most of the characters used by “Western” languages along with punctuation symbols.
Unicode allows for special symbols and symbols in other languages like Japanese, Chinese, Arabic
Figure 8.7. ASCII, The American Standard Code for Information Interchange (page 220)
Reading the chart
Left column is the left side of the byte (group of 8 bits) (another term is the high order)
Right column is the right side of the byte.
Value is the corresponding binary code.
Binary to hex
Hexadecimal (base 16) codes can be used to represent groups of 4 binary digits.
Hexadecimal counting:
0 1 2 3 4 5 6 7 8 9 A B C D E F
Hex   Decimal   Binary
A     10        1010
B     11        1011
C     12        1100
D     13        1101
E     14        1110
F     15        1111
So the letter Z can be abbreviated
0101 1010 in binary
5 A in hex
Commonly, binary numbers are written in groups of 4 digits, with leading 0’s used as placeholders.
Hex numbers are shown as 2-digit groups with a space between each group.
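The grouping rule can be sketched as follows: pad the binary string with leading 0's to a multiple of 4 digits, then translate each group of 4 to one hex digit. The function name `binary_to_hex` is a hypothetical helper.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits):
    """Translate a binary string to hex, one hex digit per group of 4 bits."""
    bits = bits.replace(" ", "")
    padding = (-len(bits)) % 4          # leading 0's needed as placeholders
    bits = "0" * padding + bits
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


print(binary_to_hex("0101 1010"))  # 5A
print(binary_to_hex("110101"))     # 35
```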
Encoding – character string
Text or character strings are typically contiguously stored in memory.
Assume that each character takes up one byte of space. How many bytes would be required for a phone number? (We are using a slightly different example than the book. Note the hyphen and spaces.)
568 - 8771
568 - 8771 requires 10 bytes
Char     Binary      Hex   Decimal
5        0011 0101   35    53
6        0011 0110   36    54
8        0011 1000   38    56
(space)  0010 0000   20    32
-        0010 1101   2D    45
(space)  0010 0000   20    32
8        0011 1000   38    56
7        0011 0111   37    55
7        0011 0111   37    55
1        0011 0001   31    49
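A table like this can be produced with a short sketch that uses Python's built-in `ord` to look up each character's code. The function name `ascii_table` is an illustrative choice.

```python
def ascii_table(text):
    """Return (character, binary, hex, decimal) for each character of text."""
    rows = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        bits = format(code, "08b")                # 8 binary digits, zero padded
        grouped = bits[:4] + " " + bits[4:]       # show as two groups of 4
        rows.append((ch, grouped, format(code, "02X"), code))
    return rows


rows = ascii_table("568 - 8771")
for ch, bits, hex_code, decimal in rows:
    print(repr(ch), bits, hex_code, decimal)
print(len(rows), "bytes")  # 10 bytes
```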
In class assignment
Using the chart on page 220, what is your first (or nick) name in ASCII binary codes?
Work with your partner.
Write the first name (spread out).
Write the binary code for each letter of your name based on the ASCII chart.
Convert at least one of those binary codes to the decimal (base 10) equivalent.
What about other kinds of data?
Chapter 11 material
Pixels
A pixel is like a dot. Your computer screen is composed of thousands of pixels.
How many?
Settings – Control Panel – Display – Settings
Screen area is the dimensions expressed in terms of pixels.
The higher the number, the better the resolution.
Each pixel
Has a color associated with it.
Colors are a combination of red, green, and blue light – RGB
The intensity of the particular color defines how much of that color contributes to the overall color displayed.
Each color is associated with a 1 byte code. In one byte we can have values from 0 (no color) to 255 (full intensity).
Color
See example in Word document
Black is coded 0 0 0 (red green blue)
White is coded 255 255 255 (red green blue)
We will also use this feature when we code HTML colors.
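As a minimal sketch of how the three one-byte values map onto an HTML color code, each byte can be written as two hex digits. The function name `rgb_to_html` is a hypothetical helper, and the `#RRGGBB` form is the standard HTML hex color notation.

```python
def rgb_to_html(red, green, blue):
    """Format three 0-255 intensities as an HTML color such as #FF0000."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each intensity must fit in one byte (0 to 255)")
    return "#{:02X}{:02X}{:02X}".format(red, green, blue)


print(rgb_to_html(0, 0, 0))        # #000000 (black)
print(rgb_to_html(255, 255, 255))  # #FFFFFF (white)
print(rgb_to_html(255, 0, 0))      # #FF0000 (full red, no green or blue)
```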
Sound
Analog – real world – infinitely continuous
Digital – representation – discrete
Sound is a continuous series of sound waves.
To digitize, we cannot capture every one of the infinitely many values that hit our ears.
But we can sample the values.
Figure 11.8. Sound wave. The horizontal axis is time; the vertical axis is sound pressure.
Figure 11.9. Two sampling rates; the rate on the right is twice as fast as that on the left.
Figure 11.11. (a) Three-bit precision for samples requires that the indicated reading be approximated as +10. (b) Adding another bit makes the sample twice as accurate.
Figure 11.10. Schematic for analog-to-digital and digital-to-analog conversion.
Sampling
While we lose some information in this process, it is usually negligible in terms of our ability to perceive the sounds.
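Sampling and limited precision can be illustrated with a rough sketch. It assumes a pure sine tone stands in for the sound wave; the function name `sample_and_quantize` and all of its parameters are illustrative and not taken from the book's figures.

```python
import math


def sample_and_quantize(frequency_hz, sample_rate_hz, bits, duration_s):
    """Sample a sine wave at fixed intervals and round each reading
    to one of 2^bits discrete levels."""
    levels = 2 ** bits
    count = int(sample_rate_hz * duration_s)
    samples = []
    for n in range(count):
        t = n / sample_rate_hz                             # time of this sample
        value = math.sin(2 * math.pi * frequency_hz * t)   # between -1 and 1
        level = round((value + 1) / 2 * (levels - 1))      # between 0 and levels - 1
        samples.append(level)
    return samples


coarse = sample_and_quantize(frequency_hz=1, sample_rate_hz=8, bits=3, duration_s=1)
fine = sample_and_quantize(frequency_hz=1, sample_rate_hz=16, bits=4, duration_s=1)
print(len(coarse), "samples using levels 0 to 7")
print(len(fine), "samples using levels 0 to 15")
```

Doubling the sampling rate doubles the number of readings, and each extra bit doubles the number of levels a reading can be rounded to.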
But to produce sounds
Requires a large amount of data.
For example, with a 16-bit representation of each sample, it would take about 10 megabytes to reproduce 1 minute of a song.
Compression: remove the parts of the sound that we cannot hear (MP3 format).
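The figure of roughly 10 megabytes per minute can be reproduced with a back-of-the-envelope sketch. The slides give only the 16-bit sample size; the sampling rate of 44,100 samples per second and the 2 stereo channels are assumptions here (they are the usual CD audio values).

```python
def audio_bytes(seconds, sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Uncompressed audio size in bytes.

    sample_rate_hz and channels are assumed CD-audio values,
    not numbers taken from the slides.
    """
    bytes_per_sample = bits_per_sample // 8
    return seconds * sample_rate_hz * bytes_per_sample * channels


one_minute = audio_bytes(60)
print(one_minute)  # 10584000 bytes, a little over 10 million
```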
Images
Images have the same problem.
If each image is made up of thousands of pixels, and each pixel requires 3 bytes of data, then each image is huge.
JPEG format compresses the digital representation to remove the differences in hues of a picture that we cannot perceive.
Then we can compress by using run-length compression to code the remaining bits.
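The size of an uncompressed image follows directly from the pixel count. In this small sketch, the 1024 by 768 screen size is a hypothetical example, not a figure from the slides.

```python
def image_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed size: one red, one green and one blue byte per pixel."""
    return width * height * bytes_per_pixel


# Hypothetical 1024 x 768 image.
print(image_bytes(1024, 768))  # 2359296 bytes
```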
Run-length compression
If my bit pattern is:
00000000000000000000000011111111111111000000000000000011111111111111001001
We can code a value to indicate that we have: 24 0’s followed by 14 1’s followed by 16 0’s, etc.
When we have many changing values in the pattern, it will not save us much space, but by making patterns of identical pixels, you can save a good deal of data space.
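Run-length coding can be sketched in a few lines. This is a minimal illustration rather than the scheme any particular image format uses; the function names `rle_encode` and `rle_decode` are hypothetical, and the pattern is built from the first three runs described above.

```python
from itertools import groupby


def rle_encode(bits):
    """Collapse each run of identical symbols into a (symbol, length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


def rle_decode(runs):
    """Expand (symbol, length) pairs back into the original string."""
    return "".join(symbol * length for symbol, length in runs)


pattern = "0" * 24 + "1" * 14 + "0" * 16
runs = rle_encode(pattern)
print(runs)                         # [('0', 24), ('1', 14), ('0', 16)]
print(rle_decode(runs) == pattern)  # True

# A pattern that changes at every position gains nothing: 16 bits give 16 runs.
print(len(rle_encode("01" * 8)))    # 16
```

The long runs shrink 54 bits to three pairs, while the alternating pattern needs one pair per bit.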
Lossy vs lossless conversion
Lossless – no loss of data in the conversion
Lossy – there is loss of data
Run-length coding is lossless. You can convert the original to a compressed form and recover it exactly.
Compression that removes some of the detail (things that we cannot perceive) is lossy. You cannot reproduce exactly the same sound/picture.