
Data Representation

CS280 – 09/13/05

Binary (from a Hacker’s dictionary)

A base-2 numbering system with only two digits, 0 and 1, which is perfectly suited for electronic operations since it can be expressed by power states (on/off), voltage levels (high/low) or charge (positive/negative), but is less than ideal for humans, who find it awkward to say things like “It’s a Catch – 10110 situation,” “He’s the 11011010-pound gorilla,” and “That’s the 10110010011011001-dollar question.”

“There are only 10 kinds of people in the world…those that understand binary and those that do not.” (from the ACM CS t-shirt).

Looking at more data

Character representations

Data and Data Representation

So how is this data that we operate on stored in the computer?

Let’s start with numbers

Binary codes are Base 2.

We “think” and operate in Base 10.

What does this mean?

Counting

Base 10 has 10 digits to represent different numbers of things

Base 2 has only 2 digits available.

Counting

Base 10

0 1 2 3 4 5 6 7 8 9

then we run out of unique digits.

So we move to a positional system.

10 – means we have ten things – a 1 in the 10’s position and no more things in the 1’s position.

Binary counting

0 1 then we run out of digits

10

This represents the number 2. A 1 in the 2’s position and a 0 in the 1’s position.
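To see the same counting pattern continue past 2, here is a minimal Python sketch that lists the first few numbers in binary. The helper name `count_in_binary` is our own and is not part of the course materials.

```python
def count_in_binary(limit):
    """Return the binary strings for 0 .. limit-1, in counting order."""
    return [format(n, "b") for n in range(limit)]


# Print decimal and binary side by side.
for decimal, binary in enumerate(count_in_binary(6)):
    print(decimal, binary)
```

Each time the rightmost positions are all 1's, the next number needs a new position on the left, just as 9 rolls over to 10 in base 10.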

Positional notation

10’s positions represented

1 * 10^0 = 1
1 * 10^1 = 10
1 * 10^2 = 100
1 * 10^3 = 1,000
1 * 10^4 = 10,000
1 * 10^5 = 100,000
1 * 10^6 = 1,000,000

2’s positions represented

1 * 2^0 = 1
1 * 2^1 = 2
1 * 2^2 = 4
1 * 2^3 = 8
1 * 2^4 = 16
1 * 2^5 = 32
1 * 2^6 = 64

How does the computer then store numbers?

Let’s say we want to represent the number 53 in binary.

53 (base 10) = 110101 (base 2)

Why?

See chart next page.

Converting from decimal to binary

Use chart

1 * 2^0 = 1
1 * 2^1 = 2
1 * 2^2 = 4
1 * 2^3 = 8
1 * 2^4 = 16
1 * 2^5 = 32
1 * 2^6 = 64
1 * 2^7 = 128
1 * 2^8 = 256

73 decimal

Must use 7 bits

xxxxxxx

73 – 64 = 9

1xxxxxx

32 and 16 are not used

100xxxx

9 – 8 = 1

1001xxx

4 and 2 are not used; last digit is 1

1001001

What is the general subtraction algorithm to convert from a decimal number to binary?
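One possible answer, sketched in Python: walk the place values from largest to smallest, and whenever a place value fits into what remains, write a 1 and subtract it; otherwise write a 0. The function name and the explicit `bits` parameter are our own choices for illustration.

```python
def to_binary_by_subtraction(value, bits):
    """Convert a non-negative integer to a binary string of `bits` digits
    by subtracting place values from largest to smallest."""
    if value < 0 or value >= 2 ** bits:
        raise ValueError("value does not fit in the given number of bits")
    digits = []
    remaining = value
    for position in range(bits - 1, -1, -1):
        place_value = 2 ** position
        if place_value <= remaining:
            digits.append("1")          # this place value is used
            remaining -= place_value
        else:
            digits.append("0")          # this place value is not used
    return "".join(digits)


print(to_binary_by_subtraction(53, 6))
```

For 53 with 6 bits this uses 32, 16, 4, and 1, which matches the earlier result 110101.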

Converting binary to decimal

274 – decimal

4 * 10^0 = 4
7 * 10^1 = 70
2 * 10^2 = 200

total 274

1011 – binary

1 * 2^0 = 1
1 * 2^1 = 2
0 * 2^2 = 0
1 * 2^3 = 8

total 11

What is the general algorithm for taking a number in base X and converting it to its base 10 equivalent?
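The two worked examples (274 and 1011) follow the same pattern: multiply each digit by the base raised to its position, counting positions from the right starting at 0, and add the results. A minimal Python sketch of that pattern follows; the names `DIGITS` and `digits_to_value` are hypothetical, chosen here for illustration.

```python
DIGITS = "0123456789ABCDEF"


def digits_to_value(text, base):
    """Sum digit * base**position, with position 0 at the right-hand end."""
    total = 0
    for position, symbol in enumerate(reversed(text.upper())):
        digit = DIGITS.index(symbol)    # raises ValueError for unknown symbols
        if digit >= base:
            raise ValueError(f"digit {symbol!r} is not valid in base {base}")
        total += digit * base ** position
    return total


print(digits_to_value("1011", 2))
print(digits_to_value("274", 10))
```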

Number representation

Numbers are represented by their corresponding binary representation.

We are disregarding sign.

We are disregarding floating point.

What about other kinds of data?

Think about the binary values as a kind of code.

The binary values represent codes

How many different values can be stored in 1 bit?

How many in 2 bits?

How many in 4 bits?

How many in a byte?

General form encoding

If you have x possible unique symbols, and y positions for any one of those symbols, then the general number of unique codes is

x^y

For example, you have 2 dice, each of which has 6 different face values, so there are 36, or 6^2, possible unique codes.
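The counting rule (x symbols raised to the power of y positions) can be checked by listing every combination. This is a small Python sketch; `count_codes` is a name we made up for it.

```python
from itertools import product


def count_codes(symbols, positions):
    """Number of unique codes: symbols raised to the power of positions."""
    return symbols ** positions


# Enumerate every roll of 2 six-sided dice and compare with the formula.
dice_rolls = list(product(range(1, 7), repeat=2))
print(len(dice_rolls), count_codes(6, 2))

# The same rule with 2 symbols (0 and 1) answers the bit questions.
for bits in (1, 2, 4, 8):
    print(bits, "bits:", count_codes(2, bits), "values")
```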

ASCII codes represent characters of data.

Each character uses 1 byte, or 8 bits. (Standard ASCII defines 128 codes, so the high-order bit is 0.)

Unicode extends the ASCII codes; its original 16-bit form added another byte.

ASCII can form most of the characters used by “Western” languages, along with punctuation symbols.

Unicode allows for special symbols and symbols in other languages like Japanese, Chinese, and Arabic.

Figure 8.7. ASCII, The American Standard Code for Information Interchange (page 220)

Reading the chart

Left column is the left side of the byte (group of 8 bits) (another term is the high order)

Right column is the right side of the byte.

Value is the corresponding binary code.

Binary to hex

Hexadecimal (base 16) codes can be used to represent groups of 4 binary digits.

Hexadecimal counting:

0 1 2 3 4 5 6 7 8 9 A B C D E F

A = 10   Binary 1010
B = 11   Binary 1011
C = 12   Binary 1100
D = 13   Binary 1101
E = 14   Binary 1110
F = 15   Binary 1111

So the letter Z can be abbreviated

0101 1010 in binary, or 5A in hex

Binary numbers are commonly written in groups of 4 digits, with leading 0’s used as placeholders.

Hex numbers are shown as 2-digit groups, with a space between each group of two.
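A short Python sketch of the grouping described here: one byte is written as two groups of 4 bits, and each group becomes one hex digit. The function name `byte_to_nibbles` is an illustrative choice of ours.

```python
def byte_to_nibbles(value):
    """Return (binary, hex) strings for one byte, with the binary form
    split into two groups of 4 bits and leading 0's kept as placeholders."""
    bits = format(value, "08b")
    grouped = bits[:4] + " " + bits[4:]
    return grouped, format(value, "02X")


# ord() gives the character's code number.
print(byte_to_nibbles(ord("Z")))
```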

Encoding – character string

Text or character strings are typically contiguously stored in memory.

Assuming that each character takes up one byte of space, how many bytes would be required for a phone number? (We are using a slightly different example than the book. Note the hyphen and spaces.)

568 - 8771

568 - 8771 requires 10 bytes

Char     Binary      Hex   Decimal
5        0011 0101   35    53
6        0011 0110   36    54
8        0011 1000   38    56
(space)  0010 0000   20    32
-        0010 1101   2D    45
(space)  0010 0000   20    32
8        0011 1000   38    56
7        0011 0111   37    55
7        0011 0111   37    55
1        0011 0001   31    49
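The same kind of table can be produced for any string. The sketch below uses Python's built-in `ord` and `format`; the helper name `ascii_table` is ours, and it assumes the text contains only ASCII characters.

```python
def ascii_table(text):
    """Return one (char, binary, hex, decimal) row per character."""
    rows = []
    for ch in text:
        code = ord(ch)
        bits = format(code, "08b")
        rows.append((ch, bits[:4] + " " + bits[4:], format(code, "02X"), code))
    return rows


phone = "568 - 8771"
for row in ascii_table(phone):
    print(*row)

# One byte per character, including the spaces and the hyphen.
print(len(phone.encode("ascii")), "bytes")
```

You can use the same helper to check your answer to the in-class assignment on your own name.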

In class assignment

Using the chart on page 220, what is your first (or nick) name in ASCII binary codes?

Work with your partner.

Write the first name (spread out).

Write the binary code for each letter of your name based on the ASCII chart.

Convert at least one of those binary codes to the decimal (base 10) equivalent.

What about other kinds of data?

Chapter 11 material

Pixels

A pixel is like a dot. Your computer screen is composed of thousands of pixels.

How many?

Settings – Control Panel – Display – Settings

Screen area is the dimensions expressed in terms of pixels.

The higher the number, the better the resolution.

Each pixel

Has a color associated with it.

Colors are a combination of red, green, and blue light (RGB).

The intensity of each particular color defines how much of that color contributes to the overall color displayed.

Each color is associated with a 1-byte code. In one byte we can have values from 0 (no color) to 255 (full intensity).

Color

See example in Word document

Black is coded 0 0 0 (red green blue)

White is coded 255 255 255

We will also use this feature when we code HTML colors.
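As a preview, HTML writes a color as a `#` followed by the red, green, and blue bytes, each as two hex digits. Here is a minimal Python sketch of that conversion; the function name `rgb_to_html` is our own.

```python
def rgb_to_html(red, green, blue):
    """Format three one-byte color levels as an HTML hex color string."""
    for level in (red, green, blue):
        if not 0 <= level <= 255:
            raise ValueError("each color level must fit in one byte (0-255)")
    return "#{:02X}{:02X}{:02X}".format(red, green, blue)


print(rgb_to_html(0, 0, 0))        # black
print(rgb_to_html(255, 255, 255))  # white
print(rgb_to_html(255, 0, 0))      # full red, no green or blue
```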

Sound

Analog: real world, infinitely continuous

Digital: representation, discrete

Sound is a continuous series of sound waves.

To digitize, we cannot capture every one of the infinitely many values that hit our ears.

But we can sample the values.

Figure 11.8. Sound wave. The horizontal axis is time; the vertical axis is sound pressure.

Figure 11.9. Two sampling rates; the rate on the right is twice as fast as that on the left.

Figure 11.11. (a) Three-bit precision for samples requires that the indicated reading be approximated as +10. (b) Adding another bit makes the sample twice as accurate.

Figure 11.10. Schematic for analog-to-digital and digital-to-analog conversion.

Sampling

While we lose some information in this process, it is usually negligible in terms of our ability to perceive the sounds.
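The two steps in the figures, sampling at a fixed rate and rounding each sample to a fixed number of bits, can be sketched in Python. This is an illustration only: the sine wave stands in for a real sound, and the function names and the [-1, 1] range are assumptions made for the example.

```python
import math


def sample_wave(frequency_hz, sample_rate_hz, duration_s):
    """Sample a sine wave at evenly spaced times; values lie in [-1, 1]."""
    count = int(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(count)]


def quantize(sample, bits):
    """Round a sample in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    index = round((sample + 1.0) / step)
    return -1.0 + index * step


samples = sample_wave(frequency_hz=1, sample_rate_hz=8, duration_s=1)
for s in samples:
    # Compare the true value with its 3-bit and 4-bit approximations.
    print(round(s, 3), round(quantize(s, 3), 3), round(quantize(s, 4), 3))
```

Doubling the sample rate gives twice as many readings per second; adding a bit roughly halves the spacing between levels, so each reading can land closer to the true value.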

But to produce sounds

Requires a large amount of data.

For example, at a 16-bit representation of each sample, it would take 10 megabytes to reproduce 1 minute of a song.

Compression – Remove the parts of the sound that we cannot hear. – MP3 format.
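The 10-megabyte figure can be checked with a little arithmetic. The slide does not give a sampling rate or a channel count, so the sketch below assumes CD-quality audio (44,100 samples per second, 2 channels for stereo); both values are our assumptions, as is the function name `audio_bytes`.

```python
def audio_bytes(seconds, sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Bytes of uncompressed audio: time * rate * bytes per sample * channels."""
    return seconds * sample_rate_hz * (bits_per_sample // 8) * channels


one_minute = audio_bytes(60)
print(one_minute, "bytes, about", round(one_minute / 1_000_000, 1), "MB")
```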

Images

Images have the same problem.

If each image is made up of thousands of pixels, and each pixel requires 3 bytes of data, then each image is huge.

JPEG format compresses the digital representation to remove the differences in hues of a picture that we cannot perceive.

Then we can compress by using run-length compression to code the remaining bits.
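To put a number on "huge", here is a quick Python calculation of the uncompressed size. The 1024 by 768 screen area is only an example value we picked, and `image_bytes` is a hypothetical helper name.

```python
def image_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed image size: one red, green, and blue byte per pixel."""
    return width * height * bytes_per_pixel


size = image_bytes(1024, 768)
print(size, "bytes, about", round(size / 1_000_000, 1), "MB")
```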

Run-length compression

If my bit pattern is:

00000000000000000000000011111111111111000000000000000011111111111111001001

We can code a value to indicate that we have: 24 0’s followed by 14 1’s followed by 16 0’s, etc.

When the pattern has many changing values, this will not save us much space, but when it contains long runs of identical values (such as identical pixels), you can save a good deal of data space.
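A minimal Python sketch of run-length coding, using `itertools.groupby` from the standard library to find the runs. The function names are ours, and storing each run as a (symbol, count) pair is just one possible way to write the code down.

```python
from itertools import groupby


def run_length_encode(bits):
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


def run_length_decode(runs):
    """Rebuild the original string from its (symbol, count) pairs."""
    return "".join(symbol * count for symbol, count in runs)


pattern = "0" * 24 + "1" * 14 + "0" * 16
runs = run_length_encode(pattern)
print(runs)
print(run_length_decode(runs) == pattern)
```

A pattern that alternates, such as 010101, produces one pair per bit, so the coded form ends up longer than the original.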

Lossy vs lossless conversion

Lossless: no loss of data in the conversion

Lossy: there is loss of data

Run-length coding is lossless. You can convert the original to a compressed form and recover it exactly.

Compression that removes some of the detail (things that we cannot perceive) is lossy. You cannot reproduce exactly the same sound/picture.
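A toy Python illustration of why a lossy step cannot be undone. It simply clears the lowest bits of each one-byte value; this is not how MP3 or JPEG actually work, only a stand-in for "removing detail", and the name `drop_low_bits` is hypothetical.

```python
def drop_low_bits(value, bits_dropped):
    """Lossy step: clear the lowest bits of a one-byte value."""
    mask = 0xFF & ~((1 << bits_dropped) - 1)
    return value & mask


original = [200, 201, 202, 203]   # four slightly different intensity values
compressed = [drop_low_bits(v, 2) for v in original]
print(compressed)
```

All four values become the same number, so the small differences are gone for good. The result is also one long run of identical values, which is exactly the kind of pattern run-length coding handles well.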

Documents
