13
Chapter 3 Data Representation Text Characters

Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

Embed Size (px)

Citation preview

Page 1: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

Chapter 3Data RepresentationText Characters

Page 2: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

2

Representing Text

To represent a text document in digital form, we need to be able to represent every possible character that may appear.

There is a finite number of characters to represent, so the general approach is to list them all and assign each a binary string.

Page 3: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

3

Representing Text

A character set is a list of characters and the codes used to represent each one.

In 1960, a survey revealed 60 different characters sets in use. At IBM alone there were 9 different sets.

By agreeing to use ONE particular character set, computer manufacturers have made the processing of text data easier.

Page 4: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

4

The ASCII Character Set

ASCII stands for American Standard Code for Information Interchange.

The ASCII character set originally used seven bits to represent each character, allowing for 128 unique characters.

Wikipedia has an excellent entry on ASCII.

Page 5: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

5

The ASCII Character Set (7 bit)

Page 6: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

6

The ASCII Character Set

Notice the organisation of the ASCII table. The table divides in half according to the MSB.

Letters are all in the second half so all codes for alphabetic characters start with 1. This second half of the table divides in half again according to

the next bit: UPPERCASE letters start 10. lowercase letters start 11.

The first half of the table also divides in half according to the next bit: Control characters start 00. Numerals and punctuation start 01.

Page 7: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

7

The ASCII Character Set

Note that control characters (the first 32 in the ASCII character set) do not have simple character representations that you could print to the screen.

Many, however, perform actions with which you are familiar.

Some have there own keys, others need to be constructed.

Page 8: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

8

The ASCII Character SetControl CharactersControl sequences are created by holding the Ctrl key

(control) and pressing a letter.

This has the effect of subtracting 64 from the ASCII value of the letter pressed.

For example: ‘M’ has ASCII value 77

(1001101 in binary), Ctrl-M has ASCII value 13

(0001101 in binary).

Alternately, we can see this as “masking bit 6.”

Page 9: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

9

The ASCII Character SetCommon Control Characters

Hex Binary Decimal Name Function

00 0000000 0 NUL Null

07 0000111 7 BEL Bell

08 0001000 8 BS Backspace

09 0001001 9 HT Horizontal Tab

0A 0001010 10 LF Line Feed

0B 0001011 11 VT Vertical Tab

0C 0001100 12 FF Form Feed

0D 0001101 13 CR Carriage Return

0E 0001110 14 SO Shift Out

0F 0001111 15 SI Shift In

1B 0011011 29 ESC Escape

Page 10: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

10

The ASCII Character Set

Coding letters in ASCII is easy.

Let’s look at ‘j’ as an example:

Since ‘j’ is a letter, its code starts with a 1.

Since it’s lowercase, the next bit is also a 1.

Since it’s the tenth letter of the alphabet the rest of the code is 01010.

The complete ASCII code for ‘j’ is 1101010.

Page 11: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

11

The ASCII Character Set

ASCII evolved so that eight bits are used. The 7-bit codes were simply prefixed with

another bit, giving another natural doubling. The original 7-bit codes were padded with 0.

So the code for ‘j’ became 01101010. 128 new characters were added.

The codes for this alternate character set start with 1.

Page 12: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

12

The Unicode Character Set

Even the extended version of the ASCII character set is not enough for international use.

The Unicode character set uses 16 bits per character. The Unicode character set can represent 216, or over 65 thousand characters.

Unicode was designed to be a superset of ASCII. That is, the first 256 characters in the Unicode character set correspond exactly to the extended ASCII character set.

Page 13: Chapter 3 Data Representation Text Characters. 2 Representing Text To represent a text document in digital form, we need to be able to represent every

13

Examples of Unicode Characters

Figure 3.6 A few characters in the Unicode character set