Character Representation
IFBA – Instituto Federal de Educação, Ciência e Tecnologia da Bahia
Systems Analysis and Development Program
Introduction to Computer Science
Prof. M.Sc. Antonio Carlos Souza
Compiled from:
York University - ITEC 1011
Introduction
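- Examples (figure): real-world data reaches the computer through an input device and is stored as binary data
  - "Dear Mom:" typed on a keyboard → 10110010…
  - An image captured by a digital camera → 10110010…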
Appropriate Formats
- The internal representation must suit the type of processing (text, image, or sound)
Data Types
- Numbers
  - Integer or fixed point
  - Floating point
  - Decimal number (BCD)
- Characters
  - ASCII (American Standard Code for Information Interchange)
  - EBCDIC (Extended Binary Coded Decimal Interchange Code)
- Logical data
- Addresses
Conventions
- Proprietary formats
  - Unique to a product or company
  - E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes
- Standards
  - Evolve in two ways:
    - Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
    - A committee is formed to solve a problem (e.g., the Moving Picture Experts Group, MPEG)
Standards Organizations
- ISO – International Organization for Standardization
- CSA – Canadian Standards Association
- ANSI – American National Standards Institute
- IEEE – Institute of Electrical and Electronics Engineers
- Etc.
Examples of Standards

Type of data            Standards
Alphanumeric            ASCII, EBCDIC, Unicode
Image                   JPEG, GIF, PCX, TIFF
Motion picture          MPEG-2, QuickTime
Sound                   Sound Blaster, WAV, AU
Outline graphics/fonts  PostScript, TrueType, PDF
Why Standards?
- Standards are "arbitrary"
- They exist because they are:
  - Convenient
  - Efficient
  - Flexible
  - Appropriate
  - Etc.
Character Representation
- In general, alphanumeric codes are used:
  - 6-bit code
  - 7-bit code (ASCII)
  - EBCDIC
  - Extended ASCII
  - ISO Latin-1
  - ANSI characters
  - Unicode characters
Alphanumeric Data
- Problem: distinguishing between the number 123 (one hundred twenty-three) and the characters "123" (one, two, three)
- Four standards for representing letters (alpha) and numbers:
  - BCD: Binary-Coded Decimal
  - ASCII: American Standard Code for Information Interchange
  - EBCDIC: Extended Binary-Coded Decimal Interchange Code
  - Unicode
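To make the problem concrete, here is a minimal Python sketch (Python is used only for illustration) that compares the pure binary value of the integer 123 with the ASCII codes stored for the three characters "123".

```python
# The number 123 and the character string "123" have different bit patterns.
number = 123
text = "123"

number_bits = format(number, "b")  # pure binary value of the integer
text_bytes = text.encode("ascii")  # one ASCII code per character
text_bits = " ".join(format(b, "08b") for b in text_bytes)

print(number_bits)       # 1111011
print(text_bits)         # 00110001 00110010 00110011
print(list(text_bytes))  # [49, 50, 51]
```

Converting between the two forms is an explicit step (here, `int(text)` turns the characters back into the number 123).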
6-bit Code
- Can represent 2^6 = 64 characters:
  - 26 uppercase letters
  - 10 digits (0 1 2 3 4 5 6 7 8 9)
  - 28 special characters, including Space
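The capacity of an n-bit code is 2^n distinct patterns. A small Python sketch (the helper name `code_capacity` is hypothetical) checks the 6-bit budget above and shows how the capacity grows for the wider codes discussed in this deck.

```python
def code_capacity(bits: int) -> int:
    """Number of distinct characters an n-bit code can represent (2**n)."""
    return 2 ** bits


for bits in (6, 7, 8, 16):
    print(f"{bits}-bit code: {code_capacity(bits)} characters")

# The 6-bit budget: 26 letters + 10 digits + 28 special characters.
assert 26 + 10 + 28 == code_capacity(6)
```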
7 bits (ASCII)
Binary-Coded Decimal (BCD)
- 4 bits per digit

Digit  Bit pattern
0      0000
1      0001
2      0010
3      0011
4      0100
5      0101
6      0110
7      0111
8      1000
9      1001

Note: the following bit patterns are not used: 1010, 1011, 1100, 1101, 1110, 1111
Example
- 7093₁₀ = ? (in BCD)

   7    0    9    3
0111 0000 1001 0011
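The conversion above can be sketched in Python: each decimal digit is encoded independently in 4 bits, and decoding rejects the six unused patterns. The function names `to_bcd` and `from_bcd` are hypothetical, and the sketch assumes non-negative integers with 4-bit groups written as space-separated strings.

```python
def to_bcd(n: int) -> str:
    """Encode a non-negative integer in BCD: 4 bits per decimal digit."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return " ".join(format(int(digit), "04b") for digit in str(n))


def from_bcd(bcd: str) -> int:
    """Decode space-separated 4-bit groups back to an integer."""
    digits = []
    for group in bcd.split():
        value = int(group, 2)
        if value > 9:  # 1010 through 1111 are not valid BCD digits
            raise ValueError(f"invalid BCD group: {group}")
        digits.append(str(value))
    return int("".join(digits))


print(to_bcd(7093))                     # 0111 0000 1001 0011
print(from_bcd("0111 0000 1001 0011"))  # 7093
```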
Standard Alphanumeric Formats
- BCD
- ASCII
- EBCDIC
- Unicode
The Problem
- Representing text strings, such as "Hello, world", in a computer
Codes and Characters
- Each character is coded as a byte
- The most common coding system is ASCII (pronounced ass-key)
- ASCII = American National Standard Code for Information Interchange
- Defined in ANSI document X3.4-1977
ASCII Features
- 7-bit code
- The 8th bit is unused (or used as a parity bit)
- 2^7 = 128 codes
- Two general types of codes:
  - 95 are "graphic" codes (displayable on a console)
  - 33 are "control" codes (control features of the console or communications channel)
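The code counts above can be checked in Python by classifying all 128 codes. The sketch also fills the 8th bit with an even-parity bit; even parity is one common convention (odd parity also exists), and the helper name `with_even_parity` is hypothetical.

```python
# Graphic codes: SP (0x20) through ~ (0x7E). Control codes: 0x00-0x1F plus DEL (0x7F).
graphic = [c for c in range(128) if 0x20 <= c <= 0x7E]
control = [c for c in range(128) if c < 0x20 or c == 0x7F]

print(len(graphic), len(control))  # 95 33


def with_even_parity(code: int) -> int:
    """Set bit 7 so that the 8-bit byte contains an even number of 1 bits."""
    parity = bin(code).count("1") % 2
    return (parity << 7) | code


# 'a' = 1100001 has three 1 bits, so the parity bit is 1.
print(format(with_even_parity(ord("a")), "08b"))  # 11100001
```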
ASCII Chart
Columns: the 3 most significant bits. Rows: the 4 least significant bits.

      000  001  010  011  100  101  110  111
0000  NUL  DLE  SP   0    @    P    `    p
0001  SOH  DC1  !    1    A    Q    a    q
0010  STX  DC2  "    2    B    R    b    r
0011  ETX  DC3  #    3    C    S    c    s
0100  EOT  DC4  $    4    D    T    d    t
0101  ENQ  NAK  %    5    E    U    e    u
0110  ACK  SYN  &    6    F    V    f    v
0111  BEL  ETB  '    7    G    W    g    w
1000  BS   CAN  (    8    H    X    h    x
1001  HT   EM   )    9    I    Y    i    y
1010  LF   SUB  *    :    J    Z    j    z
1011  VT   ESC  +    ;    K    [    k    {
1100  FF   FS   ,    <    L    \    l    |
1101  CR   GS   -    =    M    ]    m    }
1110  SO   RS   .    >    N    ^    n    ~
1111  SI   US   /    ?    O    _    o    DEL

- e.g., ‘a’ = 1100001 (column 110, row 0001)
- 95 graphic codes: SP (010 0000) through ~ (111 1110)
- 33 control codes: columns 000 and 001, plus DEL (111 1111)
- Alphabetic codes: A to Z in columns 100 and 101, a to z in columns 110 and 111
- Numeric codes: 0 to 9 at 011 0000 through 011 1001
- Punctuation, etc.: the remaining graphic codes
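Reading the chart amounts to splitting a 7-bit code into its 3 high bits (column) and 4 low bits (row). A minimal Python sketch, with the hypothetical helper name `chart_position`, does this for any 7-bit ASCII character.

```python
def chart_position(ch: str) -> tuple[str, str]:
    """Return (column, row) of an ASCII character in the 8 x 16 chart:
    column = 3 most significant bits, row = 4 least significant bits."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    bits = format(code, "07b")
    return bits[:3], bits[3:]


print(format(ord("a"), "07b"))  # 1100001
print(chart_position("a"))      # ('110', '0001')
print(chart_position("Z"))      # ('101', '1010')
```

The `tuple[str, str]` annotation assumes Python 3.9 or later.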
"Hello, world" Example

Character  Binary    Hex  Decimal
H          01001000  48   72
e          01100101  65   101
l          01101100  6C   108
l          01101100  6C   108
o          01101111  6F   111
,          00101100  2C   44
(space)    00100000  20   32
w          01110111  77   119
o          01101111  6F   111
r          01110010  72   114
l          01101100  6C   108
d          01100100  64   100
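The same three views of "Hello, world" can be produced in Python by encoding the string as ASCII, which stores one byte per character. This is a sketch for checking the table; `bytes.hex` with a separator argument assumes Python 3.8 or later.

```python
message = "Hello, world"
data = message.encode("ascii")  # one byte per character

binary = " ".join(format(b, "08b") for b in data)
hexadecimal = data.hex(" ").upper()
decimal = " ".join(str(b) for b in data)

print(binary)
print(hexadecimal)  # 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64
print(decimal)      # 72 101 108 108 111 44 32 119 111 114 108 100
```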
Common Control Codes

Code  Hex  Meaning
CR    0D   carriage return
LF    0A   line feed
HT    09   horizontal tab
DEL   7F   delete
NUL   00   null
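Most programming languages write these control codes as escape sequences. A short Python sketch maps each escape to its hexadecimal ASCII code; the dictionary name `controls` is only illustrative.

```python
# Control characters written as Python string escapes.
controls = {"CR": "\r", "LF": "\n", "HT": "\t", "DEL": "\x7f", "NUL": "\x00"}

for name, ch in controls.items():
    print(f"{name:<4} {ord(ch):02X}")  # e.g. "CR   0D"

# The two-byte sequence CR LF ends a line in Windows text files.
crlf = "\r\n".encode("ascii")
print(crlf.hex().upper())  # 0D0A
```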
EBCDIC
- Extended BCD Interchange Code (pronounced ebb’-se-dick)
- 8-bit code
- Developed by IBM
- Rarely used today, mainly on IBM mainframes
8 bits (EBCDIC)
- Extended Binary Coded Decimal Interchange Code
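EBCDIC exists in several code pages. This sketch assumes code page 037 (US/Canada), which Python's standard library exposes as the `cp037` codec, and compares a few EBCDIC codes with their ASCII counterparts.

```python
# Compare EBCDIC (code page 037, an assumption) with ASCII for a few characters.
for ch in ("A", "a", "0", " "):
    ebcdic = ch.encode("cp037")[0]
    ascii_code = ord(ch)
    print(f"{ch!r}: EBCDIC {ebcdic:02X}  ASCII {ascii_code:02X}")
```

The same character has a different code in each standard (for example, "A" is C1 in this EBCDIC code page but 41 in ASCII), which is why text must be converted when moving between IBM mainframes and ASCII-based systems.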
8 bits (Extended ASCII)
ISO Latin-1
ANSI Characters
- Windows 9x supports ANSI characters
  - American National Standards Institute
- 8-bit representation (256 characters)
  - 0 to 255
- Values 0 to 127: same as ASCII
- Values 128 to 255: similar to ISO Latin-1
  - With extensions and incompatibilities
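The similarity and the incompatibility can be seen by decoding the same byte with two codecs. This sketch assumes that "ANSI" means Windows code page 1252 (Western European), which Python exposes as `cp1252`; other Windows locales use other "ANSI" code pages.

```python
# Assumption: "ANSI" = Windows code page 1252. Compare it with ISO Latin-1.
for byte in (0x41, 0xE9, 0x80):
    raw = bytes([byte])
    ansi = raw.decode("cp1252")
    latin1 = raw.decode("latin-1")
    print(f"{byte:02X}: cp1252={ansi!r}  latin-1={latin1!r}")
```

Bytes 0x41 ("A", in the ASCII range) and 0xE9 ("é") decode the same way in both, while 0x80 is the Euro sign in code page 1252 but a control code in ISO Latin-1.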
Unicode
- Originally designed as a 16-bit standard (code points now extend to U+10FFFF)
- Developed by a consortium
- Intended to supersede older 7- and 8-bit codes
Unicode Version 2.1
- 1998
- Improves on version 2.0
- Includes the Euro sign (20AC₁₆ = €)
- From the standard:
…contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.
http://www.unicode.org
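A Unicode character is identified by a code point and then stored using an encoding form such as UTF-16 or UTF-8. This Python sketch shows the Euro sign in both forms; the second character (U+1F600) lies outside the 16-bit range and is included only to show that UTF-16 then needs a surrogate pair of two 16-bit units.

```python
euro = "\u20ac"  # U+20AC EURO SIGN
print(hex(ord(euro)))                  # 0x20ac
print(euro.encode("utf-16-be").hex())  # 20ac   (one 16-bit unit)
print(euro.encode("utf-8").hex())      # e282ac (three bytes)

# A code point above U+FFFF does not fit in 16 bits.
smiley = "\U0001F600"
print(smiley.encode("utf-16-be").hex())  # d83dde00 (surrogate pair)
```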
Unicode Characters
- Windows NT uses 16-bit Unicode
  - Covers most living languages
  - Also dead languages (for academic use)
- Details
  - http://www.unicode.org
Keyboard Input
- Key ("scan") codes are converted to ASCII
- The ASCII code is sent to the host computer
- Received by the host as a "stream" of data
- Stored in a buffer
- Processed
- Etc.
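The pipeline above can be sketched in Python as a generator (the stream) feeding a queue (the buffer). The scan-code table below is hypothetical and covers only three keys; real keyboards define their own, much larger scan-code sets.

```python
from collections import deque

# HYPOTHETICAL scan-code table for illustration only.
SCAN_TO_ASCII = {0x23: ord("h"), 0x17: ord("i"), 0x39: ord(" ")}


def keyboard_stream(scan_codes):
    """Convert key scan codes to ASCII codes, yielding them one at a time."""
    for code in scan_codes:
        if code in SCAN_TO_ASCII:  # unknown scan codes are simply skipped
            yield SCAN_TO_ASCII[code]


# The host receives the stream and stores each code in a buffer.
buffer = deque()
for ascii_code in keyboard_stream([0x23, 0x17, 0x39, 0x23, 0x17]):
    buffer.append(ascii_code)

# Later the host processes the buffered codes, here by decoding them to text.
text = "".join(chr(code) for code in buffer)
print(repr(text))  # 'hi hi'
```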
Other Input Methods
- OCR (optical character recognition)
- Bar code readers
- Voice/audio input
- Punched cards
- Images / objects
- Pointing devices