HTTP Protocol
Let us start with this quote from the HTTP specification document [2]:
"The HTTP protocol is based on a request/response paradigm. A client establishes a connection with a server and sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content. The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta-information, and possible body content."
HTTP Protocol (2)
What this means to libwww-perl is that communication always takes place through these steps:
First, a request object is created and configured.
This object is then passed to a server, and we get back a response object that we can examine.
A request is always independent of any previous requests, i.e. the service is stateless. The same simple model is used for any kind of service we want to access.
Examples
1) If we want to fetch a document from a remote file server, then we send it a request that contains a name for that document and the response will contain the document itself.
2) If we access a search engine, then the content of the request will contain the query parameters and the response will contain the query result.
3) If we want to send a mail message to somebody then we send a request object which contains our message to the mail server and the response object will contain an acknowledgment that tells us that the message has been accepted and will be forwarded to the recipient(s).
The Request Object
The libwww-perl request object has the class name HTTP::Request. The fact that the class name uses HTTP:: as a prefix only implies that we use the HTTP model of communication. It does not limit the kind of services we can try to pass this request to. For instance, we will send HTTP::Requests both to ftp and gopher servers, as well as to the local file system.
The main attributes of the request objects are:
The method is a short string that tells what kind of request this is. The most common methods are GET, PUT, POST and HEAD.
The uri is a string denoting the protocol, server and the name of the "document" we want to access. The uri might also encode various other parameters.
The headers contain additional information about the request and can also be used to describe the content. The headers are a set of keyword/value pairs.
The content is an arbitrary amount of data.
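A minimal sketch of the four attributes, assuming the HTTP::Request class that ships with libwww-perl (HTTP-Message distribution). The URL is a hypothetical placeholder, and nothing is sent over the network.

```perl
use strict;
use warnings;
use HTTP::Request;

# method, uri, headers and content can all be passed to the constructor.
my $req = HTTP::Request->new(
    POST => 'http://www.example.com/cgi-bin/query',             # hypothetical URL
    [ 'Content-Type' => 'application/x-www-form-urlencoded' ],  # headers: keyword/value pairs
    'querytype=subject&queryconst=numerical+analysis',          # content
);

# Each attribute also has an accessor that reads or sets it.
$req->header(Accept => 'text/html');

print $req->method, "\n";               # POST
print $req->uri, "\n";                  # http://www.example.com/cgi-bin/query
print $req->header('Accept'), "\n";     # text/html
print length($req->content), "\n";      # 47
```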
The Response Object
The libwww-perl response object has the class name HTTP::Response. The main attributes of objects of this class are:
The code is a numerical value that indicates the overall outcome of the request.
The message is a short, human-readable string that corresponds to the code.
The headers contain additional information about the response and describe the content.
The content is an arbitrary amount of data.
Since we don't want to handle all possible code values directly in our programs, a libwww-perl response object has methods that can be used to query what kind of response this is. The most commonly used response classification methods are:
is_success() The request was successfully received, understood or accepted.
is_error() The request failed. The server or the resource might not be available, access to the resource might be denied, or other things might have failed for some reason.
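A small offline sketch, assuming the HTTP::Response class from libwww-perl. It builds two responses by hand instead of fetching them, so we can see how code, message and the classification methods relate.

```perl
use strict;
use warnings;
use HTTP::Response;

# Responses built by hand (no network): code, message, headers, content.
my $ok      = HTTP::Response->new(200, 'OK', [ 'Content-Type' => 'text/html' ], '<title>...</title>');
my $missing = HTTP::Response->new(404, 'Not Found');

for my $res ($ok, $missing) {
    # status_line joins the numeric code and the human-readable message.
    printf "%-15s is_success=%d is_error=%d\n",
        $res->status_line, ($res->is_success ? 1 : 0), ($res->is_error ? 1 : 0);
}
```

In real programs the response comes from the user agent, but the same methods apply.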
The User Agent (UA)
Let us assume that we have created a request object.
What do we actually do with it in order to receive a response?
The answer is that you pass it to a user agent object and this object takes care of all the things that need to be done (like low-level communication and error handling) and returns a response object. The user agent represents your application on the network and provides you with an interface that can accept requests and return responses.
The user agent is an interface layer between your application code and the network. Through this interface you are able to access the various servers on the network.
User Agent
The class name for the user agent is LWP::UserAgent.
Every libwww-perl application that wants to communicate should create at least one object of this class. The main method provided by this object is request(). This method takes an HTTP::Request object as argument and (eventually) returns an HTTP::Response object.
The user agent has many other attributes that let you configure how it will interact with the network and with your application.
The timeout specifies how much time we give remote servers to respond before the library disconnects and creates an internal timeout response.
The agent specifies the name that your application should use when it presents itself on the network.
The from attribute can be set to the e-mail address of the person responsible for running the application. If this is set, then the address will be sent to the servers with every request.
The parse_head attribute specifies whether we should initialize response headers from the <head> section of HTML documents.
The proxy and no_proxy attributes specify if and when to go through a proxy server (see URL:http://www.w3.org/pub/WWW/Proxies/).
The credentials provide a way to set up user names and passwords needed to access certain services.
Many applications want even more control over how they interact with the network, and they get this by sub-classing LWP::UserAgent. The library includes a sub-class, LWP::RobotUA, for robot applications.
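A configuration sketch for the attributes listed above, assuming a current LWP::UserAgent. The contact address, proxy host, realm and credentials are hypothetical placeholders, and no request is sent.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(keep_alive => 1);   # cache connections for reuse

$ua->agent('MyApp/0.1 ');   # trailing space: LWP appends its own libwww-perl/#.## name
$ua->timeout(30);           # seconds to wait before an internal timeout response
$ua->from('webmaster@example.com');                             # hypothetical address
$ua->parse_head(1);         # initialize headers from the HTML <head> section
$ua->proxy(['http', 'ftp'], 'http://proxy.example.com:8080/');  # hypothetical proxy
$ua->no_proxy('localhost'); # domains contacted directly, bypassing the proxy
$ua->credentials('www.example.com:80', 'Restricted Area', 'username', 'password');  # hypothetical

print $ua->agent, "\n";
```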
This example shows how the user agent, a request and a response are represented in actual Perl code:

    # Create a user agent object
    use LWP::UserAgent;
    use HTTP::Request;
    my $ua = LWP::UserAgent->new;
    $ua->agent("MyApp/0.1 ");

    # Create a request
    my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
    $req->content_type('application/x-www-form-urlencoded');
    $req->content('query=libwww-perl&mode=dist');

    # Pass request to the user agent and get a response back
    my $res = $ua->request($req);

    # Check the outcome of the response
    if ($res->is_success) { print $res->content; }
    else                  { print $res->status_line, "\n"; }

The $ua is created once when the application starts up. New request objects should normally be created for each request sent.
Chapter 1 - Introduction
Web Client
Web client: an application that communicates with a Web server using the HTTP protocol
Web Client (2)
The most common interface to the WWW is the browser.
A web browser lets you download web documents and view them formatted on the screen.
URL (Uniform Resource Locator)
A URL is a subset of the URI (Uniform Resource Identifier).
HTTP (Hypertext Transfer Protocol)
Common Gateway Interface (CGI)
Chapter 2 – Demystifying the Browser
HTTP Transaction
[Figure: a web transaction between a web client program and a web server]
The HTTP protocol is text-based, that is, we can see the commands being exchanged.
The Request Through the Browser
http://hypothetical.ora.com/
http:// is the protocol used
hypothetical.ora.com is the server
/ is the directory on the server
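The same breakdown can be done in code. This is a minimal sketch using a hypothetical helper, split_url, that only understands simple URLs and assumes port 80 (the http default) when none is given. The URI module distributed with libwww-perl handles the general case.

```perl
use strict;
use warnings;

# Hypothetical helper: split a simple URL into protocol, server, port and path.
sub split_url {
    my ($url) = @_;
    my ($scheme, $host, $port, $path) =
        $url =~ m{\A([A-Za-z][A-Za-z0-9+.-]*)://([^/:]+)(?::(\d+))?(/.*)?\z}
        or return;                                        # not a URL this sketch understands
    return ($scheme, $host, $port // 80, $path // '/');   # assumes the http default port
}

my ($scheme, $host, $port, $path) = split_url('http://hypothetical.ora.com/');
print "protocol=$scheme server=$host port=$port path=$path\n";
```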
The Client Request
GET / HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/3.0Gold (WinNT; I)
Host: hypothetical.ora.com
Accept: image/gif, image/x-xbitmap, */*
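Because HTTP is plain text, the request above is just a string. The sketch below assembles it with a hypothetical helper, build_request. Each line ends in CR LF, and an empty line marks the end of the headers.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble an HTTP/1.0 request as plain text.
sub build_request {
    my ($method, $path, $headers) = @_;   # $headers: array ref of [name, value] pairs
    my $text = "$method $path HTTP/1.0\r\n";
    $text .= "$_->[0]: $_->[1]\r\n" for @$headers;
    return $text . "\r\n";                # a blank line ends the header section
}

my $request = build_request('GET', '/', [
    [ 'Connection' => 'Keep-Alive' ],
    [ 'User-Agent' => 'Mozilla/3.0Gold (WinNT; I)' ],
    [ 'Host'       => 'hypothetical.ora.com' ],
    [ 'Accept'     => 'image/gif, image/x-xbitmap, */*' ],
]);
print $request;
```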
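The Server Response
HTTP/1.0 200 OK
Date: Fri, 04 Oct 1996 14:31:51 GMT
Server: Apache/1.1.1
Content-type: text/html
Content-length: 327
Last-modified: Fri, 04 Oct 1996 14:06:11 GMT

<title>...</title>
The status line and the header lines before the blank line form the response header. Everything after the blank line is the body (also called the entity-body).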
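Reading a response is the reverse operation. Here is a minimal sketch using a hypothetical helper, parse_response, which splits a raw response at the first blank line. It assumes one value per header name: a repeated header keeps only its last value.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw HTTP response into status line, headers and body.
sub parse_response {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;      # first blank line separates them
    my ($status, @lines) = split /\r?\n/, $head;
    my ($version, $code, $reason) = split / /, $status, 3;
    my %headers;
    for my $line (@lines) {
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{lc $name} = $value;                       # header names are case-insensitive
    }
    return { version => $version, code => $code, reason => $reason,
             headers => \%headers, body => $body };
}

my $res = parse_response(
    "HTTP/1.0 200 OK\r\n"
  . "Date: Fri, 04 Oct 1996 14:31:51 GMT\r\n"
  . "Content-type: text/html\r\n"
  . "\r\n"
  . "<title>...</title>"
);
print join(' ', $res->{code}, $res->{reason}, $res->{headers}{'content-type'}), "\n";
```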
HTML Transaction
[Figure: client and server]
HTML (Hypertext Markup Language)
Transactions
POST Method
POST /cgi-bin/query HTTP/1.0
Referer:
Connection:
User-Agent:
Host:
Accept:
Content-type: application/x-www-form-urlencoded
Content-length: 47

querytype=subject&queryconst=numerical+analysis
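The POST body above is form-encoded, and Content-length is simply its size in bytes. This sketch reproduces both with a hypothetical helper, form_urlencode. It assumes ASCII/byte strings and leaves letters, digits and - _ . * unescaped, as HTML forms do.

```perl
use strict;
use warnings;

# Hypothetical helper: encode name/value pairs as application/x-www-form-urlencoded.
sub form_urlencode {
    my @pairs = @_;
    my @encoded;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        for ($name, $value) {
            s/([^A-Za-z0-9\-_.* ])/sprintf('%%%02X', ord $1)/ge;  # other specials become %XX
            tr/ /+/;                                              # spaces become '+'
        }
        push @encoded, "$name=$value";
    }
    return join '&', @encoded;
}

my $body = form_urlencode(querytype => 'subject', queryconst => 'numerical analysis');
print "Content-length: ", length($body), "\n\n$body\n";
```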
Types of Request Methods
GET, POST
PUT Method
PUT /example.html HTTP/1.0
Connection:
User-Agent:
Pragma:
Host:
Accept:
Content-Length:

<!
</HTML>
Structure of an HTTP Transaction
Client Request
Method URI HTTP-version
General-header
Request-header
Entity-header
Entity-body
Server Response
HTTP-version Status-code Reason-phrase
General-header
Response-header
Entity-header
Entity-body
Structure of a Client Request
Structure of a Server Response
Chapter 3 – Learning HTTP
HTTP is a stateless protocol: the client makes a request to the server, the server sends back a response, and the transaction then ends.
Client Request Methods
The client request method is a "command" or request that the web client issues to the server.
Methods: GET, POST, HEAD, DELETE, TRACE, PUT
GET: Retrieving a Document
HEAD: Retrieving Header Information
POST: Sending Data to the Server
PUT: Storing the Entity-Body at a URL
DELETE: Removing a URL
TRACE: Viewing the Client's Message Through the Request Chain
HTTP Versions: HTTP 1.0 and HTTP 1.1
HTTP 1.1 adds:
better implementation of persistent connections
multihoming (lets a single host respond for several different domains)
entity tags
byte ranges (allow only parts of a document to be retrieved)
digest authentication
Server Response Codes
Range      Response meaning
100-199    Informational
200-299    Client request successful
300-399    Client request redirected; further action is necessary
400-499    Client request incomplete
500-599    Server errors
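The table above maps directly to a range check. Here is a minimal sketch with a hypothetical helper, status_class. In LWP, the HTTP::Status module already provides equivalent tests such as is_info, is_success, is_redirect, is_client_error and is_server_error.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a status code by its hundreds range.
sub status_class {
    my ($code) = @_;
    return 'informational' if $code >= 100 && $code <= 199;
    return 'success'       if $code >= 200 && $code <= 299;
    return 'redirection'   if $code >= 300 && $code <= 399;
    return 'client error'  if $code >= 400 && $code <= 499;
    return 'server error'  if $code >= 500 && $code <= 599;
    return 'unknown';
}

printf "%d => %s\n", $_, status_class($_) for 200, 301, 404, 503;
```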
HTTP Headers
Different types of headers
General headers, Request headers, Response headers, Entity headers
Persistent Connections
Connection: Keep-Alive
Media Types
Accept header, Content-Type header. Examples:
Accept: */*
Accept: type/*
Accept: type/subtype
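The three Accept forms can be matched against a Content-Type. This sketch uses hypothetical helpers, range_matches and accepts. It ignores q= quality parameters, which a full implementation would use to rank alternatives.

```perl
use strict;
use warnings;

# Hypothetical helper: does one media range (*/*, type/*, type/subtype) cover a type?
sub range_matches {
    my ($range, $type) = @_;
    my ($rtype, $rsub) = split m{/}, lc $range, 2;
    my ($ttype, $tsub) = split m{/}, lc $type,  2;
    return 1 if $rtype eq '*' && $rsub eq '*';          # */*: anything
    return 0 unless $rtype eq $ttype;                   # main types must agree
    return ($rsub eq '*' || $rsub eq $tsub) ? 1 : 0;    # type/* or exact subtype
}

# Hypothetical helper: check a whole Accept header (parameters are stripped and ignored).
sub accepts {
    my ($accept, $type) = @_;
    for my $range (split /\s*,\s*/, $accept) {
        (my $bare = $range) =~ s/\s*;.*//;
        return 1 if range_matches($bare, $type);
    }
    return 0;
}

print accepts('image/gif, image/x-xbitmap, */*', 'text/html') ? "accepted\n" : "rejected\n";
```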
Client Caching
Getting the Content Size
Content-length header
Byte Ranges
Referring Documents
Referer header
Client and Server Identification
Authorization
An Authorization header is used to request restricted documents:
Authorization: SCHEME CREDENTIALS
Example:
Authorization: BASIC username:password
where username:password is encoded in base64.
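This short sketch builds the Basic credentials with the core MIME::Base64 module. The helper name basic_auth_header and the user/password values are hypothetical. Decoding the result shows that base64 is only an encoding, not encryption.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: build the value of a Basic Authorization header.
sub basic_auth_header {
    my ($user, $pass) = @_;
    # The empty second argument stops encode_base64 from appending a newline.
    return 'Basic ' . encode_base64("$user:$pass", '');
}

my $header = basic_auth_header('username', 'password');
print "Authorization: $header\n";   # Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

With libwww-perl, the same header can be set on a request object via $req->authorization_basic('username', 'password').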
Authentication
The realm of the BASIC authentication scheme indicates the type of authentication requested.
See also Digest authentication (available in HTTP 1.1).
Cookies
Set-Cookie and Cookie headers
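The server sends Set-Cookie headers, and the client returns the name=value pairs in a Cookie header. This minimal sketch uses a hypothetical helper, cookie_header, with made-up cookie values. It deliberately ignores the domain, path and expiry matching that a real cookie jar (HTTP::Cookies, used through $ua->cookie_jar in LWP) performs.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the name=value part of each Set-Cookie header
# and join them into one Cookie header for the next request.
sub cookie_header {
    my @set_cookie = @_;
    my @pairs = map { (split /\s*;\s*/, $_, 2)[0] } @set_cookie;   # drop path=, expires=, ...
    return 'Cookie: ' . join('; ', @pairs);
}

my $header = cookie_header('SESSION=abc123; path=/', 'lang=en; path=/docs');
print "$header\n";   # Cookie: SESSION=abc123; lang=en
```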
Chapter 4 – The Socket Library
The socket library is a low-level programmer’s interface that allows clients to set up a TCP/IP connection and communicate directly with servers. Servers use sockets to listen for incoming connections, and clients use sockets to initiate transactions on the port that the server is listening to.
A Typical Conversation Using Sockets
Socket calls
Server routines: socket(), bind(), listen(), accept(), sysread()/syswrite(), close()
Client routines: socket(), connect(), syswrite()/sysread(), close()
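Using Socket Calls
Function     Used by          Purpose
socket()     client, server   create a socket
connect()    client           connect to a remote server
sysread()    client, server   read data from the connection
syswrite()   client, server   write data to the connection
close()      client, server   close the socket
bind()       server           bind the socket to a local address and port
listen()     server           wait for incoming connections
accept()     server           accept an incoming connection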
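The whole conversation can be tried on one machine. This sketch uses the core IO::Socket::INET module, whose constructor performs socket(), bind() and listen() (server) or socket() and connect() (client) for us. Both ends run in a single process over the loopback interface. It also assumes each small message arrives in one sysread() call; a real program would loop until the whole message has been read.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Server side: socket(), bind() and listen() on a free loopback port.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,            # 0 lets the operating system pick a free port
    Listen    => 5,
    Proto     => 'tcp',
) or die "server socket: $!";

# Client side: socket() and connect() to the port the server listens on.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
    Proto    => 'tcp',
) or die "client socket: $!";

my $conn = $server->accept or die "accept: $!";   # server: accept()

syswrite($client, "GET / HTTP/1.0\r\n\r\n");      # client: syswrite()
sysread($conn, my $request, 1024);                # server: sysread()

syswrite($conn, "HTTP/1.0 200 OK\r\n\r\n");       # server: syswrite()
sysread($client, my $reply, 1024);                # client: sysread()

close($_) for $client, $conn, $server;            # both sides: close()
print "server got: $request", "client got: $reply";
```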
Chapter 5 – The LWP Library
The Web runs over the TCP/IP protocol: the client and the server establish a connection and exchange the necessary information over that connection.
Appendix A – HTTP Headers
There are four categories of headers: General, Request, Response and Entity.
Summary of Support Across HTTP Versions
HTTP 0.9, HTTP 1.0, HTTP 1.1
Appendix B – Reference Tables
Media Types, Character Encoding, Languages, Character Sets
Media Types
Content-type header, Accept header, Internet Media Types
Text Type/Subtype
text/plain, text/richtext, text/enriched, text/tab-separated-values, text/html, text/sgml
Multipart Type/Subtype
Message Type/Subtype
Application Type/Subtype
Character Encoding
Content-type of application/x-www-form-urlencoded
Special characters are encoded to eliminate ambiguity.
See RFC 1738 (http://www.faqs.org/rfcs/rfc1738.html)
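On the receiving side, the encoding is undone by turning '+' back into spaces and each %XX into the byte with that hex value. This sketch uses a hypothetical helper, form_urldecode. The order of the two steps matters: an encoded plus sign (%2B) must not be turned into a space.

```perl
use strict;
use warnings;

# Hypothetical helper: undo application/x-www-form-urlencoded escaping.
sub form_urldecode {
    my ($s) = @_;
    $s =~ tr/+/ /;                               # '+' stands for a space
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX is the byte with hex value XX
    return $s;
}

# Split a form body into fields, then decode each name and value.
my $body   = 'querytype=subject&queryconst=numerical+analysis&q=a%26b';
my %fields = map { form_urldecode($_) }
             map { my ($k, $v) = split /=/, $_, 2; ($k, $v // '') }
             split /&/, $body;
print "$_ = $fields{$_}\n" for sort keys %fields;
```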
Languages
A language tag has the form <primary-tag>-<subtag>, where zero or more subtags are allowed (for example, en or en-US).
See RFC 1766 for more information.
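RFC 1766 defines the primary tag and each subtag as 1 to 8 letters, joined by hyphens. This sketch checks that grammar with a single regular expression, using a hypothetical helper named is_language_tag. Later revisions of the language-tag rules also allow digits in subtags, so this check is stricter than current practice.

```perl
use strict;
use warnings;

# Hypothetical helper: RFC 1766 grammar, Primary-tag = 1*8ALPHA, Subtag = 1*8ALPHA.
sub is_language_tag {
    my ($tag) = @_;
    return $tag =~ /\A[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*\z/ ? 1 : 0;
}

printf "%-15s %s\n", $_, is_language_tag($_) ? 'valid' : 'invalid'
    for 'en', 'en-US', 'pt-BR', 'en_US', 'toolongprimary';
```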
Character Sets
Accept-Language, Content-Language
See RFC 1700 (http://www.faqs.org/rfcs/rfc1700.html)
Bibliography
[1] WONG, C. Web Client Programming with Perl. 1st ed. O'Reilly, March 1997.
[2] HTTP specification. URL: http://www.w3.org/pub/WWW/Protocols/
Glossary
IANA – Internet Assigned Numbers Authority
CGI – Common Gateway Interface
Backup Slides
HTTP Is Stateless
HTTP is a stateless protocol: there is no permanent connection between the server and the client (browser), so the server does not know whether a subsequent connection is related to the previous one.
HTTP Protocol
[Figure: HTTP request, HTTP response, body of an HTTP request]
Cookies
Cookies are pieces of information stored on the user's computer. The server sets them in a response (Set-Cookie header), and the browser can optionally send them back with each subsequent request (Cookie header) for the server to process.
Web Container