CIT 383: Administrative Scripting
HTTP and HTML
Topics
1. HTTP
2. URLs
3. Cookies
4. Base64
Web Client/Server Interaction
[Diagram: timeline of a repeating exchange between Browser and Server]
1. User interaction in the browser.
2. Browser sends HTTP Request (form submission).
3. Server processing while the user waits.
4. Server sends HTTP Response (new web page).
5. The cycle repeats with the next user interaction.
HTTP: HyperText Transfer Protocol
Simple request/response protocol
– Request methods: GET, POST, HEAD, etc.
– Protocol versions: 1.0, 1.1
Stateless
– Each request is independent of previous requests, i.e. request #2 doesn’t know you authenticated in request #1.
– Applications are responsible for handling state.
HTTP Request
Request line (Method, URL, Protocol Version):
  GET http://www.google.com/ HTTP/1.1
Headers:
  Host: www.google.com
  User-Agent: Mozilla/5.0 (Windows NT 5.1) Gecko/20060909 Firefox/1.5.0.7
  Accept: text/html, image/png, */*
  Accept-Language: en-us,en;q=0.5
  Cookie: rememberme=true; PREF=ID=21039ab4bbc49153:FF=4
Blank Line
No Data for GET method
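The layout of a request (request line, headers, blank line, optional data) can be sketched by assembling one as a string. This is only an illustration: build_request is a hypothetical helper, not a library call, and the header values are copied from the example request.

```ruby
# build_request is a hypothetical helper, not part of any library.
def build_request(method, url, headers, body = '')
  lines = ["#{method} #{url} HTTP/1.1"]                        # request line
  headers.each { |name, value| lines << "#{name}: #{value}" }  # one header per line
  lines << ''                                                  # blank line ends the headers
  lines.join("\r\n") + "\r\n" + body                           # body stays empty for GET
end

request = build_request('GET', 'http://www.google.com/',
                        { 'Host' => 'www.google.com',
                          'Accept' => 'text/html, image/png, */*' })
puts request
```

Each line ends with a carriage return and line feed, so the headers always finish with an empty line even when no data follows.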
HTTP Response
Status line (Protocol Version, HTTP Response Code):
  HTTP/1.1 200 OK
Headers:
  Cache-Control: private
  Content-Type: text/html
  Server: GWS/2.1
  Date: Fri, 13 Oct 2006 03:16:30 GMT
Blank Line
Web Page Data:
  <HTML> ... (page data) ... </HTML>
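Going the other way, a response can be split into the same parts by hand. The following is a minimal sketch that assumes a well-formed response held entirely in memory; parse_response is a hypothetical helper, and real programs would normally let Net::HTTP do this work.

```ruby
RAW_RESPONSE = "HTTP/1.1 200 OK\r\n" \
               "Cache-Control: private\r\n" \
               "Content-Type: text/html\r\n" \
               "Server: GWS/2.1\r\n" \
               "\r\n" \
               '<HTML> ... (page data) ... </HTML>'

# parse_response is a hypothetical helper for illustration only.
def parse_response(raw)
  head, body = raw.split("\r\n\r\n", 2)               # blank line separates headers from data
  status_line, *header_lines = head.split("\r\n")
  version, code, reason = status_line.split(' ', 3)   # e.g. "HTTP/1.1", "200", "OK"
  headers = header_lines.to_h do |line|
    name, value = line.split(':', 2)
    [name.downcase, value.strip]                      # header names are case-insensitive
  end
  { version: version, code: code.to_i, reason: reason, headers: headers, body: body }
end

response = parse_response(RAW_RESPONSE)
puts response[:code]
puts response[:headers]['content-type']
```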
HTTP Methods
HEAD
  Same as GET, but only asks for headers, not body.
GET
  Requests a representation of the resource. Most common method. Should not cause server to modify (write, delete) any resources.
POST
  Submits data to be processed to the resource. The data is included in the body of the request. This may result in the creation of a new resource, the update of existing resources, or both.
PUT
  Uploads a representation of the specified resource.
DELETE
  Deletes the specified resource.
TRACE
  Echoes back the received request, so that a client can see what intermediate servers are adding or changing in the request.
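Ruby's Net::HTTP library has one request class per method, and each class records whether the request and the response carry a body. The sketch below only builds request objects, so it never contacts a server; the path /index.html is a placeholder.

```ruby
require 'net/http'

request_classes = [Net::HTTP::Head, Net::HTTP::Get, Net::HTTP::Post,
                   Net::HTTP::Put, Net::HTTP::Delete, Net::HTTP::Trace]

# Map each method name to [request may carry a body, response may carry a body].
properties = request_classes.to_h do |klass|
  req = klass.new('/index.html')
  [req.method, [req.request_body_permitted?, req.response_body_permitted?]]
end

properties.each do |name, (sends_body, gets_body)|
  puts format('%-7s request body: %-5s response body: %s', name, sends_body, gets_body)
end
```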
HTTP Request Headers
Header             Description                                            Example
Accept             Acceptable content types.                              Accept: text/plain
Authorization      HTTP authentication credentials.                       Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Cache-Control      Caching directives.                                    Cache-Control: no-cache
Cookie             Cookie data for server.                                Cookie: color=red
Date               Date and time sent.                                    Date: Wed, 29 Oct 2008 01:02:03 GMT
Host               Name of server.                                        Host: cs.nku.edu
If-Modified-Since  Allows a 304 Not Modified to be returned for caching.  If-Modified-Since: Wed, 29 Oct 2008 01:02:03 GMT
User-Agent         Browser description string.                            User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.2) Ubuntu/8.04 Firefox/3.1
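Request headers can be set on a Net::HTTP request object before it is sent. The sketch below builds a request without sending it and shows that the Basic Authorization value is only Base64-encoded, not encrypted. The path is a placeholder, and the user name and password are the ones encoded in the Authorization example above.

```ruby
require 'net/http'

req = Net::HTTP::Get.new('/index.html')
req['Host']          = 'cs.nku.edu'
req['Accept']        = 'text/plain'
req['Cache-Control'] = 'no-cache'
req['Cookie']        = 'color=red'
req.basic_auth('Aladdin', 'open sesame')   # sets the Authorization header

req.each_header { |name, value| puts "#{name}: #{value}" }

# Strip the "Basic " prefix and decode; unpack1('m') is Base64 decoding.
credentials = req['Authorization'].sub('Basic ', '').unpack1('m')
puts credentials
```

Because anyone who sees the header can decode it, Basic authentication is usually only used over an encrypted connection.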
HTTP Response Headers
Header            Description                     Example
Cache-Control     Caching directives.             Cache-Control: no-cache
Content-Encoding  Type of encoding used.          Content-Encoding: gzip
Content-Length    Length of data returned.        Content-Length: 1024
Content-Type      Type of data returned.          Content-Type: text/html
Date              Date and time response sent.    Date: Wed, 29 Oct 2008 01:02:03 GMT
Expires           Date after which data expires.  Expires: Sat, 01 Nov 2008 01:02:03 GMT
Location          Used in redirection.            Location: http://www.example.com/about/
Server            Server identification string.   Server: Apache/2.0.55
Set-Cookie        Cookie created by server.       Set-Cookie: color=red
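Date and Expires values use the HTTP date format, which Ruby's time library can parse and produce. A minimal sketch, assuming the dates are written in the full format (weekday, two-digit day, GMT); fresh? is a hypothetical helper.

```ruby
require 'time'   # adds Time.httpdate and Time#httpdate

sent    = Time.httpdate('Wed, 29 Oct 2008 01:02:03 GMT')
expires = Time.httpdate('Sat, 01 Nov 2008 01:02:03 GMT')

# fresh? is a hypothetical helper: cached data is usable until it expires.
def fresh?(expires, now)
  now < expires
end

puts expires - sent                              # lifetime in seconds
puts fresh?(expires, Time.utc(2008, 10, 30))     # still before Expires
puts Time.utc(2008, 10, 29, 1, 2, 3).httpdate    # format a Time for a header
```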
HTTP Response Codes
Code  Description            Meaning
200   OK                     Standard success response.
201   Created                New resource created.
301   Moved permanently      Permanent redirect to new URI.
304   Not modified           Safe to use page stored in cache.
307   Temporary redirect     Use new URI now; try old later.
401   Unauthorized           Authentication failed.
403   Forbidden              Disallowed, auth will not help.
404   Not found              Resource was not found.
405   Method not allowed     Wrong method, e.g. used GET when POST is required.
500   Internal server error  Internal server error.
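The first digit of a response code gives its general class, which is often all a script needs to check. A small sketch; status_class is a hypothetical helper and the symbols it returns are names chosen for this example.

```ruby
# status_class is a hypothetical helper: classify a code by its first digit.
def status_class(code)
  case code.to_i / 100
  when 1 then :informational
  when 2 then :success
  when 3 then :redirection
  when 4 then :client_error
  when 5 then :server_error
  else :unknown
  end
end

[200, 201, 301, 304, 307, 401, 403, 404, 405, 500].each do |code|
  puts "#{code} #{status_class(code)}"
end
```

The helper accepts strings as well as integers because Net::HTTP reports the code as a string such as "200".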
Net::HTTP Class
Net::HTTP.get(host, path): returns the resource at host, path as a string.
Net::HTTP.get_response(host, path): returns an HTTP response object, which includes body and headers.
Net::HTTP.post_form(uri, {parameters}): sends the form parameters (a hash) to the URI object uri using POST instead of GET; returns an HTTP response object.
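Form parameters travel as URL-encoded text: in the query string for GET, in the request body for POST. The sketch below shows that encoding without contacting a server; the /search path, the www.example.com host, and the parameter names are placeholders.

```ruby
require 'net/http'
require 'uri'

params  = { 'q' => 'ruby scripting', 'max' => '50' }
encoded = URI.encode_www_form(params)      # spaces become "+", pairs are joined with "&"
puts encoded

# POST: the encoded parameters become the request body.
post = Net::HTTP::Post.new('/search')
post.set_form_data(params)
puts post['Content-Type']
puts post.body

# GET: the same encoded text goes after "?" in the URI.
get_uri = URI('http://www.example.com/search')
get_uri.query = encoded
puts get_uri
```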
Redirection Example
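require 'net/http'
require 'uri'

# Fetch a URI, following redirects until a non-redirect response arrives.
def fetch(uri)
  # URI() accepts a String or a URI object; get_response needs a URI.
  response = Net::HTTP.get_response(URI(uri))
  case response
  when Net::HTTPSuccess     then response
  when Net::HTTPRedirection then fetch(response['location'])
  else response.error!      # raises an exception for any other status
  end
end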
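A fetch that follows every redirect can loop forever if two pages redirect to each other. The variant below adds a hop limit and is exercised against a tiny loopback server so that it runs without Internet access. Both fetch_with_limit and the demonstration server are assumptions made for this sketch, not part of the original example.

```ruby
require 'net/http'
require 'socket'
require 'uri'

# Follow redirects, but give up after `limit` hops to avoid redirect loops.
def fetch_with_limit(uri, limit = 5)
  raise ArgumentError, 'too many HTTP redirects' if limit.zero?

  uri = URI(uri)
  response = Net::HTTP.get_response(uri)
  case response
  when Net::HTTPSuccess
    response
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the current URI.
    fetch_with_limit(URI.join(uri.to_s, response['location']), limit - 1)
  else
    response.error!
  end
end

# Demonstration-only server on the loopback interface:
# /old redirects to /new, every other path returns a short text page.
server = TCPServer.new('127.0.0.1', 0)   # port 0 asks the OS for a free port
port = server.addr[1]
Thread.new do
  loop do
    client = server.accept
    request_line = client.gets.to_s
    loop do                              # skip the request headers
      line = client.gets
      break if line.nil? || line == "\r\n"
    end
    if request_line.start_with?('GET /old ')
      client.write("HTTP/1.1 302 Found\r\nLocation: /new\r\n" \
                   "Content-Length: 0\r\nConnection: close\r\n\r\n")
    else
      client.write("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n" \
                   "Content-Length: 8\r\nConnection: close\r\n\r\nnew page")
    end
    client.close
  end
end

response = fetch_with_limit("http://127.0.0.1:#{port}/old")
puts response.code
puts response.body
```

With a limit of 1 the same call raises ArgumentError, because the single allowed hop is used up by the redirect.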
URI Format
<proto>://<user>@<host>:<port>/<path>?<qstr>
– Whitespace marks end of URL
– “@” separates userinfo from host
– “?” marks beginning of query string
– “&” separates query parameters
– %HH represents the character with hex value HH
– ex: %20 represents a space
http://username:[email protected]:8001/a%20spaced%20path
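Ruby's URI library splits a URI into these parts. A short sketch; the password, host name, and query parameters in the sample URI are made up for illustration.

```ruby
require 'uri'

uri = URI.parse('http://username:secret@www.example.com:8001/a%20spaced%20path?color=red&size=10')

puts uri.scheme     # protocol
puts uri.userinfo   # everything before "@"
puts uri.host
puts uri.port
puts uri.path       # still percent-encoded
puts uri.query      # everything after "?"

# Decode %HH escapes and split the query string on "&" and "=".
decoded_path = URI.decode_www_form_component(uri.path)
query_pairs  = URI.decode_www_form(uri.query)
puts decoded_path
p query_pairs
```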
URI Class
URI.extract(string): returns array of URI strings extracted from string.
URI.extract("text http://example.com/ and mailto:[email protected] and text here also.")
=> ["http://example.com/", "mailto:[email protected]"]
URI.join(string,string,...): joins two or more strings into a URI.
URI.parse(string): creates a URI object from string.
URI.split(uri): splits URI string into protocol, host, path, query, etc. components.
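URI.join and URI.split can be tried without any network access. In this sketch the example.com URIs are placeholders, and the array positions used for URI.split follow its documented order (scheme, userinfo, host, port, registry, path, opaque, query, fragment).

```ruby
require 'uri'

base = 'http://example.com/docs/index.html'

# URI.join resolves a relative reference against a base, as a browser does for links.
sibling = URI.join(base, 'images/logo.png')
parent  = URI.join(base, '../about.html')
puts sibling
puts parent

# URI.split returns the components as an array.
parts = URI.split('http://example.com:8080/a/b?x=1#top')
scheme, host, path, query, fragment = parts.values_at(0, 2, 5, 7, 8)
puts scheme, host, path, query, fragment
```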
Cookies
Server to Client
Content-type: text/html
Set-Cookie: foo=bar; path=/; expires=Fri, 20-Feb-2004 23:59:00 GMT
Client to Server
Content-type: text/html
Cookie: foo=bar
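The round trip can be sketched in a few lines: the client parses the Set-Cookie value, remembers the name and value, and sends them back in a Cookie header. Both helpers are hypothetical and skip details a real client needs, such as checking the path and the expiry date.

```ruby
# parse_set_cookie is a hypothetical helper: split a Set-Cookie value
# into the name=value pair and its attributes (path, expires, ...).
def parse_set_cookie(value)
  pair, *attrs = value.split(/;\s*/)
  name, val = pair.split('=', 2)
  attributes = attrs.to_h do |attr|
    key, attr_value = attr.split('=', 2)
    [key.downcase, attr_value]
  end
  { name: name, value: val, attributes: attributes }
end

# cookie_header is a hypothetical helper: build the Cookie header value
# the client sends back, using only names and values.
def cookie_header(jar)
  jar.map { |name, value| "#{name}=#{value}" }.join('; ')
end

cookie = parse_set_cookie('foo=bar; path=/; expires=Fri, 20-Feb-2004 23:59:00 GMT')
jar = { cookie[:name] => cookie[:value], 'color' => 'red' }
puts cookie[:attributes]['expires']
puts "Cookie: #{cookie_header(jar)}"
```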
Base64 Encoding
How do you send binary data using text?
– Email attachments (MIME).
– Cookies (HTTP).
Base64: encode 3 bytes as 4 text characters
– Each character (A-Z, a-z, 0-9, +, /) stores 6 bits of data.
– A byte has 8 bits, so 3 bytes = 24 bits.
– 4 base64 chars (6 bits each) = 24 bits.
– Use = to pad output if input is not a multiple of 3 bytes.
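The bit arithmetic can be worked through by hand. The sketch below is a teaching version of the encoder, not the library implementation; base64_encode and ALPHABET are names chosen for this example, and the result is compared with Ruby's built-in pack('m0') encoding.

```ruby
# The 64 characters, in order of the 6-bit value each one represents.
ALPHABET = [*'A'..'Z', *'a'..'z', *'0'..'9', '+', '/'].freeze

# base64_encode is a hypothetical, by-hand encoder for illustration.
def base64_encode(data)
  data.bytes.each_slice(3).map do |chunk|
    missing = 3 - chunk.size                       # 0, 1, or 2 bytes short
    padded  = chunk + [0] * missing                # fill the gap with zero bytes
    bits    = (padded[0] << 16) | (padded[1] << 8) | padded[2]   # 24 bits
    chars   = [18, 12, 6, 0].map { |shift| ALPHABET[(bits >> shift) & 0x3F] }
    # Characters that would hold only padding are replaced by "=".
    chars.first(4 - missing).join + '=' * missing
  end.join
end

encoded = base64_encode('informatics')
puts encoded
puts encoded == ['informatics'].pack('m0')   # compare with the built-in encoder
```

The 11-byte input leaves a final group of 2 bytes, which is why the output ends in a single = character.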
Base64 Class
require 'base64'
encode = Base64.encode64('informatics')       # => "aW5mb3JtYXRpY3M=\n" (a newline is appended)
decode = Base64.decode64('aW5mb3JtYXRpY3M=')  # => "informatics"
Topics
1. Evolution of HTML
2. HTML Structure
3. Regular Expressions vs. Parsing
4. Hpricot
5. XPath
Evolution of HTML
1991 HTML created (only 22 tags)
1995 HTML 2.0
1996 Tables added to HTML 2.0
Jan 1997 HTML 3.2 published by W3C
Dec 1997 HTML 4.0
2000 XHTML 1.0
2008 HTML 5.0 working draft published.
HTML Structure
<html>
<title>My title</title>
<body>
<a href="...">My link</a>
<h1>My header</h1>
</body>
</html>
Why Not Regular Expressions?
Angle-bracket tags are difficult to deal with.
Tag regexp: <\w+\s+[^>]*>
Matches: <img alt="ruby" src="rb.png">
Doesn’t: <img alt="ruby>" src="rb.png"> (the match stops at the > inside the attribute value)
Solution: check for > in attributes.
Have to match every form of attribute:
– name="value"
– name='value'
– name=value
– name
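The failure is easy to reproduce in Ruby. The second pattern below is one possible repair that treats quoted attribute values as units; it is a sketch only and still ignores comments, scripts, and malformed markup, which is the argument for using a parser instead.

```ruby
NAIVE_TAG  = /<\w+\s+[^>]*>/
# Quoted strings are consumed whole, so a ">" inside quotes no longer ends the tag.
QUOTED_TAG = /<\w+(?:"[^"]*"|'[^']*'|[^'">])*>/

simple = '<img alt="ruby" src="rb.png">'
tricky = '<img alt="ruby>" src="rb.png">'

puts simple[NAIVE_TAG]    # the whole tag
puts tricky[NAIVE_TAG]    # cut off at the ">" inside the alt value
puts tricky[QUOTED_TAG]   # the whole tag again
```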
Hpricot
h = Hpricot(html-string)
  Creates a new Hpricot::Doc object.
el = h.at(string)
  Finds the first matching element.
els = h.search(string or XPath expression)
  Returns an array of matching elements (an Hpricot::Elements object).
el.inner_html
  Returns the HTML enclosed in the element.
XPath Searches
h.search("p")
  Find all paragraph tags in document.
h.search("/html/body//p")
  Find all paragraph tags within the body tag.
h.search("//a[@href]")
  Find all anchor tags with an href attribute.
h.search("//a[@href='google.com']")
  Find all a tags with an href attribute of google.com.
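The same kinds of XPath expressions can be tried without Hpricot, which may not be installed on a current system. The sketch below substitutes REXML, the XML parser distributed with Ruby; that substitution is an assumption of this example, and REXML requires well-formed markup, so the sample page is written as XHTML. The queries test for href, the attribute that anchor tags normally carry.

```ruby
require 'rexml/document'

xhtml = <<~XHTML
  <html>
    <head><title>My title</title></head>
    <body>
      <h1>My header</h1>
      <p>First paragraph</p>
      <div><p>Nested paragraph</p></div>
      <a href="google.com">My link</a>
      <a name="top">No href here</a>
    </body>
  </html>
XHTML

doc = REXML::Document.new(xhtml)

all_paragraphs  = REXML::XPath.match(doc, '//p')              # every p element
body_paragraphs = REXML::XPath.match(doc, '/html/body//p')    # p elements anywhere under body
links           = REXML::XPath.match(doc, '//a[@href]')       # a elements that have href
google_links    = REXML::XPath.match(doc, "//a[@href='google.com']")

puts all_paragraphs.size
puts links.first.text
puts google_links.first.attributes['href']
```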
Final Exam
Comprehensive exam like midterm
– 20% concepts (focus on classes + exceptions)
– 80% programs (at least 2 programs like labs)
Study
– Review the midterm practice problems.
– Work out your lab programs again.
– Solve un-assigned lab programs.
– Review concepts, esp. classes + exceptions.
Going Further
Ruby Quiz
– Assignment-scale problems + solutions.
– http://rubyquiz.com/
Practical Ruby for System Administration
– If Admin Scripting II existed, this would be the text.
General Ruby Books
– The Ruby Way, 2nd edition
– The Ruby Programming Language