Retrieving Web Pages (HTTP), Topic 3, Chapter 6 Network Programming Kansas State University at Salina

Page 1: Retrieving Web Pages (HTTP), Topic 3, Chapter 6

Retrieving Web Pages (HTTP), Topic 3, Chapter 6

Network Programming

Kansas State University at Salina

Page 2: Retrieving Web Pages (HTTP), Topic 3, Chapter 6

First, some comments

Switch to application protocols
Client side focus
Pre-built modules
  A natural OO thing: a matter of productivity
  Argh! Someone else's code
  Lots of choices, language-independent principles
Web-related network programming
  Chapter 6: retrieving web pages (easy)
  Chapter 7: parsing HTML (hard)
  Chapter 8: XML and XML-RPC (interesting)

Page 3: Retrieving Web Pages (HTTP), Topic 3, Chapter 6

HTTP Basics

Stateless, connectionless protocol (each HTTP/1.0 request opens its own TCP connection, which the server closes after responding)
Basic GET …

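import socket

# Open a TCP connection to the web server on port 80.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('www.sal.ksu.edu', 80))

# Request line plus headers. HTTP header lines end in CRLF,
# and an empty line marks the end of the request.
request = ("GET /faculty/tim/index.html HTTP/1.0\r\n"
           "From: [email protected]\r\n"
           "User-Agent: Python\r\n"
           "\r\n")
s.sendall(request)

# Save everything the server sends (status line, headers, and body).
fp = open("index.html", "wb")
while 1:
    data = s.recv(1024)
    if not len(data):
        break    # empty read: the server closed the connection
    fp.write(data)
s.close()
fp.close()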
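To make the request and response layout concrete, here is a minimal sketch in Python 3 (the slides use Python 2). It builds the same kind of HTTP/1.0 GET request as bytes and splits a raw response into status line, headers, and body. The helper names build_get_request and split_response are illustrative and not from the slides, the Host header is an addition that the slide's request does not send, and the sample response is made up so the example runs without a network connection.

```python
def build_get_request(host, path, user_agent="Python"):
    """Return an HTTP/1.0 GET request as bytes (illustrative helper)."""
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host,            # not in the slide's request; added here
        "User-Agent: %s" % user_agent,
        "",                           # the two empty items produce the blank
        "",                           # line that ends the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split raw response bytes into (status_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Made-up response bytes, standing in for what s.recv() would return.
sample = (b"HTTP/1.0 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 13\r\n"
          b"\r\n"
          b"<html></html>")

request = build_get_request("www.sal.ksu.edu", "/faculty/tim/index.html")
status, headers, body = split_response(sample)
print(status)
print(headers["content-type"])
print(len(body))
```

The socket loop on this slide writes the status line and headers into index.html along with the page itself; splitting at the first blank line, as split_response does, is one way to keep only the body.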

Page 4: Retrieving Web Pages (HTTP), Topic 3, Chapter 6

Now, for the easy way …

import sys, urllib2

page = "http://www.sal.ksu.edu/faculty/tim/"
req = urllib2.Request(page)
fd = urllib2.urlopen(req)
# Read the response body in 1 KB chunks until there is no more data.
while 1:
    data = fd.read(1024)
    if not len(data):
        break
    sys.stdout.write(data)
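In Python 3 the functionality of urllib2 lives in urllib.request. The following is a minimal sketch of the same chunked read, wrapped in a hypothetical fetch helper that returns the body as bytes. The helper name, the User-Agent value, and the timeout are assumptions made for illustration, not part of the slides.

```python
import urllib.request


def fetch(url, chunk_size=1024, timeout=10):
    """Read a URL in fixed-size chunks and return the body as bytes.

    fetch is a hypothetical helper, not something the slides define.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Python"})
    chunks = []
    # The with-statement closes the response when the loop finishes.
    with urllib.request.urlopen(req, timeout=timeout) as fd:
        while True:
            data = fd.read(chunk_size)
            if not data:    # empty bytes object: end of the response
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("http://www.sal.ksu.edu/faculty/tim/") would retrieve the same page as the slide's loop. Unlike the raw socket version, the library handles the status line and headers, so only the page body is returned.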

Page 5: Retrieving Web Pages (HTTP), Topic 3, Chapter 6

Submitting with GET

>>> import urllib
>>> encoding = urllib.urlencode( [('activity', 'water ski'), \
...                               ('lake', 'Milford'), ('code', 52)] )
>>> print encoding
activity=water+ski&lake=Milford&code=52
>>> url = "http://www.example.com" + '?' + encoding
>>> print url
http://www.example.com?activity=water+ski&lake=Milford&code=52
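In Python 3, urlencode moved to urllib.parse. As a minimal sketch, the same query string can be built and then decoded again to see what the server side would receive. The helper name build_get_url is illustrative and not from the slides.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_get_url(base, params):
    """Append URL-encoded params to base as a query string (illustrative)."""
    return base + "?" + urlencode(params)


params = [("activity", "water ski"), ("lake", "Milford"), ("code", 52)]
url = build_get_url("http://www.example.com/", params)
print(url)

# Decoding the query string recovers the fields; every value comes
# back as a string inside a list, including the integer 52.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

The space in 'water ski' travels as a plus sign and is turned back into a space on decoding, which is why form values should go through urlencode rather than being concatenated by hand.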

Page 6: Retrieving Web Pages (HTTP), Topic 3, Chapter 6

Submitting with POST

>>> encoding = urllib.urlencode( [('activity', 'water ski'), \
...                               ('lake', 'Milford'), ('code', 52)] )
>>> print encoding
activity=water+ski&lake=Milford&code=52
>>> import urllib2
>>> # Passing data to Request makes urllib2 send a POST instead of a GET.
>>> req = urllib2.Request("http://www.example.com", encoding)
>>> fd = urllib2.urlopen(req)
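The difference between GET and POST in this API comes down to whether the request carries a data payload. A minimal Python 3 sketch, using urllib.request in place of urllib2, shows this without sending anything over the network; the variable names are chosen for illustration.

```python
import urllib.parse
import urllib.request

fields = [("activity", "water ski"), ("lake", "Milford"), ("code", 52)]
# Python 3 requires the POST body to be bytes rather than str.
body = urllib.parse.urlencode(fields).encode("ascii")

get_req = urllib.request.Request("http://www.example.com/")
post_req = urllib.request.Request("http://www.example.com/", data=body)

print(get_req.get_method())     # GET
print(post_req.get_method())    # POST
print(post_req.data)
```

Passing post_req to urllib.request.urlopen would send the encoded fields in the request body instead of in the URL. If no Content-Type header has been set, urlopen adds application/x-www-form-urlencoded for such a request.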