URLs and Resources

Herng-Yow Chen

Page 1: URLs and Resources

1

URLs and Resources

Herng-Yow Chen

Page 2: URLs and Resources

2

Outline

• Navigating the Internet’s resources
• URL syntax, and what the various URL components mean and do
• URL shortcuts that many web clients support: relative URLs and expanded URLs
• URL encoding and character rules
• Common URL schemes
• The future of URLs, including URNs

Page 3: URLs and Resources

3

Navigating a resource by URL

A URL tells a web client three things. Example web page:

http://english.csie.ncnu.edu.tw/demo/index.html

    http                        Scheme (how)
    english.csie.ncnu.edu.tw    Host (where)
    /demo/index.html            Path (what)

1. URL scheme: how to access the resource
2. Server location: where the resource is hosted
3. Resource path: what particular local resource on the server is being requested

Page 4: URLs and Resources

4

URLs

URLs can direct you to resources available through protocols other than HTTP.

• An email account: mailto:[email protected]
• A file that resides on an FTP server: ftp://ftp.ncnu.edu.tw/a_file.txt
• A video streamed by a video server: rtsp://www.cnn.com/headline.rm

Most URLs have the same “scheme://server location/path” structure.
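As a rough illustration of that shared structure, the sketch below uses `urlsplit` from Python's standard `urllib.parse` module to pull the scheme, server location, and path out of URLs with different schemes. The mailto address is a made-up placeholder (an assumption, not taken from the slides), and the helper name `how_where_what` is hypothetical.

```python
from urllib.parse import urlsplit

def how_where_what(url):
    """Return (scheme, server location, path) for a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path

examples = [
    "http://english.csie.ncnu.edu.tw/demo/index.html",
    "ftp://ftp.ncnu.edu.tw/a_file.txt",
    "rtsp://www.cnn.com/headline.rm",
    "mailto:someone@example.com",  # hypothetical address, not from the slides
]
for url in examples:
    print(how_where_what(url))
```

Note that a mailto URL has no "//server location" part, so its server location comes back empty and the address lands in the path.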

Page 6: URLs and Resources

6

URL Syntax

<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
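One way to see all nine components at once is to parse a URL that uses every one of them. The sketch below relies on `urlparse` from Python's standard `urllib.parse` module; the URL itself (user `joe`, password `secret`, host `www.example.com`, and so on) is invented for illustration, and `components` is a hypothetical helper name.

```python
from urllib.parse import urlparse

# Hypothetical URL that exercises every component of the general syntax.
URL = "http://joe:secret@www.example.com:8080/tools/index.html;type=d?item=12731#prices"

def components(url):
    """Map each component name from the general syntax to its value."""
    p = urlparse(url)
    return {
        "scheme": p.scheme,
        "user": p.username,
        "password": p.password,
        "host": p.hostname,
        "port": p.port,
        "path": p.path,
        "params": p.params,      # params of the last path segment only
        "query": p.query,
        "frag": p.fragment,
    }

for name, value in components(URL).items():
    print(f"{name:9} {value}")
```

`urlparse` only separates the params of the final path segment, which is enough for this example.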

Page 7: URLs and Resources

7

Scheme: what protocol to use

• The scheme is really the main identifier of how to access a given resource.
• The scheme must start with an alphabetic character, and it is separated from the rest of the URL by the first “:” character.
• Scheme names are case-insensitive.
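The three rules above can be sketched in a few lines of Python. The function name `split_scheme` is hypothetical, and the exact set of characters allowed after the first letter (letters, digits, "+", "-", ".") comes from RFC 3986 rather than from the slide.

```python
import re

# A letter first, then letters, digits, "+", "-" or "." (the RFC 3986 rule).
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*$")

def split_scheme(url):
    """Split at the first ':' and return (lowercased scheme, rest).

    Returns (None, url) when the text before the first ':' is not a valid scheme.
    """
    scheme, sep, rest = url.partition(":")
    if sep and SCHEME_RE.match(scheme):
        return scheme.lower(), rest   # scheme names are case-insensitive
    return None, url

print(split_scheme("HTTP://www.ncnu.edu.tw/"))
print(split_scheme("1http://www.ncnu.edu.tw/"))
```

Lowercasing the scheme before comparing it is one simple way to honor case-insensitivity.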

Page 8: URLs and Resources

8

Usernames and Passwords

• Many servers require a username and password before you can access data through them.
• For example:
    ftp://ftp.prep.ai.mit.edu/pub/gnu
    ftp://[email protected]/pub/gnu
    ftp://anonymous:[email protected]/pub/gnu
    http://joe:[email protected]/sales_info.txt
• Default username and password: if none is supplied, “anonymous” is used for the username. For the password, Internet Explorer sends “IEUser”, while Netscape sends “mozilla”.

Page 9: URLs and Resources

9

Hosts and Ports

• The host component (an IP address or domain name) identifies the host machine on the Internet that has access to the resource.
• The port component identifies the network port on which the server is listening.
• Different services use different default ports on a machine:
    HTTP: 80
    FTP: 21
    Telnet: 23
    SMTP: 25
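A small sketch of how a client might fall back to a default port when the URL does not name one. The table holds only the four defaults listed above, keyed by lowercase service name; `host_and_port` is a hypothetical helper, and the port 8080 in the usage is an invented example.

```python
from urllib.parse import urlsplit

# Default ports from the list above, keyed by lowercase scheme/service name.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "telnet": 23, "smtp": 25}

def host_and_port(url):
    """Return (host, port), using the scheme's default port if none is given."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port

print(host_and_port("http://www.csie.ncnu.edu.tw/course/1998.html"))
print(host_and_port("http://www.csie.ncnu.edu.tw:8080/"))   # explicit port wins
print(host_and_port("ftp://ftp.ncnu.edu.tw/a_file.txt"))
```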

Page 10: URLs and Resources

10

Paths

• The path component of the URL specifies where on the server machine the resource lives.
• The path often resembles a hierarchical filesystem path. For example:
    http://www.csie.ncnu.edu.tw/course/1998.html
  The path in this URL is “/course/1998.html”, which resembles a path on a UNIX filesystem.
• The path component for HTTP URLs can be divided into path segments separated by “/”. Each path segment can have its own params component (described later).
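To make the idea of path segments with their own params concrete, here is a minimal sketch that splits a path on "/" and then splits each segment on its first ";". The helper name `path_segments` and the second URL (with `sale=false` and `graphics=true`) are illustrative assumptions, not from the slides.

```python
from urllib.parse import urlsplit

def path_segments(url):
    """Return a list of (segment, params) pairs for the URL's path."""
    path = urlsplit(url).path          # urlsplit leaves ";params" inside the path
    result = []
    for segment in path.split("/")[1:]:   # skip the empty string before the leading "/"
        name, _, params = segment.partition(";")
        result.append((name, params))
    return result

print(path_segments("http://www.csie.ncnu.edu.tw/course/1998.html"))
# Hypothetical URL in which each segment carries its own params:
print(path_segments("http://www.example.com/hammers;sale=false/index.html;graphics=true"))
```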

Page 11: URLs and Resources

11

Parameters

• For many schemes, a simple host and path to the object just aren’t enough.
• Aside from which port the server is listening on, and whether or not you have access to the resource with a username and password, many protocols require more information to work.
• For example:
    ftp://ftp.ncnu.edu.tw/image.gif;type=a
    ftp://ftp.ncnu.edu.tw/program.exe;type=i
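The params in the two FTP URLs above can be read back with `urlparse` from Python's standard library, which separates the params of the last path segment. In FTP URLs as defined by RFC 1738, `type=a` requests an ASCII transfer and `type=i` a binary (image) transfer. The helper name `ftp_transfer_type` is hypothetical.

```python
from urllib.parse import urlparse

def ftp_transfer_type(url):
    """Return (path, value of the 'type' param or None) for an FTP URL."""
    p = urlparse(url)
    # Turn "name=value;name=value" into a dict; [::2] keeps (name, value).
    params = dict(item.partition("=")[::2] for item in p.params.split(";") if item)
    return p.path, params.get("type")

print(ftp_transfer_type("ftp://ftp.ncnu.edu.tw/image.gif;type=a"))
print(ftp_transfer_type("ftp://ftp.ncnu.edu.tw/program.exe;type=i"))
```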

Page 12: URLs and Resources

12

Query strings

• Some resources, such as databases, can be queried according to input strings. For example:
    http://www.xxx.tw/a.cgi?id=123&name=abc
• There is no requirement for the format of the query component, except that some characters are illegal. By convention, many gateways expect the query to be formatted as a series of “name=value” pairs, separated by “&” characters.
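The "name=value" convention is what query-string parsers assume. As a small sketch, `parse_qsl` from Python's standard `urllib.parse` module turns the query component into a list of pairs; `query_pairs` is a hypothetical wrapper name.

```python
from urllib.parse import urlsplit, parse_qsl

def query_pairs(url):
    """Return the query component as a list of (name, value) pairs."""
    return parse_qsl(urlsplit(url).query)

print(query_pairs("http://www.xxx.tw/a.cgi?id=123&name=abc"))
```

A gateway that uses some other query format would have to parse the raw `urlsplit(url).query` string itself.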

Page 13: URLs and Resources

13

Query Strings

http://english.csie.ncnu.edu.tw/course/NWSMLViewer.php?lectureid=rctlee-20030909125212

[Figure: the request travels across the Internet to the server, where the query component lectureid=rctlee-20030909125212 is handed to the “viewer” gateway.]

Page 14: URLs and Resources

14

Fragments

• Some resources can be addressed at a finer grain: a fragment, such as a section in a large HTML document, can be referenced directly. For example:
    http://engquiz.csie.ncnu.edu.tw/e-book/html/B001.html#page10
• Because HTTP servers generally deal only with entire objects, not with fragments of objects, clients don’t pass fragments along to servers. That is, the whole object is retrieved, but the client presents only the referenced part.
• Note that with the range request feature of HTTP/1.1, agents may request byte ranges of objects (covered in later lectures).

Page 15: URLs and Resources

15

Fragments

[Figure: a client and the server www.csie.ncnu.edu.tw, connected across the Internet]

(a) User selects a link to “http://www.csie.ncnu.edu.tw/~hychen/web_tech/#Resource”.
(b) Browser makes a request to http://www.csie.ncnu.edu.tw/~hychen/web_tech/ (the fragment is NOT sent to the server).
(c) Server returns the entire HTML page.
(d) Browser displays the HTML page, scrolling down to start at the named “Resource” fragment.
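Steps (a) and (b) amount to splitting the link into the part that goes to the server and the part the browser keeps for itself. A minimal sketch using `urldefrag` from Python's standard `urllib.parse` module; `request_target` is a hypothetical helper name.

```python
from urllib.parse import urldefrag

def request_target(link):
    """Return (URL to request from the server, fragment kept by the client)."""
    url, fragment = urldefrag(link)
    return url, fragment

url, fragment = request_target("http://www.csie.ncnu.edu.tw/~hychen/web_tech/#Resource")
print("request:", url)            # what is sent to the server
print("scroll to:", fragment)     # used only on the client side
```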

Page 16: URLs and Resources

16

URL shortcuts

• Web clients understand and use a few URL shortcuts.
• Many browsers also support automatic expansion of URLs, where the user can type in a key (memorable) part of a URL, and the browser fills in the rest.
• Topics:
    Relative URLs
    Base URLs
    Resolving relative references
    Expanded URLs

Page 17: URLs and Resources

17

Relative URLs

• URLs come in two flavors: absolute and relative.
• So far, we have looked only at absolute URLs, which contain all the information you need to access a resource.
• A relative URL, on the other hand, is incomplete. To get all the information needed to access a resource, a relative URL must be interpreted on the basis of another URL, called its base.

Page 18: URLs and Resources

18

HTML snippet with relative URL

<HTML>
<HEAD><TITLE>Joe’s Tools</TITLE></HEAD>
<BODY>
<H1>Tools page</H1>
<H2>Hammers</H2>
<P>Joe’s HARDWARE online has the largest selection of <A href="./hammers.html">hammers</A> on earth.</P>
</BODY>
</HTML>

Page 19: URLs and Resources

19

Using a base URL

Base URL:            http://www.joes-hardware.com/tools.html
Relative URL:        ./hammers.html
New absolute URL:    http://www.joes-hardware.com/hammers.html
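The conversion shown here can be reproduced with `urljoin` from Python's standard `urllib.parse` module. The first call uses the slide's own base and relative URL; the second base URL and the other relative references are hypothetical additions that show a few more forms a relative URL can take.

```python
from urllib.parse import urljoin

# The example from the slide.
slide_base = "http://www.joes-hardware.com/tools.html"
print(urljoin(slide_base, "./hammers.html"))

# A hypothetical base one directory deeper, with several kinds of relative URL.
other_base = "http://www.joes-hardware.com/tools/index.html"
for ref in ["hammers.html", "../images/logo.gif", "/specials.html", "#top"]:
    print(ref, "->", urljoin(other_base, ref))
```

An absolute URL passed as the second argument is returned unchanged, since it needs no base.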

Page 20: URLs and Resources

20

Base URLs

The first step in the conversion process is to find a base URL, which can come from a few places.

• Explicitly provided in the resource
    Use the <BASE> tag to define the base URL.
• Base URL of the encapsulating resource
    If the document does not explicitly specify a base URL, use the URL of the resource in which the document is embedded as the base, as in the example on the preceding slide.
• No base URL
    In some instances, there is no base URL. This often means that you have an absolute URL; however, sometimes you just have an incomplete or broken URL.
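The first two cases can be sketched as a lookup order: prefer an explicit <BASE> tag, otherwise fall back to the URL the document was fetched from. The code below uses Python's standard `html.parser`; the class name `BaseFinder`, the function `find_base_url`, and the mirror host in the usage are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class BaseFinder(HTMLParser):
    """Remember the href of the first <BASE> tag seen, if any."""
    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")

def find_base_url(html_text, document_url):
    """Return the explicit base URL if the document declares one, else its own URL."""
    finder = BaseFinder()
    finder.feed(html_text)
    if finder.base_href:
        return urljoin(document_url, finder.base_href)
    return document_url

page = '<HTML><HEAD><BASE href="http://www.joes-hardware.com/tools/"></HEAD></HTML>'
print(find_base_url(page, "http://mirror.example.com/copy.html"))
print(find_base_url("<HTML></HTML>", "http://www.joes-hardware.com/tools.html"))
```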

Page 21: URLs and Resources

21

Resolving relative references

Page 22: URLs and Resources

22

Expanded URLs

Some browsers try to expand URLs automatically, either after you submit the URL or while you’re typing. This provides users with a shortcut: they don’t have to type in the complete URL.

• Hostname expansion
    Ex: yahoo → www.yahoo.com
• History expansion
    Ex: http://www.ncnu → http://www.ncnu.edu.tw
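Real browsers use their own heuristics for this, so the following is only a toy sketch of the two ideas. The function names, the "www." + word + ".com" rule, and the history list are assumptions made for illustration.

```python
def expand_hostname(text):
    """Hostname expansion: turn a bare word such as 'yahoo' into 'www.yahoo.com'."""
    if "." in text or "/" in text or ":" in text:
        return text                      # already looks like a host or URL
    return "www." + text + ".com"

def expand_from_history(typed, history):
    """History expansion: return the first visited URL that starts with the typed text."""
    for url in history:
        if url.startswith(typed):
            return url
    return typed                         # nothing matched; leave the input alone

history = ["http://www.ncnu.edu.tw"]     # hypothetical browsing history
print(expand_hostname("yahoo"))
print(expand_from_history("http://www.ncnu", history))
```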

Page 23: URLs and Resources

23

Shady characters in URLs

• URLs were designed to be portable, to uniformly name all the resources on the Internet. This means that URLs will be transmitted through various protocols.
• Because different protocols (schemes) use different transmission mechanisms, it is important that URLs can be transmitted safely, that is, without losing information, through any protocol over the network.
• Some protocols, such as the Simple Mail Transfer Protocol (SMTP) for email, use a 7-bit encoding for messages; this can strip off certain characters if the source is encoded in 8 bits or more.

Page 24: URLs and Resources

24

Shady characters in URLs

• URLs are permitted to contain only characters from a relatively small, universally safe alphabet.
• In addition to being transportable, URLs should be readable. Hence, invisible, nonprinting characters are also prohibited in URLs, even though these characters may pass through mailers.
• To complicate matters further, URLs also need to be complete. People may want URLs to contain binary data or characters outside of the universally safe alphabet, so an escape mechanism was added.

Page 25: URLs and Resources

25

The URL Character Set

• US-ASCII is very portable, due to its long legacy. It uses 7 bits to represent most keys available on an English typewriter and a few nonprinting control characters for text formatting and hardware signaling. But it doesn’t support the inflected characters common in European languages or the scripts of non-Romanic languages.
• Some URLs also need to contain arbitrary binary data.
• Escape sequences allow the encoding of arbitrary values using a restricted subset of the US-ASCII character set, yielding portability and completeness.

Page 26: URLs and Resources

26

Encoding mechanism

An unsafe character is represented by an “escape” notation, consisting of a percent sign (%) followed by two hexadecimal digits.

Character    ASCII code    Example URL
~            0x7E          http://www.ncnu.edu.tw/%7Ehychen
Space        0x20          http://www.abc.com/web%20tools.html
%            0x25          http://www.abc.com/100%25satisfaction.html
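The escape notation is easy to reproduce by hand, and Python's standard `urllib.parse` module provides `quote` and `unquote` for whole strings. In the sketch below, `escape` is a hypothetical helper that encodes one character; for characters outside US-ASCII it assumes UTF-8 bytes, which the slides do not specify.

```python
from urllib.parse import quote, unquote

def escape(ch):
    """Percent-encode one character: '%' plus two hex digits per UTF-8 byte."""
    return "".join("%{:02X}".format(byte) for byte in ch.encode("utf-8"))

for ch in ["~", " ", "%"]:
    print(repr(ch), "->", escape(ch))

# Library equivalents for whole path strings.
print(quote("web tools.html"))
print(quote("100%satisfaction.html"))
print(unquote("%7Ehychen"))
```

Recent Python versions leave "~" unescaped in `quote`, because later URI specifications treat it as a safe character; decoding "%7E" still works either way.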

Page 27: URLs and Resources

27

Character Restrictions

Character            Reservation/Restriction
%                    Escape token
/                    Path delimiter
.                    Path component
..                   Path component
#                    Fragment delimiter
?                    Query-string delimiter
;                    Params delimiter
:                    Delimits the scheme, user/password, and host/port
$ , +                Reserved
@ & =                Reserved; special meaning in some schemes
{ } | \ ^ ~ [ ] ’    Restricted; unsafe handling by various transport agents, such as gateways
< > "                Unsafe; should be encoded because they have meaning outside the scope of the URL
0x00-0x1F, 0x7F      Restricted; these fall within the nonprintable range
>0x7F                Restricted; these do not fall within the 7-bit range of US-ASCII
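As a rough sketch, the restricted and unsafe rows of this table can be turned into a simple check. The function name `must_be_escaped` is hypothetical, and the character set follows the slide's table rather than any current specification (later URI specifications, for instance, treat "~" as safe). Reserved characters such as "/" or "?" are left out because they only need escaping when used outside their reserved role.

```python
# Printable characters the table marks as restricted or unsafe.
UNSAFE_CHARS = set('{}|\\^~[]<>"')

def must_be_escaped(ch):
    """True if the table lists the character as restricted or unsafe."""
    code = ord(ch)
    if code <= 0x1F or code >= 0x7F:     # nonprintable, or outside 7-bit US-ASCII
        return True
    return ch in UNSAFE_CHARS

for ch in ["a", "/", "<", "~", "\x7f", "\u00e9"]:
    print(repr(ch), must_be_escaped(ch))
```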

Page 28: URLs and Resources

28

Common scheme formats

• http, https
• mailto
• ftp
• rtsp, rtspu
• file
• news
• telnet

Page 29: URLs and Resources

29

The Future: URN?

[Figure: a client, the resolver purl.oclc.org, and the server www.joes-hardware.com, connected across the Internet]

STEP 1: Ask the resource resolver what the Joe’s Hardware URL is (Get http://purl.oclc.org/jhardware/). Receive from the resolver the current location of the resource (Actual: http://www.joes-hardware.com/).

STEP 2: Get the actual URL for the resource (Get http://www.joes-hardware.com).

Page 30: URLs and Resources

30

URI: Universal Resource Identifier

• URIs are defined in RFC 1630 (1994).
• URI is a superset of URL and URN.
• Full URI (identifies the server): proto://hostname/path
    http://www.csie.ncnu.edu.tw:80/~hychen/
• Partial URI (no server mentioned): /path
    /~hychen/

Page 31: URLs and Resources

31

URLs information

http://www.w3.org/Addressing/
    The W3C page about naming and addressing URIs and URLs.
http://www.ietf.org/rfc/rfc1738.txt
    RFC 1738, “Uniform Resource Locators (URL),” by T. Berners-Lee, L. Masinter, and M. McCahill.
http://www.ietf.org/rfc/rfc2396.txt
    RFC 2396, “Uniform Resource Identifiers (URI): Generic Syntax,” by T. Berners-Lee, R. Fielding, and L. Masinter.
http://www.ietf.org/rfc/rfc2141.txt
    RFC 2141, “URN Syntax,” by R. Moats.
http://purl.oclc.org
    The persistent uniform resource locator web site.
http://www.ietf.org/rfc/rfc1808.txt
    RFC 1808, “Relative Uniform Resource Locators,” by R. Fielding.