
Internet / Intranet

Fall 2000

Class 4: Web Server Technology

HTTP Protocol, Log Files

Brandeis University Internet/Intranet Spring 2000 2

Class 4 Agenda

- Discuss Homework
  - Milestone 2 Due Week 6
  - Mini-Homework Due Next Week
- Overview of Web Servers and Server Technology
- Presentations
- HTTP
  - The Protocol For Communication Between Web Browser and Server
- Log Files
- Lab Work
  - HTTP Log Files (Mini-Homework)


Web Servers

- A Basic Web Server is Just a File Server
  - Client Requests a File via HTTP Protocol
  - Server Delivers the File via HTTP Protocol
  - Server Maps URL to a Subdirectory
  - Web Server Needs Appropriate Permissions to Access Files/Directories
  - Supports Non-HTTP Protocols
    - FTP, Gopher, etc.
- A Web Server is Not HTML Specific
  - Typically Identifies a Filetype by Extension
    - Or by the Directory Where the File Exists
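
The two ideas above can be illustrated with a minimal Python sketch. The server maps a URL onto a directory under its document root, and it guesses the file type from the extension alone. The document root `/usr/local/www/htdocs` and the `index.html` default document are hypothetical choices for illustration. They are not the settings of any particular server.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit

DOC_ROOT = "/usr/local/www/htdocs"  # hypothetical document root


def map_url(url, doc_root=DOC_ROOT):
    """Return (filesystem path, MIME type) for a requested URL."""
    path = unquote(urlsplit(url).path) or "/"
    if path.endswith("/"):
        path += "index.html"  # hypothetical default document for directory URLs
    # normpath() on an absolute path discards leading "..", so the
    # result cannot climb above the document root.
    clean = posixpath.normpath(path)
    fs_path = posixpath.join(doc_root, clean.lstrip("/"))
    # The file type is guessed from the extension alone, not the contents.
    mime_type, _ = mimetypes.guess_type(fs_path)
    return fs_path, mime_type or "application/octet-stream"
```

For example, `map_url("http://www.example.com/docs/page.html")` yields the file `/usr/local/www/htdocs/docs/page.html` with type `text/html`.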


Additional Common Web Server Features

- Additional Security Beyond That Provided by O/S
- Scripting
  - Ability to Dynamically Create a Web Page
  - Run a Program Instead of Returning a File (CGI)
    - Return the Program Output as the Requested File
- Administration
  - Log Files
  - Performance Monitoring
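
Here is a minimal sketch of a CGI program written in Python. The server runs the program instead of returning a file, and it passes request data through environment variables such as `QUERY_STRING`. The program writes a header, a blank line, and the generated page to standard output, and the server returns that output as the requested document. The page content and the helper name `cgi_response` are illustrative only.

```python
import os
import sys
from html import escape


def cgi_response(query_string):
    """Build CGI output: header line(s), a blank line, then the generated page."""
    body = (
        "<html><head><title>CGI Demo</title></head><body>"
        f"<p>You sent: {escape(query_string)}</p>"  # escape user input before echoing it
        "</body></html>\n"
    )
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server sets QUERY_STRING from the part of the URL after "?".
    sys.stdout.write(cgi_response(os.environ.get("QUERY_STRING", "")))
```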


Advanced Web Server Features

- Virtual Hosting
  - Allow Multiple URLs to Map to Same Computer
- Performance Optimization
  - Caching
- Reliability
- Scalability
- Proxy Servers (For Security and Performance)
  - Fetch Documents That are on Other Computers
  - Cache Them Locally
  - Allows for Easy Scalability
    - Multiple Proxy Servers Can Cache Documents From One Source Computer
- Embedded Scripting
  - Server Side Includes
  - Custom Scripting Languages
- Server API
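
The proxy idea (fetch a document from another computer, then serve the local copy) can be sketched as a small cache. This sketch makes two assumptions. The `fetch` argument stands in for a real HTTP request to the origin server, and every document stays fresh for a fixed `max_age`. A real proxy would also honor headers such as `Expires`.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Serve documents from a local cache, refetching from the origin when stale."""

    def __init__(self, fetch: Callable[[str], bytes], max_age: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch          # hypothetical function that retrieves a URL from its origin
        self.max_age = max_age      # assumed fixed freshness lifetime, in seconds
        self.clock = clock
        self._cache: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self.clock()
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self.max_age:
            return entry[1]         # cache hit: the origin server never sees this request
        body = self.fetch(url)      # cache miss or stale copy: go to the origin
        self._cache[url] = (now, body)
        return body
```

The clock is injectable so that expiry can be tested without waiting.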


Web Servers – Added Functionality

- Database Connectivity
  - SQL, MySQL
- Directory Listings
  - Icons, etc.
- Built-In Search Engines
- Built-In ImageMap Handling
- Multimedia Support
- Session Emulation
- Streaming Multimedia
- Advanced Security
  - Encrypted HTTP
    - S-HTTP (Secure HTTP) - CommerceNet
    - SSL (Secure Sockets Layer) - Netscape
- Web Server “Add-Ons”
  - CGI Substitutes / CGI Optimizations
    - ColdFusion


Web Server History

- Most Web Servers Share a Common Root: httpd (NCSA)
- UNIX Orientation
  - Many Features are Essentially UNIX Features
- Apache
- WebSite (O’Reilly)
- Netscape Enterprise Server
- Microsoft Internet Information Server
- A Slew of Others


Apache

- UNIX Origins, Now Ported to NT
- Evolved From httpd
- Freeware
- Typical UNIX Application
  - Public Source Code
  - Many Defaults, Conventions
    - BUT: All is Configurable
  - No GUI
    - Configured via Scripts, Shell Commands, Config Files
- Various “Flavors”
  - Many Optional Features
    - API
    - Apache-SSL


IIS / Netscape

- Microsoft IIS
  - Not Strictly Derived From httpd/Apache
  - Windows NT
  - However: Functionally Very Similar to Apache
    - Emulates Many UNIX Conventions
      - E.g. Forward Slashes
  - Configuration via GUI
  - Personal Web Server
  - Peer Web Server
- Netscape
  - Multi-Platform
    - UNIX is Preferred Platform
  - Less “Open” Than Apache
  - More Secure?


UNIX File Structure

- Forward Slashes (/) to Separate Filenames, Directories
- Case Sensitive File Names
  - Windows is Not
- No DOS-Style 8.3 Limit on Filename Size / Extensions (Typically Up to 255 Characters)
  - Extensions are by Convention
- Root is “/”
- User Home Directory is “~/”
- Symbolic Links / Aliases
  - Directories Can Be Spread Over Multiple Drives
  - Can Create Non-Hierarchical Structure
- File Permissions
  - Read, Write, Execute
  - Separate Permissions for Owner, Group, All
  - Directories are Special Cases of Files
    - Read Permission = Able to List the Directory
    - Execute Permission = Able to Enter (Search) the Directory
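
The permission model above can be shown with Python's standard `stat` constants. The sketch decodes the owner, group, and other bits of a mode. It then checks whether a server running as an unrelated user could read a file. That requires "other" read permission on the file and "other" execute (search) permission on its directory. The helper names are illustrative. A full check would also walk every parent directory, not just the immediate one.

```python
import stat

# Permission bits for each class of user, in r, w, x order.
PERMISSION_BITS = {
    "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
    "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
    "other": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
}


def describe(mode):
    """E.g. describe(0o750) -> {'owner': 'rwx', 'group': 'r-x', 'other': '---'}."""
    return {
        who: "".join(flag if mode & bit else "-" for flag, bit in zip("rwx", bits))
        for who, bits in PERMISSION_BITS.items()
    }


def servable_by_other(file_mode, dir_mode):
    """Can a server process running as an unrelated user read this file?

    It needs read permission on the file and execute (search)
    permission on the directory that contains it.
    """
    return bool(file_mode & stat.S_IROTH) and bool(dir_mode & stat.S_IXOTH)
```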


Web Server Configuration

- Directory Structure
  - Virtual Document Tree
  - Access to User Directories
    - UNIX: ~user
  - Symbolic Links
    - Be Careful: May Link You Out of Directory Structure
  - Case Sensitivity
- Ownership Access
  - Server is a Process Started by a User
  - Has the Permissions of the User Who Started It
- Default Documents
- Allow Directory Browsing
- Scripting
  - Who is Allowed to Run Scripts?
  - How are Scripts Identified?
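
The symbolic-link warning above can be made concrete. A server that wants to refuse links leading outside its document tree can resolve the requested path first. It then checks that the result is still under the document root. This is a minimal sketch that uses only `os.path`. The function name `inside_doc_root` is illustrative, not an option of Apache or IIS.

```python
import os


def inside_doc_root(doc_root, url_path):
    """True if url_path, after resolving symbolic links and "..", stays under doc_root."""
    root = os.path.realpath(doc_root)
    target = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    return os.path.commonpath([root, target]) == root
```

A link such as `htdocs/link -> /home/user/private` makes `inside_doc_root("htdocs", "/link/secret.txt")` return `False`.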


Web Server File Access Control / Security

- Directory
  - O/S Level Security
  - IP, Domain Level Security
    - Vulnerable to Spoofing
  - Directory Access
    - .htaccess
    - Microsoft FrontPage Extensions
- Encryption
  - S-HTTP
    - Web Protocols Only
  - SSL
    - TCP/IP Level
    - V1.0 - V2.X: Security Holes Found, Fixed
    - V3.0 Is Current
    - HTTPS (HTTP Over SSL) Uses Port 443
  - Microsoft PCT
    - Response to Holes in SSL 2.0
    - Microsoft Now Uses SSL
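
IP-level security amounts to comparing the client's address against allowed networks. Here is a minimal sketch with Python's `ipaddress` module. The allowed networks are documentation ranges chosen for illustration, not real site settings. As the slide warns, this check trusts the source address, which can be spoofed.

```python
import ipaddress

# Hypothetical allow-list (RFC 5737 documentation ranges).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def client_allowed(remote_addr):
    """True if the client's IP address falls inside an allowed network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a valid IP address: deny
    return any(addr in net for net in ALLOWED_NETWORKS)
```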


Server Administration

- Need Sysadmin and O/S Expertise
- Lots of “Holes” / Gotchas Whenever Scripts are Allowed
- FTP
  - Who is Allowed to Change Documents?
  - Who is Allowed to Change Server Configuration?
  - How do They Get Access?
    - Direct Access
    - Remote Access (e.g. FTP)
- Log Files
  - Accessibility
  - Directory Structure
  - Management


HTTP

- The Protocol For Requesting and Delivering Web Pages
  - Not Restricted to Returning HTML Files
- Client Server Model
  - Request / Response
- Runs Over TCP/IP, Default Port 80
  - Supports Other Ports, Can Be Run Over Other Protocols
- “Replaced” FTP as the Primary Method For Internet File Transfer
- Stateless
- Uses MIME Format to Encapsulate Data
- Message Structure Similar to SMTP Mail Messages
  - Message Header (metadata)
  - Message Body (data)
    - Separated From Header by a Blank Line
  - Browser Only Displays Body, Not Header
  - No Restrictions on Message Size / Format (Unlike SMTP)
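
The message structure above shows up clearly in a hand-built exchange. A request line and header lines are followed by a blank line. The response is split into its status line, headers, and body at the first blank line. This minimal Python sketch uses raw sockets and HTTP/1.0. The `User-Agent` value `class4-demo/0.1` is a hypothetical name. `fetch` needs network access and is not exercised in the example.

```python
import socket


def build_request(host, path="/"):
    """Request line, header lines, then a blank line (a GET has no body)."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "User-Agent: class4-demo/0.1", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw response into (status code, header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the blank line separates header from body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    code = status_line.split(" ", 2)[1]  # e.g. "HTTP/1.0 200 OK" -> "200"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


def fetch(host, path="/", port=80):
    """Send one HTTP/1.0 request; the server closes the connection after replying."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```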


HTTP Versions

- HTTP 1.0 - Commonly Used Version
- HTTP 1.1
  - Formalizes Many Extensions to Version 1.0
  - Supports Persistent Connections
  - Supports Compression/Decompression
  - Supports Virtual Hosting
    - Multiple Host Names Sharing a Single IP Address (via the Host Header)
  - Supports Multiple Languages
  - Supports Byte Range Transfers
    - Useful For Resuming Interrupted Data Transfers
    - Similar to Process Used By XMODEM, etc.
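
Resuming an interrupted transfer with byte ranges works as follows. The client asks for everything after the bytes it already has. A server that supports ranges answers `206 Partial Content` with a `Content-Range` header. This minimal sketch builds the request headers and parses that response header. The function names are illustrative.

```python
import re

CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def resume_headers(host, bytes_received):
    """Request headers asking for everything after the bytes already received."""
    return {"Host": host, "Range": f"bytes={bytes_received}-"}


def parse_content_range(value):
    """Parse e.g. 'bytes 1000-4999/5000' into (first, last, total); total may be None."""
    m = CONTENT_RANGE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unrecognized Content-Range: {value!r}")
    first, last, total = m.groups()
    return int(first), int(last), None if total == "*" else int(total)
```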


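HTTP Overview

[Diagram: the Client (Browser) sends an HTTP Request to the Web Server and receives an HTTP Response. The Web Server either returns HTML from the File System or runs a Server Application through CGI, which generates the HTML.]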


HTTP Commands

- Simple Structure
- Main Methods
  - GET <URI> HTTP/1.0
    - Request the File Specified By the URL
    - URI is the URL Without Protocol, Host, and Port (e.g. /index.htm)
  - HEAD
    - Request the HTTP Header Information Only
    - Don’t Return the File Itself
  - POST
    - Sends Data to The Server
    - Typically Data From a Form
- Defined, But Not Widely Implemented
  - PUT, DELETE, LINK, UNLINK
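
The difference between GET and POST for form data is easy to see in the raw requests. GET appends the URL-encoded fields to the URI after `?`. POST sends them as the message body, with `Content-Type` and `Content-Length` headers. This is a minimal sketch. The paths and field names in the example are hypothetical.

```python
from urllib.parse import urlencode


def build_get(host, path, params=None):
    """GET: form fields travel in the URI's query string."""
    target = path + ("?" + urlencode(params) if params else "")
    return f"GET {target} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def build_post(host, path, form):
    """POST: form fields travel in the message body after the blank line."""
    body = urlencode(form).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

For instance, `build_post("www.example.com", "/register.cgi", {"name": "Ann Lee"})` ends with the body `name=Ann+Lee`.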


Common HTTP Header Fields

Additional “Parameters” to the HTTP Commands

Used in HTTP Requests:
- Accept
  - Lists the MIME Types That Client Can Accept
  - E.g. Accept: text/plain, text/html or Accept: */*
- Accept-Charset
  - Lists the Character Sets That Client Can Accept
  - ASCII, ISO-8859-1 Are Assumed
- Accept-Encoding
- Accept-Language
- Authorization
  - Basic: UserName:Password (Base64 Encoded, Not Encrypted)
- Cookie
- From
  - E-mail Address of Requesting User
  - Not Typically Used For Privacy Reasons
  - Primarily Used By Automated Clients (e.g. Bots)
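
Basic authorization sends `UserName:Password` in Base64, which is an encoding rather than encryption. Anyone who sees the header can recover the password, so Basic authentication is normally combined with SSL. This is a minimal sketch using the standard `base64` module. The UTF-8 text encoding is an assumption, since older specifications leave the character set unspecified.

```python
import base64


def basic_auth_header(username, password):
    """Value for the Authorization request header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def decode_basic_auth(header_value):
    """Reverse the encoding, as a server (or an eavesdropper) can."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not Basic authentication")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password
```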


Common HTTP Header Fields (2)

- Host
  - Virtual Host: One Server Handles Multiple Sites
- If-Modified-Since
  - Only Return Data if it Has Been Modified Since This Date
- Pragma
  - General Purpose For “Additional” Directives Not in Standard
- Referer (Spelled This Way in the HTTP Specification)
  - The URL That Referred One to This URL
- User-Agent
  - Name/Version of the HTTP Client

Used in HTTP Responses:
- Allow
  - Lists the Available Commands Supported by Server
- Content-Encoding
  - Allows for Passing Data in Compressed Formats
- Content-Language
  - Describes the Natural Language of the Intended Audience
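
The `Host` header is what lets one server (and one IP address) act as several sites. The server picks a document root from the host name the client asked for. Here is a minimal sketch. The host names, directory paths, and the fallback root for requests without a `Host` header are all hypothetical.

```python
# Hypothetical mapping of host names to document roots on one machine.
VIRTUAL_HOSTS = {
    "www.example.com": "/var/www/example-com",
    "www.example.org": "/var/www/example-org",
}
DEFAULT_ROOT = "/var/www/default"  # used when Host is missing (e.g. older HTTP/1.0 clients) or unknown


def document_root(headers):
    """Pick a document root from the Host request header (keys assumed lower-case)."""
    host = headers.get("host", "").split(":", 1)[0].strip().lower()  # drop any ":port"
    return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
```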


Common HTTP Header Fields (3)

- Content-Length
  - Size of the Message Body
- Content-Type
  - The MIME Type For the Data
- Date
- Expires
  - Cached Copies Should Be Treated as Stale After This Date
- Last-Modified
- Location
  - Used For Redirection
- MIME-Version
- Pragma
  - E.g. no-cache
- Retry-After
  - When Server is Unavailable, Info On When to Try Back
- Server
  - Name/Version of the HTTP Server


Common HTTP Header Fields (4)

- Title
  - Descriptive Title of the File
- WWW-Authenticate
  - When Authorization Denied, Tells Client Which Methods of Authentication are Supported

HTTP Status Codes

- Returned By the Server In First Line of Response
- Informational (100-199)
- Successful (200-299)
- Redirection (300-399)
  - Location in HTTP Header Specifies Redirection
- Client Error (400-499)
- Server Error (500-599)
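
A client decides what to do from the first digit of the status code. For a redirection, it takes the new address from the `Location` header, resolving a relative value against the original URL. This is a minimal sketch. It assumes header names have already been lower-cased.

```python
from urllib.parse import urljoin

STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Classify a status code by its first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


def redirect_target(code, headers, request_url):
    """Absolute URL to follow for a 3xx response with a Location header, else None."""
    if 300 <= code <= 399 and "location" in headers:
        return urljoin(request_url, headers["location"])
    return None
```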


Common Status Values

- 200 - OK
- 201 - Created (Request Fulfilled, New Resource Created; e.g. by a POST)
- 204 - No Content (OK, Nothing For Client to Display)
- 300 - Multiple Choices
  - Requested Resource Available From Multiple Locations
  - List of Locations Returned in the Response
- 301 - Moved Permanently
- 302 - Moved Temporarily
- 304 - Not Modified
  - Document Hasn’t Been Modified Since the If-Modified-Since Date
- 400 - Bad Request
- 401 - Unauthorized
- 403 - Forbidden
- 404 - Not Found
- 500 - Internal Server Error
- 501 - Not Implemented (Server Does Not Support This Request)
- 502 - Bad Gateway (Invalid Response From an Upstream Server)
- 503 - Service Unavailable
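
The `304 Not Modified` exchange can be sketched on the server side. The server compares the client's `If-Modified-Since` date with the document's last-modified time. It sends only a 304 when the client's copy is still current. This sketch uses the standard `email.utils` date helpers, which handle the same date format HTTP uses. It assumes the modification time is a timezone-aware UTC `datetime`.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def http_date(moment):
    """Format an aware UTC datetime as an HTTP date (e.g. for Last-Modified)."""
    return format_datetime(moment, usegmt=True)


def conditional_status(last_modified, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is current, otherwise 200.

    last_modified is assumed to be a timezone-aware UTC datetime.
    """
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the header, send the full document
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so ignore microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```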


Cookies

- Cookies Are Name/Value Pairs Stored by the Client
- Passed in the HTTP Header
- Cookies Have an Associated Expiration
  - Session (Default)
  - Date / Time
- Associated With a Domain and URL Path, Not a Page!
- Allows Passing Parameters Between Web Pages
  - Thus Cookies are Used to Provide State Information to a Stateless Protocol
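
A cookie makes the round trip in two headers. The server sends `Set-Cookie` once, and the client returns the pair in a `Cookie` header on later requests under the same path. Here is a minimal sketch using Python's standard `http.cookies` module. The cookie names and values are illustrative.

```python
from http.cookies import SimpleCookie


def set_cookie_header(name, value, path="/"):
    """Server side: build a Set-Cookie response header (a session cookie, no expiry)."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path  # the client sends it back for every URL under this path
    return cookie.output()


def read_cookies(cookie_header):
    """Server side: parse the Cookie header the client sent back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {name: morsel.value for name, morsel in cookie.items()}
```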


Web Server HTTP Functionality

- Content Negotiation
  - Choose From Several Different Formats Based on Request
- Language Negotiation
  - Choose From Versions of Same Document Based on Request
- Support for HTTP PUT, HTTP DELETE
- Keep-Alive
- As-Is
  - Server Doesn’t Add HTTP Headers
  - Allows You to Create Specific Behavior
    - Redirect to Another Site
    - Never Saved in Browser’s Cache
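
Language negotiation is driven by the `Accept-Language` request header. The header lists language ranges with optional `q` quality values, and the server picks the available version the client rates highest. This is a simplified sketch of that matching. A range matches a tag equal to it or a tag that starts with the range plus `-`. It ignores finer points of the specification, and the function names are illustrative.

```python
def parse_accept_language(header):
    """Return [(language_range, q), ...] from an Accept-Language value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        lang = fields[0].lower()
        if not lang:
            continue
        q = 1.0  # default quality when no q= parameter is given
        for param in fields[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((lang, q))
    return prefs


def negotiate(header, available):
    """Pick the available language tag the client rates highest, or None."""
    prefs = parse_accept_language(header)
    best, best_q = None, 0.0
    for tag in available:
        t = tag.lower()
        # A range matches an equal tag or a more specific one ("en" matches "en-us").
        q = max((pq for rng, pq in prefs
                 if rng == "*" or t == rng or t.startswith(rng + "-")), default=0.0)
        if q > best_q:
            best, best_q = tag, q
    return best
```

With `negotiate("fr;q=0.5, en-US, en;q=0.8", ["en", "fr", "de"])`, the range `en-US` matches none of the available tags. The server therefore returns `"en"`, whose quality of 0.8 beats French at 0.5.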


Class Exercise: HTTP

http://www.mkat.com/brandeis/httplist.cfm

Viewhttp.exe


Server Log Files

Records Server Activity


Some Definitions

- Hits
  - Each HTTP Request is a Hit
  - Accessing a Web Page May Result in Multiple Hits
    - E.g. Each Graphic is a Hit
- Page Views
  - Accessing a Single Web Page is a Page View
    - E.g. Typing in a URL or Clicking on a Link
- Visits
  - A Single Client’s Visit to Your Entire Site (Session)
  - May Include Multiple Page Views
  - What Constitutes a Second Visit From the Same Client?
- Why is This Important?
  - Terms are Sometimes Used Interchangeably and Improperly
    - Compare Apples to Apples
  - Important for Commercial Web Sites
    - Advertising is Based on Site Access
    - Typically Sold on Page View Basis
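
The three definitions give very different counts for the same log. The sketch below computes all three from parsed records under two stated assumptions. First, a request counts as a page view if its path ends in `.htm`, `.html`, or `/`. Second, a visit ends after 30 minutes of inactivity from the same IP address, a common convention rather than a standard. Both rules are the kind of assumption real analysis programs make.

```python
# Hypothetical rule: a request is a page view if its path looks like an HTML page.
PAGE_SUFFIXES = (".htm", ".html", "/")
VISIT_TIMEOUT = 30 * 60  # assumed inactivity gap (seconds) that starts a new visit


def summarize(records, timeout=VISIT_TIMEOUT):
    """records: iterable of (client_ip, unix_time, path) tuples."""
    records = sorted(records, key=lambda r: r[1])  # process in time order
    hits = len(records)  # every request is a hit
    page_views = sum(
        1 for _, _, path in records
        if path.split("?", 1)[0].lower().endswith(PAGE_SUFFIXES)
    )
    visits = 0
    last_seen = {}
    for client, when, _ in records:
        if client not in last_seen or when - last_seen[client] > timeout:
            visits += 1  # first request from this client, or first after a long gap
        last_seen[client] = when
    return {"hits": hits, "page_views": page_views, "visits": visits}
```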


Server Log Files

- Many Variations in Web Server Log File Formats
- Four Log Files
  - Access (Transfer) Log
    - Each Hit is Recorded
    - User, Date/Time, HTTP Request, etc.
  - Error Log
    - Date/Time, Error
  - Referrer Log
    - Referring Page, Destination Page
  - Agent (User) Log
    - Client’s Browser
- Clearly a Need for Standardization
  - Linking the Four Log Files Together


Common Log Format

- Host
  - IP Address (or Hostname) of Client
  - Some Servers Perform Lookup of IP Address
- RFC931
  - Remote User Identity Reported by the Client’s identd Service (RFC 931), Not an HTTP Header
  - Seldom Used; Usually Logged as “-”
- Authuser
  - HTTP Request: Authorization
  - UserName if Username Authorization is Required
- Time Stamp
  - Time the Request Was Received, With UTC Offset
  - E.g. [10/Jun/1998:14:23:34 -0700]
- Request
  - The Actual HTTP Request Line
  - E.g. GET /index.htm HTTP/1.1


Common Log Format (2)

- Status
  - The HTTP Response Status Code
- Transfer Volume
  - Bytes Sent in the Response Body (Usually Matches HTTP Response: Content-Length; “-” If None)
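
The seven Common Log Format fields can be pulled apart with one regular expression. This minimal Python sketch parses a line into a dictionary and converts the timestamp, status, and byte count. The field names are this sketch's own. The month abbreviation is parsed with `%b`, which assumes an English (C) locale.

```python
import re
from datetime import datetime

CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format line into a dict, or return None if it doesn't match."""
    m = CLF_PATTERN.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])  # "-" means no body sent
    method, _, rest = rec["request"].partition(" ")
    rec["method"] = method
    rec["path"] = rest.rsplit(" ", 1)[0] if " " in rest else rest  # drop "HTTP/x.y"
    return rec
```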


Extended Log File Format

- Seven Common Log Format Fields, Plus:
- Referrer
  - HTTP Request: Referer
- User Agent
  - HTTP Request: User-Agent
  - Identifies Browser
- Other Common Fields
  - Cookies
    - Can Help Identify Users
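
The two extra fields are quoted strings at the end of each line. This makes it easy to tally where visitors came from and which browsers they used. Here is a minimal sketch using `collections.Counter`. It assumes the referrer and user agent follow the seven standard fields in that order, which is common but not universal. Lines without the extra fields are skipped.

```python
import re
from collections import Counter

EXTENDED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_referers_and_agents(lines, n=5):
    """Return the n most common referring URLs and user agents."""
    referers, agents = Counter(), Counter()
    for line in lines:
        m = EXTENDED.match(line)
        if m is None:
            continue  # plain Common Log Format or malformed line
        if m["referer"] not in ("", "-"):  # "-" means no referring page
            referers[m["referer"]] += 1
        agents[m["agent"]] += 1
    return referers.most_common(n), agents.most_common(n)
```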


Issues

- Client vs. User
  - Typically Don’t Have User Level Information
  - Only Record IP Address of Computer Used For Access
    - If Fixed IP Address For a Single User’s Machine, This Can Identify the User
    - Dynamically Assigned IP Addresses Identify Only the Overall Domain (e.g. AOL.com)
  - Proxy Servers
    - All Clients Have the IP Address of the Proxy Server
  - Multiple “Sessions” at Same Time
- Impossible to Have Truly Accurate Information
  - Log File Analysis Software Has Algorithms to Identify Page Views, Visits
- Client Level Caching Affects Logs (Requests Served From a Cache Never Reach the Server)
- “ISP” Level Caching Affects Logs
  - E.g. AOL Maintains a Cache
  - No Requirement for Clients, ISPs to Follow Expiration Info


Log File Maintenance on Server

- Log Files Grow Rapidly
- Log Files Compress Very Nicely
- Server Configurable
  - Generate Daily/Weekly/Monthly Logs
- Maintenance Scripts to Clean Up Log Files
  - Compress
  - Archive
  - Cycle
    - E.g. Maintain Current Month’s Files
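
A maintenance script of the kind described above can be quite small. This sketch assumes monthly log files with the hypothetical naming scheme `access_log.YYYY-MM`. It gzips every month before the current one and deletes the originals, leaving the current month's file untouched. Real servers use their own naming and rotation settings.

```python
import gzip
import shutil
from datetime import date
from pathlib import Path


def compress_old_logs(log_dir, today=None, prefix="access_log."):
    """Gzip monthly logs named <prefix>YYYY-MM that are older than the current month."""
    today = today or date.today()
    current = f"{today.year:04d}-{today.month:02d}"
    compressed = []
    for path in sorted(Path(log_dir).glob(prefix + "????-??")):
        month = path.name[len(prefix):]
        if month >= current:
            continue  # keep the current month's file uncompressed
        target = path.with_name(path.name + ".gz")
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # remove the original once the compressed copy is written
        compressed.append(target.name)
    return compressed
```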


Log File Analysis

- Big Business
  - Bread and Butter of Sites Driven By Advertising Revenue
- Evaluation Factors
  - Log File Formats Supported
  - Ability to Link Multiple Logs
  - How Log Files are Accessed (e.g. via FTP)
  - Display Methodology
    - E.g. Available Via Web Pages
  - Lookup Capabilities
    - E.g. Map User-Agent to Browser
    - E.g. Resolve IP Addresses to Domains, Regions
  - Level of Analysis
    - E.g. Calculating Visits, Return Visitors
  - Configurability
  - Drill-Down Capabilities
  - Enterprise Capabilities
    - Ability to Manage Multiple Sites
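
As an example of a lookup capability, here is a rough heuristic that maps `User-Agent` strings to browser families, roughly as they looked around 2000. The rules are this sketch's assumptions and are not taken from any analysis product. The strings are easy to fake and change over time, which is one reason analysis tools ship large lookup tables instead.

```python
from collections import Counter


def browser_family(user_agent):
    """Very rough User-Agent -> browser mapping (heuristic, circa 2000)."""
    if "Opera" in user_agent:  # Opera may also claim to be MSIE, so test it first
        return "Opera"
    if "MSIE" in user_agent:
        return "Internet Explorer"
    if user_agent.startswith("Mozilla/") and "compatible" not in user_agent:
        return "Netscape"  # Navigator sends "Mozilla/x.y" without "compatible"
    return "Other"  # robots, other browsers, unknown strings


def browser_report(user_agents):
    """Count requests per browser family."""
    return Counter(browser_family(ua) for ua in user_agents)
```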


Log File Analysis Options

- Important to Understand the Core Log Files
  - Log File Analysis Programs Make Some Assumptions
- Freeware
- Commercial
- Service Bureaus


In Class Exercise / Mini Homework

- Download http://www.mkat.com/brandeis/sample.log
- View in Text Editor
- Load Into Excel
  - Delimited / Spaces
- Review the Log File in Detail (Do Not Use Analysis Tools)
- Describe What You Can Learn From the Log File
- Add it To Your Homepage Along With In Class Exercises
- Due Next Week


Resources

- HTTP: Stein pp. 47-57
- Server Comparison: http://webcompare.internet.com/chart.htm
- Apache Server: www.apache.org
- WebSite Server: http://website.ora.com
- Microsoft IIS: http://www.microsoft.com/NTWorkstation/downloads/Recommended/ServicePacks/NT4OptPk/Default.asp