Fall 2011 Lectures in CMPSCI 120 ©Dr. William T. Verts Page 1

FALL 2011 LECTURES CMPSCI 120

#1: Wednesday, September 7 – First lecture: introduction to the class. Hand out syllabus sheet. Cover timeline of technology from 1900 to 2011. Emphasize difference between Internet and Web; contributions of single individuals to major advances in tech (Tim Berners-Lee, Douglas Engelbart, etc.). Place student birthdates into timeline appropriately to illustrate how much change has happened within their lifetimes. Contrast with that of my Grandmother, born three weeks before Wright Brothers’ first flight, died as metal being cut for International Space Station. Exhort students to be kind to elders, born before the exponential change in Net tech within their lifetimes.

#2: Friday, September 9 – Exercise in image analysis, as preparation for scavenger hunt assignment, using historical picture of 1865 execution of conspirators in Lincoln assassination, handed out to students. Questions include when photography started, when glass-plate photography popular (evidenced by crack in image), when electricity became prevalent (gas lamp on wall), why a woman is being executed (Mary Surratt), what style of uniforms are being worn (Union soldiers), what the blurring of people walking illustrates (shutter speed), why people are carrying umbrellas when it isn’t raining (sun protection), etc. Handed out sheet of timeline from first lecture.

#3: Monday, September 12 – Discussion of bias and psychological issues in performing searches. First exercise was to play the Say, Yeah! mashup video of Teen Titans characters, paying particular attention to the audio. Students did not hear anything out of the ordinary, except most of song is in Korean(?) with a few English words, until it was pointed out that several times the audio sounded like “your momma told me I’m a dead guy”. After that, everyone heard it. This is an audio example of pareidolia. Other example given was in early hominid evolution: seeing danger in the grass and reacting to it, regardless of whether danger actually exists, is an evolutionary advantage to the species, but means that individuals within species over time acquire bias for seeing patterns where they do not necessarily exist. (Modern examples: seeing faces in tacos or shower curtains.) Next, drew a line on the blackboard and asked students to place sites or groups according to bias (NTSB at low bias, news and religious organizations at high bias, Wikipedia somewhere in the middle, etc.). Gave example of how to use this in performing search: actively seek unbiased sites (difficult), or actively seek biased sites and subtract out the known bias (easy), figuring they’ve done most of the legwork against a particular topic. Finally, listed several “rules” of bias and self-deception people are likely to encounter on the Internet, including Poe’s Law, Hanlon’s Razor, No True Scotsman, etc., and gave examples of correlation vs. causality. This document is available on class site. Also warned students about “Rule 34”.

#4: Wednesday, September 14 – Email. Notion that all email is text, including non-text attachments. Notion that email is like a postcard – readable by everyone at every node between sender and receiver, so sending sensitive info such as credit card or Social Security numbers in plain text email is a Very Bad Idea in general, unless encryption is used. Notion that emails are easy to spoof (appear to come from someone other than actual sender). Notion that some email systems store messages on server, others always download to local client (what are advantages of each approach?). SPAM – Hormel canned meat product co-opted by old Monty Python skit – makes up most emails today. Examples of “phishing” attacks, trying to get users to do something they shouldn’t:

(1) “Your [bank, credit card, computer] account is compromised, to fix click HERE”. No legitimate company will do this; instead they will say “To fix go to our Web site”. By looking at underlying code, users can often tell where the click-here address is directed, always to a site not associated with corporation being spoofed.

(2) “Dear friend, my name is Mrs. ______ and my [husband, friend, colleague] has $___ million to get out of the country before the corrupt [government, bank, financial institution] steals it. Please help by providing bank info, and you’ll receive [5%, 10%, 15%] as a reward.” Also known as the Nigerian scam letter (although can come from anywhere). Once done on actual paper, today costs very little to send a billion emails, only need one pigeon to fall for scam.

Encryption is good, both for email and secure on-line ordering (how it works will be subject of future lecture), but was resisted by FBI, NSA, CIA for use by private citizens. (Conflict: need encryption for thriving e-commerce, but how to eavesdrop on drug dealers, child pornographers, and terrorists?) Questions: Is spamming illegal? How to prosecute when sender is in another country? How can VPNs (virtual private networks) protect communications for companies using insecure Internet? Is it OK to send credit card numbers, etc., in text messages?
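The notion from the start of this lecture, that all email is text including non-text attachments, can be illustrated with a minimal Python sketch. It assumes Base64 as the text encoding (the one commonly used for binary attachments); the sample bytes and variable names are made up for illustration.

```python
import base64

# Hypothetical binary attachment: the first eight bytes of a PNG file.
attachment = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

# Base64 turns arbitrary bytes into plain ASCII text that can travel in an email.
as_text = base64.b64encode(attachment).decode("ascii")

# The receiving mail client reverses the step to recover the original bytes.
recovered = base64.b64decode(as_text)

print(as_text)
print(recovered == attachment)
```

Note that encoding is not encryption: anyone who sees the text can decode it, which is the postcard problem described above.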

#5: Friday, September 16 – Email finish up: play Monty Python SPAM skit, show example of Nigerian scam letter received days before. Play “State of the Internet 2009” video. Network topologies: point-to-point (one wire between every pair of machines, fast, but does not scale well: for N machines there are N(N-1)⁄2 connections, or O(N²)), star (central fast machine, dumber but cheap satellite terminals or computers, central machine must be exceptionally fast and therefore expensive, vulnerable to single-point-failure of central machine, “Big Brother” issues, common trope of old science fiction view of computers), token-ring (each machine connects to only two neighbors, passes “token” or “magic cookie” around ring so every machine gets a chance to send messages, scales reasonably well, all machines identical and inexpensive, but break in ring causes overall failure), Ethernet (machines “talk” to common wire, everyone “listens” for messages addressed to them, machine needing to transmit waits until wire quiet then talks, two machines talking at same time causes message collision, but because machines listen to their own transmission they can detect when collisions occur [1s get changed to 0s, 0s get changed to 1s], then back off and try again at random time later, scales well but increased net traffic results in congestion, need routers).
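To see how badly point-to-point wiring scales, here is a small Python sketch of the N(N-1)/2 count. The function name is made up for illustration.

```python
def point_to_point_links(n: int) -> int:
    """Wires needed when every pair of n machines gets its own link."""
    return n * (n - 1) // 2

# Ten times as many machines needs roughly a hundred times as many wires.
for n in (2, 10, 100, 1000):
    print(n, "machines need", point_to_point_links(n), "links")
```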

#6: Monday, September 19 – Introduction to Scavenger Hunt project, due Wednesday, September 28. Review of Ethernet from Friday. Need for routers/switches to separate legs of network. Outline of typical home network configuration: TV cable splits between TV and cable modem, cable modem can go directly to a computer (presenting 1 IP address to Internet) OR to a hub with multiple computers (each presenting its own unique IP address to Internet) OR to a router with multiple computers (each presenting its own unique local IP address to router, but router presents only one IP address to Internet). Local side of router can have hubs in different rooms, but no two computers should be more than three/four hubs from one another: router becomes central focal point in star network (if it fails, local network crashes). Wireless router can talk to wireless laptops just as if wire directly links between them. Printers may be connected to a local machine (which must be always left running), OR to wireless print server (in isolated room) which talks to the router, OR IP-enabled printers may be connected directly into the router.

Wireless: typical range 100 meters, but can be attenuated by objects (walls, chimneys, etc.). Current standard is IEEE 802.11 (2 megabits/second), with revisions 802.11b (11 Mb/s), 802.11g (54 Mb/s, compatible with 802.11b), 802.11a (54 Mb/s but not compatible with 802.11b or 802.11g), and 802.11n (faster using multiple simultaneous radio channels).

Student asked after class whether we would discuss “ad-hoc” networks (all described so far are “infrastructure” networks).

Details of IPv4 addressing coming on Wednesday.

#7: Wednesday, September 21 – Introduction to bits (binary digits, values 0 or 1) and bytes (aggregates of 8 bits, between 00000000 and 11111111, with 2⁸ = 256 unique patterns). IPv4 addresses are four bytes, most common notation consists of decimal values 0…255 separated by periods (examples: 0.0.0.0 through 255.255.255.255, UMass addresses are all 128.119.xxx.xxx). Old style addresses in use from 1981 to 1993 are “classful” addresses, from class A through class E, depending on number of networks and number of machines within networks. Examples below show class patterns in binary, where x represents part of network address and y represents machine address within the network:

class A addresses: 0xxxxxxx.yyyyyyyy.yyyyyyyy.yyyyyyyy (128 nets, 16,777,216 machines each)
class B addresses: 10xxxxxx.xxxxxxxx.yyyyyyyy.yyyyyyyy (16,384 nets, 65,536 machines each)
class C addresses: 110xxxxx.xxxxxxxx.xxxxxxxx.yyyyyyyy (2 million nets, 256 machines each)
class D addresses: 1110xxxx.xxxxxxxx.xxxxxxxx.xxxxxxxx (multicast addresses)
class E addresses: 1111xxxx.xxxxxxxx.xxxxxxxx.xxxxxxxx (future expansion, never used)

Problem with classful addressing is that it does not match reality. A site with 257 machines is too large for a class C, but is wasteful of address space if moving to a class B – over 65,000 addresses are wasted. Worse for site with 70,000 machines – too big for a class B, but moving to a class A wastes more than 16 million potential addresses.
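The leading-bit patterns above mean the class can be read off the first byte alone. A minimal Python sketch follows; the function name is an assumption for illustration.

```python
def address_class(dotted: str) -> str:
    """Return the classful address class (A-E) of a dotted-decimal IPv4 address."""
    first = int(dotted.split(".")[0])
    if not 0 <= first <= 255:
        raise ValueError("first byte must be between 0 and 255")
    if first < 0b10000000:   # leading bit 0
        return "A"
    if first < 0b11000000:   # leading bits 10
        return "B"
    if first < 0b11100000:   # leading bits 110
        return "C"
    if first < 0b11110000:   # leading bits 1110
        return "D"
    return "E"               # leading bits 1111

# A UMass address starts with 128 (binary 10000000), so it falls in class B.
print(address_class("128.119.240.19"))
```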

Solution is to use CIDR (classless inter-domain routing) which specifies how many bits are used in the network name and how many for the machine name – specification is with a slash and a number of bits. For example, an address such as xxx.xxx.xxx.xxx/14 uses 14 bits for the network identifier (2¹⁴ = 16,384) with the remaining 18 bits for the machine name (2¹⁸ = 262,144), smaller than a class A but bigger than a class B. Class A addresses roughly correspond to xxx.xxx.xxx.xxx/8, and last of the /8 groups were allocated in early 2011. (See http://xkcd.com/195/ for map of /8 groups as of 2006.)
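The /14 arithmetic can be checked with Python's standard ipaddress module. The block 10.4.0.0/14 is a made-up example; any /14 gives the same counts.

```python
import ipaddress

# Hypothetical /14 block chosen only for illustration.
net = ipaddress.ip_network("10.4.0.0/14")

network_bits = net.prefixlen          # bits naming the network
host_bits = 32 - network_bits         # bits left over for machines

print("network bits:", network_bits)
print("host bits:", host_bits)
print("addresses in block:", net.num_addresses)
```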

All these techniques – classful addressing, CIDR, hiding networks behind routers (which present only one IP address to the outside world, and route packets coming in to the correct machine) – served to delay the exhaustion of IPv4 addresses, but the hard limit of 4,294,967,296 unique IPv4 addresses never went away. Solution: IPv6.

#8: Friday, September 23 – Instead of four bytes, IPv6 addresses are eight words (where a word is two bytes, ranging between 0 and 65,535). This gives a total of 128 bits for an address instead of just 32. With 128 bits there are 2¹²⁸ = 3.4×10³⁸ unique addresses (2³² = 4.3×10⁹, so IPv6 is a lot larger than IPv4). Not likely to run out very soon (see also http://xkcd.com/865/). First proposed in 1998, so problem has been recognized for a long time. Reasons for not adopting by now include: cost of overhaul of hardware and software, complexity of getting newer standard to work with old, laziness, etc. Test day in June 2011 to see what bugs may surface.
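A short Python sketch of the sizes involved, and of an IPv6 address as eight 16-bit words. The address 2001:db8::1 is an arbitrary example, not one from the lecture.

```python
import ipaddress

ipv4_total = 2 ** 32     # every possible IPv4 address
ipv6_total = 2 ** 128    # every possible IPv6 address

# Arbitrary example address; "exploded" writes out all eight words in full.
addr = ipaddress.IPv6Address("2001:db8::1")
words = [int(w, 16) for w in addr.exploded.split(":")]

print(ipv4_total)
print(ipv6_total)
print(ipv6_total // ipv4_total == 2 ** 96)   # how many IPv4 Internets fit in IPv6
print(words)
```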

DNS (Domain Name Service or Domain Name System) maps Web addresses (URLs) onto IP addresses. IP addresses are really what are needed. For example, http://www.cs.umass.edu/ can be replaced with http://128.119.240.19/ (although browsers are twitchy about security when you do this). DNS is designed so one of 13 or so top-level root servers world-wide (and their proxies) examine top level domains (.EDU, .COM, etc.) and hand off requests to proper server. Server handling .EDU domains hands off request to server handling UMass, then server handling UMass hands off request to server handling CS, etc. Eventually, some server knows the IP address of www.cs.umass.edu and returns it, or if address is invalid returns error message. In practice, this would severely overload root servers, serving trillions of URL requests per day. Instead, DNS servers cache recently requested IPs, so requests go up the chain instead of down, until some server can either handle the request or knows who can. Root servers get very little traffic as a result. Possible for cache to be “poisoned” to return wrong IP address. Cache entries have a TTL (time to live) so that stale IP addresses don’t stay cached forever and can be replaced eventually. TTL can be between seconds and weeks.
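The caching-with-TTL idea can be sketched as a toy Python class. This is an illustration of the concept only, not how any real DNS server is written; the class name, the injectable clock, and the 60-second TTL are all assumptions.

```python
import time

class DnsCache:
    """Toy cache mapping host names to (ip, expiry time)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the example can fake the passage of time
        self._entries = {}

    def put(self, host, ip, ttl_seconds):
        self._entries[host] = (ip, self._clock() + ttl_seconds)

    def get(self, host):
        """Return the cached IP, or None if unknown or stale (ask up the chain)."""
        entry = self._entries.get(host)
        if entry is None:
            return None
        ip, expires = entry
        if self._clock() >= expires:
            del self._entries[host]   # stale entries do not stay cached forever
            return None
        return ip

# Fake clock so the demonstration does not need to wait in real time.
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("www.cs.umass.edu", "128.119.240.19", ttl_seconds=60)
fresh = cache.get("www.cs.umass.edu")   # still within TTL
now[0] = 61.0
stale = cache.get("www.cs.umass.edu")   # TTL has expired
print(fresh, stale)
```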

#9: Monday, September 26, 2011 – Interpreting a URL (Uniform Resource Locator). Addresses such as

http://www.cs.umass.edu/~verts/cmpsci120/cmpsci120.html (the class site) can be broken into sections:

http:// The protocol. Promise that the resource observes the conventions of the Hypertext Transfer Protocol. Other protocols include ftp://, telnet://, etc.

www.cs.umass.edu The host address. Read from right-to-left. Traditional top-level domains (TLDs) include .edu (education), .com (commercial), .net (network), .org (organizations), .gov (U.S. government), and .mil (U.S. military), but new ones have been added, such as .aero (aeronautics), .biz (business), .xxx (porn), etc. TLDs may be suffixed by country codes (.us, .uk, .jp, .ru, etc.), which had all been using Roman alphabet until recently, when United Arab Emirates, Egypt, and Saudi Arabia started using Arabic characters (from right to left), and the Russian Federation started using .РФ (Cyrillic for Russian Federation) in addition to .ru (Russia).

~verts The username. When present, usernames are prefixed with the tilde (~) character to say “look in the account on the specified machine”.

cmpsci120/ The folder path. The name (or names) of a path of folders to the file being fetched.

cmpsci120.html The actual resource (file) being fetched. This can be a text file (.txt), an image file (.gif, .jpg, or .png), an Acrobat file (.pdf) or a Web file (.htm or .html). When not specified, index.html or index.htm are assumed (the .htm extension dates from MS-DOS up through Windows 3.1, which only supported three character file extensions; Windows 95 was the first version to support longer filenames and extensions).

Legal subsets of this URL include:

http://www.cs.umass.edu/ File name not specified, by default fetch index.html from Computer Science server.

http://www.cs.umass.edu/~verts/ File name not specified, by default fetch index.html from verts’ account on Computer Science server.

http://www.cs.umass.edu/~verts/cmpsci120/ File name not specified, by default fetch index.html from cmpsci120 folder in verts’ account on Computer Science server.

http://www.cs.umass.edu/~verts/cmpsci120/cmpsci120.html Filename is specified, fetch cmpsci120.html from cmpsci120 folder in verts’ account on Computer Science server (overriding default value of index.html).
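The same breakdown can be done mechanically. Here is a hedged Python sketch using the standard urllib.parse module; the helper name `describe` and the choice of index.html as the default are illustrative assumptions (the actual default is decided by the server).

```python
import posixpath
from urllib.parse import urlsplit

def describe(url: str) -> dict:
    """Split a URL into the sections discussed above."""
    parts = urlsplit(url)
    folder, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "host": parts.netloc,
        # Host names are read right-to-left: TLD first, machine name last.
        "host_right_to_left": parts.netloc.split(".")[::-1],
        "folder": folder,
        # Assumed default when no file name is given.
        "file": filename or "index.html",
    }

full = describe("http://www.cs.umass.edu/~verts/cmpsci120/cmpsci120.html")
short = describe("http://www.cs.umass.edu/~verts/")
print(full)
print(short)
```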

Sidebar discussion of .pdf (Portable Document Format) files. Used for distributing printer-ready documents when tool used to create document may be something not everyone has. Acrobat Reader is free (Writer is not, although PDF specification is now open, so programmers can create their own PDF files). Files can be secured to protect intellectual property – password protected to restrict opening and/or printing and/or copying of document. Normally, PDF files can be annotated, or include animations, sounds, etc., but PDF/A (archival) documents contain only basic text guaranteed readable.

#10: Wednesday, September 28, 2011 – Beginning HTML. Tags and tag pairs, basic HTML Web page layout:

<HTML>
  <HEAD>
    <TITLE>My Spiffy Web Page</TITLE>
  </HEAD>
  <BODY>
    Content
  </BODY>
</HTML>

Use of BGCOLOR to change background color in BODY, as in <BODY BGCOLOR="green"> (BGCOLOR is deprecated). Colors may also be in 6-digit hexadecimal (base 16), where the first two digits encode red, the middle two encode green, and the last two encode blue. Values for each of the three color components are bytes that range from 00 (zero) to FF (255), where each of the two digits ranges over [0…9, A…F] and the leftmost of the two is “dominant” (contributes most to the value). After 9, A=10, B=11, C=12, D=13, E=14, and F=15. Having three bytes (24 bits) for color gives 16,777,216 unique colors encoded in exactly six characters. To convert a byte value to hex, divide it by 16; the quotient is the leftmost digit and the remainder is the rightmost digit. For example, the number 236 when divided by 16 gives the quotient 14 and the remainder 12, which is EC in hexadecimal. Similarly, to convert EC back to decimal the equation is E×16 + C×1, or 14×16 + 12 = 224 + 12 = 236.
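The divide-by-16 recipe translates directly into a few lines of Python. This is a sketch with made-up function names; Python's built-in formatting could do the same job, but spelling it out mirrors the hand method.

```python
HEX_DIGITS = "0123456789ABCDEF"

def byte_to_hex(value: int) -> str:
    """Convert 0..255 to two hex digits: quotient is the left digit, remainder the right."""
    if not 0 <= value <= 255:
        raise ValueError("a byte must be between 0 and 255")
    quotient, remainder = divmod(value, 16)
    return HEX_DIGITS[quotient] + HEX_DIGITS[remainder]

def hex_to_byte(pair: str) -> int:
    """Convert two hex digits back to decimal: left digit times 16 plus right digit."""
    left, right = pair.upper()
    return HEX_DIGITS.index(left) * 16 + HEX_DIGITS.index(right)

def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Build a 6-digit color: red digits, then green, then blue."""
    return "#" + byte_to_hex(red) + byte_to_hex(green) + byte_to_hex(blue)

print(byte_to_hex(236))
print(hex_to_byte("EC"))
print(rgb_to_hex(255, 0, 0))
```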

#11: Friday, September 30, 2011 – Making a basic Web page, utilizing basic markup tags:

Boldface: <B>…</B>
Italic: <I>…</I>
Superscript: <SUP>…</SUP>
Subscript: <SUB>…</SUB>
Heading tags: <H1>…</H1> through <H6>…</H6>
Centering: <CENTER>…</CENTER> (deprecated)
Named colors: red, green, yellow, blue, etc.
Hex colors: #FF0000, #1EA7F9, etc.
HTML entities: &copy; equivalent to &#169;, etc.
Fractions: &frac12; &frac14; &frac34; (respectively: ½, ¼, ¾, but no others; other fractions can be built as <SUP>numerator</SUP>&frasl;<SUB>denominator</SUB>)
Links: <A HREF="url">link text</A>
Font: <FONT FACE="typeface" COLOR="color">text</FONT>

Both <CENTER>…</CENTER> and <FONT>…</FONT> tags are deprecated, as is the BGCOLOR attribute of the BODY tag, meaning that the standards committees prefer that we not use them. These items may disappear from future versions of HTML.

Why is the <FONT> tag deprecated? Example:

<CENTER><FONT FACE="___" COLOR="___"><H2>Paragraph #1</H2></FONT></CENTER>
text text text…
<CENTER><FONT FACE="___" COLOR="___"><H2>Paragraph #2</H2></FONT></CENTER>
text text text…

Embedding of style in with content leads to cluttered, hard-to-maintain code. This motivates the use of CSS (Cascading Style Sheets) as a means to decouple style from content.

#12: Monday, October 3, 2011 – Introduction to CSS: <STYLE TYPE="text/css">…</STYLE> in header, STYLE="___" attribute inside tags, <LINK REL="stylesheet" TYPE="text/css" HREF="filename.css"> to pull up external file. The “cascade” of dependencies: the STYLE attribute overrides the <STYLE> section, which overrides any <LINK> to external style sheet files, which overrides the default settings for tags. Example (equivalent to the non-style approach in previous lecture):

<STYLE TYPE="text/css">
H2 {text-align:center ; font-family:____ ; color:____}
P {text-align:justify ; text-indent:0.5in}
</STYLE>
…
<H2>Paragraph #1</H2>
<P>text text text…</P>
<H2>Paragraph #2</H2>
<P>text text text…</P>

or MyStyles.css contains the text:

H2 {text-align:center ; font-family:____ ; color:____}
P {text-align:justify ; text-indent:0.5in}

and HTML document contains:

<LINK REL="stylesheet" TYPE="text/css" HREF="MyStyles.css">
<STYLE TYPE="text/css">
</STYLE>
…
<H2>Paragraph #1</H2>
<P>text text text…</P>
<H2>Paragraph #2</H2>
<P>text text text…</P>

Pulling style information out of body of document “declutters” text, makes changes easier to implement and more consistent, and many pages using the same linked style sheets all see changes at once. Empty <STYLE>…</STYLE> sections can be omitted, but use of <STYLE> section with <LINK> to external style sheet can be valuable, as <STYLE>…</STYLE> is then used to override settings in external style sheet for the current document only.
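The override order described in this lecture (STYLE attribute, then <STYLE> section, then linked file, then tag defaults) can be modeled as a simple lookup chain. This Python sketch is a deliberate simplification: real browsers also weigh selector specificity and source order, and every name and value below is hypothetical.

```python
def resolve(prop, inline, style_block, linked, defaults):
    """Return the value of a CSS property, checking sources from highest priority to lowest."""
    for source in (inline, style_block, linked, defaults):
        if prop in source:
            return source[prop]
    return None

defaults = {"text-align": "left", "color": "black"}   # browser defaults for the tag
linked = {"text-align": "center", "color": "navy"}    # hypothetical MyStyles.css
style_block = {"color": "maroon"}                     # hypothetical <STYLE> section
inline = {}                                           # no STYLE attribute on this tag

# The <STYLE> section overrides the linked sheet for color only;
# text-align still comes from the linked sheet.
print(resolve("color", inline, style_block, linked, defaults))
print(resolve("text-align", inline, style_block, linked, defaults))
```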

Colors in CSS can be names (red, green, blue, etc.) or traditional 6-digit hex numbers (e.g., #1EF208), but also may be 3-digit short hex (#1AE, which by “stuttering” is equivalent to the 6-digit color #11AAEE), or can use the rgb function with either byte values 0..255 or percents 0%…100% (e.g., rgb(255,128,196) or rgb(100%,50%,75%), as preferred).
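The “stuttering” rule and the percent form can be sketched in Python as follows. The function names are made up, and rounding a percent to the nearest byte value is an assumption about how the conversion is done.

```python
def expand_short_hex(color: str) -> str:
    """Expand 3-digit short hex to 6 digits by doubling each digit."""
    if len(color) != 4 or not color.startswith("#"):
        raise ValueError("expected a color like #1AE")
    return "#" + "".join(digit * 2 for digit in color[1:])

def percent_to_byte(percent: float) -> int:
    """Map 0..100 percent onto the byte range 0..255 (rounded; an assumption)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(percent * 255 / 100)

print(expand_short_hex("#1AE"))
print(percent_to_byte(100), percent_to_byte(75), percent_to_byte(0))
```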

#13: Wednesday, October 5, 2011 – Types of graphics files.

.BMP (bitmap). Native to Windows. Uncompressed. Four styles:

24-bit (3 bytes per pixel, true RGB color, 16,777,216 colors possible)
256-color (1 byte/8 bits per “pixel”, up to 256 true RGB colors in palette)
16-color (½ byte/4 bits per “pixel”, up to 16 true RGB colors in palette)
2-color (⅛ byte/1 bit per “pixel”, up to 2 true RGB colors in palette, often B&W)

A “palette” is a table stored in the file (takes only a small amount of storage) containing colors picked from the 16,777,216 possible true RGB colors – the “pixels” in paletted formats are actually indexes into this table. Due to their size, .BMP files are typically unsuitable for use on the Web; early browsers did not support the file type at all.
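To get a feel for why uncompressed .BMP files are large, this Python sketch estimates the pixel-data size at each color depth. It ignores the file header and palette (an assumption made to keep it short) and includes the .BMP convention that each row is padded to a multiple of 4 bytes; the 640×480 image size is just an example.

```python
def bmp_pixel_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes of pixel data; each row is rounded up to a multiple of 4 bytes."""
    row_bytes = (width * bits_per_pixel + 31) // 32 * 4
    return row_bytes * height

# Example 640x480 image at each of the four styles.
for label, bits in (("24-bit", 24), ("256-color", 8), ("16-color", 4), ("2-color", 1)):
    print(label, bmp_pixel_bytes(640, 480, bits), "bytes")
```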

.JPG/.JPEG (Joint Photographic Experts Group). Supports 16,777,216 colors. Compressed, but uses lossy compression technique. Converting a 24-bit .BMP into .JPG results in image that is visually identical to source, but pixel values have been changed subtly. Great for photographs, where lossy compression won’t be noticed. Poor for cartoons, line-art, text (images with high contrast edges) because lossy compression “fuzzes out” the edges. Compression “quality” setting between 1…100 is trade-off between file size and image quality – larger settings give better images but larger files; smaller settings give smaller files but compression artifacts become more noticeable. Widely used on the Web.

.GIF (Graphics Interchange Format, developed by CompuServe in 1987 with a revision in 1989). Compressed, and uses lossless compression, but supports only up to 256 colors (using palette of true 24-bit RGB colors). Converting a 256-color, 16-color, or 2-color .BMP into .GIF results in image that is pixel-for-pixel identical to the source. (Converting a 24-bit .BMP requires loss of color-depth first.) Supports transparency in 1989 version only (one color not painted to let background show through, simulating non-rectangular images). Supports simple animations in 1989 version only. Patent entanglements in mid-1990s threatened its use, but all relevant patents have now expired. Great for cartoons, line-art, text, but mediocre for photographs. Badly supported by Windows Paint (messes up palettes), but well supported by other graphics packages. Widely used on the Web.

.PNG (Portable Network Graphics, proposed in 1996 as a response to deficiencies of both .JPG and .GIF). Supports up to 48-bit color (16 bits, or 2 bytes, for each primary of R, G, and B). Compression is lossless only, so cartoons, line-art, and text stay pixel-exact, while photographs come out larger than the equivalent .JPG. Supports transparency. No animation. Format is free for general use (no patent issues). Initially slow to be adopted, but presently all major browsers, and Microsoft Word, support the format. Good all-around choice for line-art and cartoons, and for photographs when exact pixels matter more than file size. Widely used (now) on the Web.

.TIF/.TIFF (Tagged Image File Format). Typically used on Macs, but now may be found anywhere. Has many of the same features as earlier formats.

Use of graphic images in HTML: <IMG SRC="filename">

or <IMG SRC="filename" WIDTH="__" HEIGHT="__" TITLE="text to appear on mouse-over" ALT="text to appear if image cannot be shown">

#14: Friday, October 7, 2011 – Images on the Web. Repeated background images on a Web page: <BODY BACKGROUND="__"> (deprecated attribute), vs. the equivalent style approach (<STYLE> BODY {background-image:url('__')} </STYLE>) along with background-repeat and background-position properties. Use of simple graphics editors such as Windows Paint to create speckle patterns for backgrounds and 3D sculptured buttons (example images not reproduced).

Recommendation: Bookmark W3Schools for on-line HTML and CSS reference:

http://www.w3schools.com/

#15: Tuesday, October 11, 2011 (Monday schedule) – Discussion of legal issues involving images (in particular, copying an image from someone’s site, putting it on your own, then displaying it with <IMG SRC="pic.jpg">, versus linking to original image on owner’s site and wasting their bandwidth with <IMG SRC="http://someone_elses_site/pic.jpg">).

Client-side image maps (first example of linking tags together by name):

<IMG SRC="pic.jpg" USEMAP="#MyMap">
<MAP NAME="MyMap">
  <AREA SHAPE="rect" COORDS="x1,y1,x2,y2" HREF="__">
  <AREA SHAPE="circle" COORDS="x,y,r" HREF="__">
  <AREA SHAPE="poly" COORDS="x1,y1,x2,y2,…,x1,y1" HREF="__">
  <AREA SHAPE="default" HREF="__">
</MAP>

Rectangles need two <x,y> coordinate pairs (points) to define opposing corners. Circles need one point (the center) and a radius. Polygons always use an arbitrary list of points, but where the last point is the same as the first point to close the polygon. Shapes are allowed to overlap (for example a circle and a rectangle), but if the mouse is clicked in the overlapping region it appears that the first shape in the list takes priority. In most cases, however, the overlapping regions are likely to point at the same URL. The “default” shape represents any portion of the image not covered by any other shape (it is optional and may be omitted).
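The first-shape-wins behavior for overlapping regions can be sketched in Python. This models only rectangles, circles, and the default; polygons are left out to keep it short, and the coordinates and file names are invented for illustration.

```python
def hit_test(x, y, areas, default=None):
    """Return the link of the first listed area containing the point (x, y)."""
    for shape, coords, href in areas:
        if shape == "rect":
            x1, y1, x2, y2 = coords
            if min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2):
                return href
        elif shape == "circle":
            cx, cy, r = coords
            if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2:
                return href
    return default   # plays the role of SHAPE="default"

# Hypothetical map: the circle overlaps the right end of the rectangle.
areas = [
    ("rect", (10, 10, 110, 60), "rect.html"),
    ("circle", (100, 50, 30), "circle.html"),
]

print(hit_test(100, 50, areas, "default.html"))   # in both shapes; rectangle is listed first
print(hit_test(120, 50, areas, "default.html"))   # only inside the circle
print(hit_test(300, 300, areas, "default.html"))  # outside every shape
```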

Demonstration of using Windows Paint to extract coordinates from image.

Demonstration of my MakeButtons program to create button panels and equivalent HTML.

#16: Wednesday, October 12, 2011 – Demo of Mac running Parallels running Windows 7 running CamWatcher. Problem with all raster images (.GIF, .JPG, .PNG, etc.) is that all pixels are stored, and when image is zoomed up in scale what look like smooth lines become jagged – a mathematically pure line is painted with an alias.

Discussion of .SVG (scalable vector graphics) files – description of objects on screen with interior color (fill), border color (stroke), and border thickness (stroke-width), along with attributes for controlling the ends of lines or how one line connects to another. Browsers re-render the files at requested scales, so images look “perfect” at all sizes, with no jaggies or aliasing. Files are plain text (emailable, editable with Notepad on a PC or TextEdit on the Mac, etc.), but have “magic incantations” at the start which determine the character set (UTF-8), rules for configuration (http://www.w3.org/2000/svg), etc. While not required, SVG files may also contain a style section that occupies the same conceptual position as the <STYLE> block in an HTML document, with some minor differences.

Tags in SVG files are always in lower-case, and standalone tags contain a trailing slash (as in <rect … />). The trailing slash is not required for standalone HTML tags, but is becoming recommended practice (as in an image tag <IMG … />). Comments are the same as in HTML: <!-- comment text -->, and comments can span lines.

Here is an SVG file (created by my own Bézier Madness program) with a triangle, rectangle, and circle on a cyan background:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Generator Copyright (C)2010 Dr. William T. Verts (all rights reserved) -->
<!-- File Created: Thursday, February 24, 2011 9:11:50 AM -->
<!-- Generated from Dr. Bill's Bezier Madness -->
<!-- C:\Users\Bill\Desktop\SVG Demo\MyDiagram.svg -->
<svg xmlns:svg="http://www.w3.org/2000/svg"
     xmlns="http://www.w3.org/2000/svg"
     version="1.1"
     x="0px" y="0px" width="320px" height="240px" >
  <style type="text/css" >
  <![CDATA[
    polygon { stroke-linecap:round; stroke-linejoin:miter; }
    polyline { stroke-linecap:round; stroke-linejoin:round; }
    line { stroke-linecap:round; stroke-linejoin:round; }
    path { stroke-linecap:round; stroke-linejoin:round; }
  ]]>
  </style>
  <!-- Background Color -->
  <rect x="0px" y="0px" width="320px" height="240px" style="fill:#00FFFF" />
  <!-- ******************** Object #1 ******************** -->
  <!-- Triangle -->
  <polygon points=" 150, 20, 20, 60, 190, 100, 150, 20" style="fill:#00FF00;stroke:#000000;stroke-width:3;" />
  <!-- ******************** Object #2 ******************** -->
  <!-- Rectangle -->
  <rect x="100" y="120" width="160" height="100" style="fill:#00FF00;stroke:#008080;stroke-width:3;" />
  <!-- ******************** Object #3 ******************** -->
  <!-- Circle -->
  <circle cx="100" cy="100" r="40" style="fill:#FFFF00;stroke:#FF0000;stroke-width:6;" />
</svg>

#17: Friday, October 14, 2011 – Three HTML topics: Horizontal Rules, Ordered and Unordered Lists, and Tables. This pretty well covers “standard” or “boring” HTML, not including frames, forms, JavaScript, icons, etc.

Horizontal Rules: <HR> tag. Draws a horizontal line across the browser window; the default is to have a small gap at each end. The length can be modified using the WIDTH attribute, as in <HR WIDTH="200">, but the WIDTH attribute is deprecated in favor of the style sheet approach: <HR STYLE="width:200px"> or <HR STYLE="width:3in"> (or, if placed up in the style block in the header section: <STYLE> HR {width:3in} </STYLE>). The SIZE attribute, also deprecated, sets the thickness of the rule rather than its length. Length measurements in CSS can be pixels (px), inches (in), centimeters (cm), percents (%), etc.

Lists have enclosing tags <OL>…</OL> (ordered) or <UL>…</UL> (unordered), and each list item starts with <LI>. Closing list item with </LI> is recommended, but not strictly required by most browsers.

Ordered Lists:           Appears As:       Unordered Lists:         Appears As:

<OL>                                       <UL>
  <LI>List item</LI>     1. List item        <LI>List item</LI>     ● List item
  <LI>List item</LI>     2. List item        <LI>List item</LI>     ● List item
  <LI>List item</LI>     3. List item        <LI>List item</LI>     ● List item
</OL>                                      </UL>

By default ordered lists use leading Arabic numbers (1,2,3,…) and unordered lists use round bullets (●,●,●, …) but these may be changed. Using the TYPE attribute (deprecated) or styles, lists can be formatted as follows:

Symbol          Ordered Lists using TYPE      Using Styles
1,2,3,4,…       <OL TYPE="1">                 <OL STYLE="list-style-type:decimal">
A,B,C,D,…       <OL TYPE="A">                 <OL STYLE="list-style-type:upper-alpha">
a,b,c,d,…       <OL TYPE="a">                 <OL STYLE="list-style-type:lower-alpha">
I,II,III,IV,…   <OL TYPE="I">                 <OL STYLE="list-style-type:upper-roman">
i,ii,iii,iv,…   <OL TYPE="i">                 <OL STYLE="list-style-type:lower-roman">

Symbol          Unordered Lists using TYPE    Using Styles
●               <UL TYPE="disc">              <UL STYLE="list-style-type:disc">
■               <UL TYPE="square">            <UL STYLE="list-style-type:square">
○               <UL TYPE="circle">            <UL STYLE="list-style-type:circle">

Using styles, individual list items may override the symbol normally used by the <OL> or <UL> tag, as in <LI STYLE="list-style-type:disc">.


Tables <TABLE>…</TABLE> are composed of rows <TR>…</TR>, which are themselves composed of data <TD>…</TD>. Table data (cells) contain the information for that cell, and can be text, graphics, links, etc. Use <TH>…</TH> in place of <TD>…</TD> for table headers, which format their data to be boldfaced and centered. Here is a table with three rows and four columns:

<TABLE>
   <TR>
      <TD>row 1 column 1</TD>
      <TD>row 1 column 2</TD>
      <TD>row 1 column 3</TD>
      <TD>row 1 column 4</TD>
   </TR>
   <TR>
      <TD>row 2 column 1</TD>
      <TD>row 2 column 2</TD>
      <TD>row 2 column 3</TD>
      <TD>row 2 column 4</TD>
   </TR>
   <TR>
      <TD>row 3 column 1</TD>
      <TD>row 3 column 2</TD>
      <TD>row 3 column 3</TD>
      <TD>row 3 column 4</TD>
   </TR>
</TABLE>

Tables with the BORDER or BORDER="2" or BORDER="2px" attribute will be rendered on screen with a lined border around each cell. Tables without a border show just the data on screen. In a <TD> tag adjacent cells may be merged by using the attributes ROWSPAN or COLSPAN, but care must be taken to count up the number of cells properly in a row or column. For example <TD COLSPAN="3"> merges three cells in the current row. The tags <TABLE>, <TR>, <TH>, and <TD> may all contain a BGCOLOR="___" attribute to change the color of the table, row, header, or data, respectively, but BGCOLOR is deprecated (the table color is overridden by the row color, which in turn is overridden by the color of the header or data cell). The style approach is to use STYLE="background-color:____" in a particular tag (or in the style block, as appropriate). Tables have a large number of attributes, both deprecated and not, and many style options. These include colors, border thickness and color, vertical and horizontal alignment of data within cells, alignment and width of the table, etc.


#18: Monday, October 17, 2011 – Introduction to Frames. Initial setup page (Contents.html) is a page of simple links, using traditional HTML techniques:

<HTML>
<HEAD>
   <TITLE>Table of Contents</TITLE>
</HEAD>
<BODY BGCOLOR="Yellow">
   <H1>Contents</H1>
   <A HREF="http://www.yahoo.com">Yahoo</A><BR>
   <A HREF="http://www.cnn.com/">CNN</A><BR>
   <A HREF="http://www.cs.umass.edu/~verts">Dr. Bill</A><BR>
</BODY>
</HTML>

This page uses nothing that hasn’t been seen before. Clicking on any link causes the browser to replace the currently displayed page with the linked page. The first change is to insert an attribute into each link so that each linked page opens in its own new window or tab:

<HTML>
<HEAD>
   <TITLE>Table of Contents</TITLE>
</HEAD>
<BODY BGCOLOR="Yellow">
   <H1>Contents</H1>
   <A HREF="http://www.yahoo.com" TARGET="_blank">Yahoo</A><BR>
   <A HREF="http://www.cnn.com/" TARGET="_blank">CNN</A><BR>
   <A HREF="http://www.cs.umass.edu/~verts" TARGET="_blank">Dr. Bill</A><BR>
</BODY>
</HTML>

Notice that there is an underscore before the word “blank” in TARGET="_blank", which is an indicator of a special target, namely a new (blank) window or page.

Now we build the driver for the frames, which can be the index.html default page:

<HTML>
<HEAD>
   <TITLE>Test of Frames</TITLE>
</HEAD>
<FRAMESET COLS="200,*">
   <FRAME SRC="Contents.html" NAME="TOC">
   <FRAME SRC="MainBody.html" NAME="MAIN">
</FRAMESET>
<NOFRAMES>
   Put code here for browsers which do not support frames
</NOFRAMES>
</HTML>

In this code, there is no <BODY> tag, but instead the <FRAMESET> tag defines how the screen is to be split, and how large each split (frame) is to be. In this case, the <FRAMESET COLS="200,*"> defines two frames, split vertically, one frame 200 pixels wide and the other the remainder of the width of the browser window.


If we were to split the window into three columns of 300 pixels, 200 pixels, and variable pixels in width, respectively, the command would be <FRAMESET COLS="300,200,*"> instead. Similarly, if we wished to split the window into rows instead of into columns, we might use <FRAMESET ROWS="200,*"> instead. In both cases the * means to use for the last frame whatever portion of the window is left over after defining the other frame(s).

Each frame links to its own page, BUT those frames must also have names so that the Contents.html page can link to the correct one. In this case the name TOC will not be used, but each link in Contents.html will open its linked page in the frame named MAIN.

Finally, the <NOFRAMES> section will contain code to show in the browser if frames are not supported. If frames are supported, everything works but the code between <NOFRAMES> and </NOFRAMES> will be ignored. If frames are not supported, the <FRAMESET>, <FRAME>, <NOFRAMES>, and </NOFRAMES> tags will all be ignored, and the only code rendered by the browser will be that which remains inside the <NOFRAMES>…</NOFRAMES> section.

Here’s what the final version of Contents.html will contain:

<HTML>
<HEAD>
   <TITLE>Table of Contents</TITLE>
</HEAD>
<BODY BGCOLOR="yellow">
   <H1>Contents</H1>
   <A HREF="http://www.yahoo.com" TARGET="MAIN">Yahoo</A><BR>
   <A HREF="http://www.cnn.com/" TARGET="MAIN">CNN</A><BR>
   <A HREF="http://www.cs.umass.edu/~verts" TARGET="MAIN">Dr. Bill</A><BR>
</BODY>
</HTML>

By defining MAIN in the <FRAME> tag of one page and using TARGET="MAIN" (instead of TARGET="_blank") in the target of a link in another page, we have a mechanism for linking resources in different pages. Note that MAIN is neither a special nor a reserved word: we could have said <FRAME SRC="…" NAME="Fred"> so long as the link to that frame in Contents.html was <A HREF="…" TARGET="Fred">.

Finally, we need a default page, linked as MainBody.html in the second <FRAME> tag of the frameset. This can be anything meaningful, such as:

<HTML>
<HEAD>
   <TITLE>Main Body</TITLE>
</HEAD>
<BODY BGCOLOR="cyan">
   <H1>Welcome to my Web page!</H1>
   This is the default page that will load automatically
   when the main frame page is loaded.
</BODY>
</HTML>


Upon initial loading of the frame driver code, the browser window will look like the following image. Clicking any link in the yellow contents page in the left frame will replace the right frame with the linked page (the left frame remains unchanged).

Finally, we can modify the Contents.html page by using buttons instead of text for the links, AND through JavaScript have each button change from a plain to a highlighted design when the mouse rolls over the button. The first image below shows no button selected, and the second shows where the mouse has rolled over the CNN button:


Here is the new HTML code for Contents.html:

<HTML>
<HEAD>
   <TITLE>JavaScript RollOver Test</TITLE>
   <SCRIPT LANGUAGE="JavaScript">
   <!--
      function ShowImage (item_name,image_name) {
         eval ('document.' + item_name + '.src = "' + image_name + '"');
         return true ;
      }
   //-->
   </SCRIPT>
</HEAD>
<BODY BGCOLOR="#C0C0C0">
   <H1>Contents</H1>
   <A HREF="http://www.cnn.com"
      onMouseOver="ShowImage('Button1', 'Button1_Selected.gif') ;"
      onMouseOut ="ShowImage('Button1', 'Button1_Normal.gif') ;"
      TARGET="MAIN">
      <IMG SRC="Button1_Normal.gif" BORDER=0 NAME="Button1">
   </A>
   <BR>
   <A HREF="http://www.yahoo.com"
      onMouseOver="ShowImage('Button2', 'Button2_Selected.gif') ;"
      onMouseOut ="ShowImage('Button2', 'Button2_Normal.gif') ;"
      TARGET="MAIN">
      <IMG SRC="Button2_Normal.gif" BORDER=0 NAME="Button2">
   </A>
   <BR>
</BODY>
</HTML>

The new section is the <SCRIPT> section in the heading, where JavaScript code (typically user-defined functions such as ShowImage) can reside. That function is called by events, in this case the onMouseOver event (when the user brings the mouse over the link) and onMouseOut (when the user moves the mouse away from the link). The purpose of the code as written is to define a name for each <IMG> tag, then use that name to assign a default button to that tag both when the page is initially loaded and when the mouse moves away from the link, and another “selected” button when the mouse moves onto the link. The “heavy lifting” is performed by the ShowImage function, which builds in real-time from its arguments (the image tag name and the image file name) a string that looks like the following:

document._____.src = "_____"

with the defined name of the <IMG> tag in the first slot and the name of the image file (a .GIF) in the second slot, and then executes (or evaluates) that string as a legal JavaScript command. We’ll look at JavaScript in some detail in the next lecture.
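The string-building step can be tried outside a browser. The following is only a sketch under stated assumptions, not the lecture's code: fakeDocument is a hypothetical stand-in for the browser's document object, buildCommand returns the string that ShowImage assembles, and setImage is an illustrative alternative that reaches the same property with bracket notation instead of evaluating a string.

```javascript
// Hypothetical stand-in for the browser's document object (assumption:
// the page contains an <IMG> tag with NAME="Button1").
const fakeDocument = { Button1: { src: "Button1_Normal.gif" } };

// Builds the same command string that ShowImage hands to eval.
function buildCommand(itemName, imageName) {
  return 'document.' + itemName + '.src = "' + imageName + '"';
}

// Alternative sketch: look the image up by name with bracket notation,
// so no string has to be evaluated as code.
function setImage(doc, itemName, imageName) {
  doc[itemName].src = imageName;
  return true;
}

console.log(buildCommand("Button1", "Button1_Selected.gif"));
setImage(fakeDocument, "Button1", "Button1_Selected.gif");
console.log(fakeDocument.Button1.src);
```

Both approaches end with the image's src property holding the new file name; the bracket form simply avoids running text as code.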


#19: Wednesday, October 19, 2011 – Introduction to JavaScript. As we have seen from the previous lecture, JavaScript code can be inserted into the heading of an HTML Web page. There are two ways to do this. The version on the left is an early approach, which is now deprecated in favor of the version on the right (similar to how we link CSS in a STYLE block):

<SCRIPT LANGUAGE="JavaScript">         <SCRIPT TYPE="text/javascript">
<!--                                   <!--
   // JavaScript code goes here           // JavaScript code goes here
//-->                                  //-->
</SCRIPT>                              </SCRIPT>

In both cases the HTML comment framework <!-- and --> is so that browsers that do not support JavaScript fail gracefully. If JavaScript is not supported the <SCRIPT> and </SCRIPT> tags are ignored, and the HTML comments hide all internal JavaScript code from the browser – it will not appear in the rendered body of the Web page (the page might not work correctly, but it won’t have a large pile of code cluttering up the page). If JavaScript is supported, JavaScript knows to ignore the opening HTML comment, and the // JavaScript comment hides the closing HTML comment. This framework can appear in both the <HEAD> section AND in the body wherever code needs to appear. For the examples here, the code will be strictly in the body of the page, as follows:

<HTML>
<HEAD>
   <TITLE>Testing some JavaScript code</TITLE>
</HEAD>
<BODY>
   <H1>Testing some JavaScript code</H1>
   <SCRIPT TYPE="text/javascript">
   <!--
      *
   //-->
   </SCRIPT>
</BODY>
</HTML>

* From now on, rather than show the entire page for every example only the JavaScript code that goes in the framework marked by the asterisk will be shown. Here’s the first example:

document.writeln ("Hello, World") ;

This writes into the current document the string Hello, World which will appear just as if we had written it directly into the HTML code. Similarly:

document.writeln ("Hello, World") ;
document.writeln ("Hello, World") ;

writes the string in twice, but because whitespace is ignored by browsers, the two strings will be run together on the browser window.


To put them on different lines, we have to write in the break tag manually.

document.writeln ("Hello, World") ;
document.writeln ("<BR>") ;
document.writeln ("Hello, World") ;

While these examples are pretty inefficient, the critical idea is that JavaScript can create HTML content in real-time as the page is being rendered by the browser. In the next examples we use variables, which are values in memory that we choose to give names, and expressions that use those variables:

N = 56 ;
M = 47 ;
document.writeln (N+M) ;

The result of this is to write the number 103 (the sum of 56 and 47) into the Web page as it is being rendered. Variables may be either strings (surrounded by quotes) or double-precision floating-point numbers (numbers with fractions), and JavaScript will determine the data type of a variable as it receives its new value. A variable may be a string at one moment and a number a bit later. This is called dynamic typing (in contrast to other languages that use static typing, where a variable always has the same data type throughout the execution of the program). Note that the code above could have been written without semicolons as:

N = 56
M = 47
document.writeln (N+M)

This is not recommended, as statements can extend over several lines, and several statements can also appear on the same line, as in:

N = 56 ; M = 47 ; document.writeln (N+M) ;

Always use semicolons to terminate statements (although there are a couple of exceptions). The code can ask questions of its variables with an if statement, as in the following code:

N = 56
M = 47
if (N > M) {
   document.writeln ("N is bigger") ;
} else {
   document.writeln ("M is bigger or equal") ;
}

What is written into the HTML code is either the string N is bigger or the string M is bigger or equal, but which one is written depends on the values of the variables N and M. (Which one is it here?).
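Dynamic typing, described earlier in this lecture, can be observed directly with the typeof operator. The sketch below is illustrative only (the variable names are made up here) and uses console.log rather than document.writeln so that it can run outside a Web page. It also shows why the type matters: the + operator adds numbers but joins strings.

```javascript
// A variable's type follows whatever value it currently holds.
let value = 56;
const firstType = typeof value;     // type while holding a number
value = "fifty-six";
const secondType = typeof value;    // type after a string is assigned

// The + operator adds numbers but joins (concatenates) strings.
const sum = 56 + 47;
const joined = "56" + "47";

console.log(firstType, secondType);
console.log(sum, joined);
```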


Finally, the power of JavaScript (or any language) comes from the action of a loop, which is code that executes over and over many times. To do this, we need a variable to count the number of times we’ve gone through the loop. Here’s the basic framework:

N = 1 ;
while (N <= 10) {
   // do something interesting here
   N = N + 1 ;
}

This framework executes the code in between the {…} until variable N becomes greater than 10, and because of the N = N + 1 statement (replace the value of N with the old value of N plus 1) this happens after the 10th pass through the loop. To execute the loop 1000 times we need only replace the 10 with 1000. Here is code that puts the numbers 1 through 1000 into our Web page:

N = 1 ;
while (N <= 1000) {
   document.writeln (N) ;
   N = N + 1 ;
}

To do something interesting, we would like to build a Web page that contains all integers between 1 and 1000 and their square-roots. We can’t do this by hand very easily, but we can do this with JavaScript, assuming there is a square-root function available. Not knowing this, we look up “JavaScript square root” on the Web (Google or some other search engine), which again takes us to http://www.w3schools.com at the page talking about that particular function. We learn that the square-root function is called sqrt and is part of the Math object, so to use it we write Math.sqrt(___) and put the item for which we need a square root inside the parentheses. To make all of our answers appear on the Web page in an unordered list, we have to write the JavaScript code to create the correct tags in real-time. Here is the final code:

document.writeln ("<UL>") ;
N = 1 ;
while (N <= 1000) {
   document.writeln ("<LI>Square Root (", N, ") = ", Math.sqrt(N), "</LI>") ;
   N = N + 1 ;
}
document.writeln ("</UL>") ;

So, when N = 1 and the square root returns 1 as well, the string written out by the loop is:

<LI>Square Root (1) = 1</LI>

and when N = 2 and the square root returns 1.4142135623730951 the code written out is:

<LI>Square Root (2) = 1.4142135623730951</LI>
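The same loop can be checked outside a browser. This is a sketch, assuming a runtime such as Node.js where document.writeln is not available: the hypothetical function squareRootList collects the lines in an array instead of writing them into a page, joining the pieces with + where the lecture code passes them as separate arguments.

```javascript
// Collects the lines the loop would write, for N from 1 up to limit.
function squareRootList(limit) {
  const lines = ["<UL>"];
  let n = 1;
  while (n <= limit) {
    lines.push("<LI>Square Root (" + n + ") = " + Math.sqrt(n) + "</LI>");
    n = n + 1;
  }
  lines.push("</UL>");
  return lines;
}

const listLines = squareRootList(1000);
console.log(listLines.length);   // list items plus the two list tags
console.log(listLines[1]);       // the line for N = 1
console.log(listLines[2]);       // the line for N = 2
```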


Here is the complete HTML page, including the JavaScript code to generate the list of square roots:

<HTML>
<HEAD>
   <TITLE>Testing some JavaScript code</TITLE>
</HEAD>
<BODY>
   <H1>Testing some JavaScript code</H1>
   <SCRIPT TYPE="text/javascript">
   <!--
      document.writeln ("<UL>") ;
      N = 1 ;
      while (N <= 1000) {
         document.writeln ("<LI>Square Root (", N, ") = ",
                           Math.sqrt(N), "</LI>") ;
         N = N + 1 ;
      }
      document.writeln ("</UL>") ;
   //-->
   </SCRIPT>
</BODY>
</HTML>

This is a lot easier than writing out 1000 lines of square roots by hand.

#20: Friday, October 21, 2011 – Review for midterm #1.

#21: Monday, October 24, 2011 – Midterm #1, in-class.

#22: Wednesday, October 26, 2011 – Forms and Buttons, sending form data to server. Forms have names and actions, and contain buttons, checkboxes, radio buttons, text areas, etc. Actions determine what to do with the form data when a special button (TYPE="submit") is clicked. Most common action is to post the data to the Web (METHOD="POST"), but this requires a server script to receive and process the data (ACTION="http://www. … /cgi-bin/echohtml.cgi"). These scripts are called CGI (Common Gateway Interface) scripts, and are programs written in languages such as Perl, Python, PHP, etc. Here is a Web page containing a form showing a lot of the common input methods, and where the form data are posted to a CGI script called echohtml.cgi on my site (commonly, CGI scripts are in a folder called cgi-bin):

<HTML>
<HEAD>
   <TITLE>Test of HTML Forms</TITLE>
</HEAD>
<BODY BGCOLOR="#00FFFF">
<FORM NAME="MyForm1" METHOD="POST"
      ACTION="http://www.cs.umass.edu/~verts/cgi-bin/echohtml.cgi">

   <H3>Buttons</H3>
   <H4>(User-defined buttons require JavaScript onClick events)</H4>
   <INPUT TYPE="button" NAME="Button1" VALUE="Click Me #1"><BR>
   <INPUT TYPE="button" NAME="Button2" VALUE="Click Me #2"><BR>
   <INPUT TYPE="button" NAME="Button3" VALUE="Click Me #3"><BR>
   <INPUT TYPE="button" NAME="Button4" VALUE="Click Me #4"><BR>
   <INPUT TYPE="button" NAME="Button5" VALUE="Click Me #5"><BR>
   <INPUT TYPE="reset" VALUE="Reset All"> Resets items to default values<BR>
   <INPUT TYPE="submit" VALUE="Submit this Form"> Sends data to server<BR>

   <H3>Checkboxes</H3>
   <INPUT TYPE="checkbox" NAME="Check1">My Checkbox 1<BR>
   <INPUT TYPE="checkbox" NAME="Check2" CHECKED="checked">My Checkbox 2<BR>
   <INPUT TYPE="checkbox" NAME="Check3">My Checkbox 3<BR>
   <INPUT TYPE="checkbox" NAME="Check4">My Checkbox 4<BR>
   <INPUT TYPE="checkbox" NAME="Check5">My Checkbox 5<BR>

   <H3>Radio Buttons</H3>
   <INPUT TYPE="radio" NAME="Radio1" VALUE="Radio1Option1" CHECKED="checked">Radio #1 #1<BR>
   <INPUT TYPE="radio" NAME="Radio1" VALUE="Radio1Option2">Radio #1 #2<BR>
   <INPUT TYPE="radio" NAME="Radio1" VALUE="Radio1Option3">Radio #1 #3<BR>
   <INPUT TYPE="radio" NAME="Radio1" VALUE="Radio1Option4">Radio #1 #4<BR>
   <BR>
   <INPUT TYPE="radio" NAME="Radio2" VALUE="Radio2Option1" CHECKED="checked">Radio #2 #1<BR>
   <INPUT TYPE="radio" NAME="Radio2" VALUE="Radio2Option2">Radio #2 #2<BR>
   <INPUT TYPE="radio" NAME="Radio2" VALUE="Radio2Option3">Radio #2 #3<BR>
   <INPUT TYPE="radio" NAME="Radio2" VALUE="Radio2Option4">Radio #2 #4<BR>

   <H3>Input Text Boxes</H3>
   <INPUT TYPE="text" NAME="Input1" VALUE="Default Text">default text<BR>
   <INPUT TYPE="text" NAME="Input2"><BR>
   <INPUT TYPE="text" NAME="Input3"><BR>
   <INPUT TYPE="password" NAME="Password1"> This is a password box<BR>

   <H3>Drop-Down List</H3>
   <SELECT NAME="List1">
      <OPTION>(A) I don't know
      <OPTION SELECTED="selected">(B) No
      <OPTION>(C) Yes
   </SELECT>

   <H3>Multiline Text Area, with default text</H3>
   <TEXTAREA NAME="Area1" COLS="50" ROWS="10">
Default Text
   </TEXTAREA>

</FORM>
</BODY>
</HTML>
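When the submit button is clicked, the browser sends the named items of the form to the server as key-value pairs. The following is a rough sketch of that encoding, assuming a runtime that provides the standard URLSearchParams class (modern browsers and Node.js); the object name fields and the three values in it are made up for illustration and are not the full contents of the form above.

```javascript
// Hypothetical subset of what the form above might submit.
const fields = {
  Check2: "on",              // a checked box with no VALUE attribute sends "on"
  Radio1: "Radio1Option1",   // the VALUE of the selected radio button
  Input1: "Default Text",    // the text currently in the input box
};

// URLSearchParams produces the key=value&key=value text that a form
// POST carries in its body; spaces are encoded as + signs.
const body = new URLSearchParams(fields).toString();
console.log(body);
```

The server script then splits this text at the & and = characters to recover each key and its value.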

#23: Friday, October 28, 2011 – Forms and Buttons, JavaScript in browser. Forms may also communicate directly with JavaScript functions in the current page, and not require a CGI script on a server. In the following Web page there are a number of input text boxes, an output text box, and a number of buttons. Each button “calls” a JavaScript function in its onClick event handler. For example, the Add button contains onClick="Add();" which means that the Add() function defined in the <SCRIPT> area will be called when the button is clicked. The Add() function calls the N1() and N2() functions, each of which extracts text from one of the input boxes and returns the number corresponding to the text typed into that box (the built-in parseFloat function takes in a string of characters and returns the number corresponding to those characters). The Add() function then adds the two numbers together, converts the result back to a string of characters (using the String() function), and puts that string into the Answer text box.


Notice how the N1() function gets its result from document.MyForm1.Input1.value: it reaches into the current document, finds the form named MyForm1, finds in that form the input object named Input1, and extracts its value. That value is converted from a string to a number and is returned as the result of the function. Similarly, the Add() function reaches into the current document, finds the form named MyForm1, finds in that form the input object named Answer, and sets its value to the string generated by the sum of N1() and N2().

<HTML>
<HEAD>
   <TITLE>Test of HTML Forms</TITLE>
   <SCRIPT TYPE="text/javascript">
   <!--
      function N1 () {return parseFloat(document.MyForm1.Input1.value) ;}
      function N2 () {return parseFloat(document.MyForm1.Input2.value) ;}
      function N3 () {return parseFloat(document.MyForm1.Input3.value) ;}
      function N4 () {return parseFloat(document.MyForm1.Input4.value) ;}

      function Add      () {document.MyForm1.Answer.value = String(N1()+N2()) ;}
      function Subtract () {document.MyForm1.Answer.value = String(N1()-N2()) ;}
      function Multiply () {document.MyForm1.Answer.value = String(N1()*N2()) ;}
      function Divide   () {document.MyForm1.Answer.value = String(N1()/N2()) ;}

      function Distance (x1,y1,x2,y2) {
         return Math.sqrt((x2 - x1)*(x2 - x1) + (y2 - y1)*(y2 - y1)) ;
      }

      function GetDistance () {
         document.MyForm1.Answer.value = String(Distance(N1(),N2(),N3(),N4())) ;
      }
   //-->
   </SCRIPT>
</HEAD>
<BODY BGCOLOR="#00FFFF">
<FORM NAME="MyForm1">
   <H2>Simple Calculator Using JavaScript</H2>

   <H3>Reset</H3>
   <INPUT TYPE="reset" VALUE="Reset All"><BR>

   <H3>Numeric Inputs</H3>
   <INPUT TYPE="text" NAME="Input1">Input 1<BR>
   <INPUT TYPE="text" NAME="Input2">Input 2<BR>
   <INPUT TYPE="text" NAME="Input3">Input 3<BR>
   <INPUT TYPE="text" NAME="Input4">Input 4<BR>

   <H3>Commands</H3>
   <INPUT TYPE="button" VALUE="Input1 + Input2" onClick="Add();"><BR>
   <INPUT TYPE="button" VALUE="Input1 - Input2" onClick="Subtract();"><BR>
   <INPUT TYPE="button" VALUE="Input1 * Input2" onClick="Multiply();"><BR>
   <INPUT TYPE="button" VALUE="Input1 / Input2" onClick="Divide();"><BR>
   <INPUT TYPE="button" VALUE="Distance: <Input1,Input2> to <Input3,Input4>"
          onClick="GetDistance();"><BR>

   <H3>The Answer</H3>
   <INPUT TYPE="text" NAME="Answer"><BR>
</FORM>
</BODY>
</HTML>

JavaScript functions are free-format. For example, the Add() function could have been:

function Add()
{
   document.MyForm1.Answer.value = String( N1()+N2() ) ;
}
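The distance between the points (x1, y1) and (x2, y2) is the usual Euclidean formula, the square root of (x2 - x1)² + (y2 - y1)². The sketch below restates the calculator's arithmetic as plain functions that take the typed text as arguments, so that it can run without a form. The names add and distance are chosen here for illustration; they are not the functions in the page above.

```javascript
// Adds two numbers given as text and returns text, as the Answer box holds.
function add(text1, text2) {
  return String(parseFloat(text1) + parseFloat(text2));
}

// Euclidean distance between (x1, y1) and (x2, y2).
function distance(x1, y1, x2, y2) {
  const dx = x2 - x1;
  const dy = y2 - y1;
  return Math.sqrt(dx * dx + dy * dy);
}

console.log(add("2.5", "4"));        // text in, text out
console.log(distance(0, 0, 3, 4));   // a 3-4-5 right triangle
console.log(add("abc", "4"));        // parseFloat cannot read "abc"
```

The last call shows what happens with non-numeric input: parseFloat returns the special value NaN (not a number), and any arithmetic involving NaN also gives NaN.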


#24: Monday, October 31, 2011 – Snow Day. Nor’easter dumps 10 inches of wet heavy snow over the weekend, breaking trees throughout region. Power takes days to be restored.

#25: Wednesday, November 2, 2011 – Handed back midterm exam (count = 103, average = 60.3, minimum score = 26, maximum score = 99). Discussion of infrastructure issues concordant with the storm on Saturday, October 29, that knocked out power, telephones, and Internet access to most of Western New England.

#26: Friday, November 4, 2011 – In-class quiz given by TA.

#27: Monday, November 7, 2011 – Telnet. Traditional telnet is a program for connecting over the Internet (not the Web) to a remote computer for the purpose of issuing it commands. The original raw telnet was not encrypted, exposing usernames and passwords (and everything else) to packet sniffers running on machines intermediate between the user’s terminal and the remote machine being connected to. Modern programs encrypt all communications so that packet sniffers cannot make sense of what they intercept. On Macs: open Finder-Applications-Utilities-Terminal to launch a UNIX command shell on the Mac itself (modern Macs run UNIX), then from there issue an ssh command to connect to a remote server. On PCs there is no facility available by default, but Windows users can download a program called PuTTY from http://www.chiark.greenend.org.uk/~sgtatham/putty/ which provides an encrypted telnet (and which requires no installation – just drop the program on the desktop).

We have a server dedicated to my classes called elsrv3.cs.umass.edu, located somewhere in the CMPSCI building. All students have a username and password on this server. The username should be the same as a student’s UMail username. The initial password is of the form ELxxxaaa, where xxx is the last three digits of the SPIRE ID number and aaa is the first three letters of the UMail username. For example, Fred Q. Smith with SPIRE ID 12345678 will have username fqsmith and password EL678fqs. Passwords do not echo to the screen.
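The ELxxxaaa rule is mechanical enough to write down as code. This is only an illustration of the pattern described above, with a made-up function name (defaultPassword); it is not how the server itself assigns passwords.

```javascript
// Builds the initial password: "EL", then the last three digits of the
// SPIRE ID, then the first three letters of the UMail username.
function defaultPassword(spireId, username) {
  return "EL" + String(spireId).slice(-3) + username.slice(0, 3);
}

// The example from the notes: Fred Q. Smith, SPIRE ID 12345678.
console.log(defaultPassword("12345678", "fqsmith"));
```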

To connect via Mac:
1. Open Finder-Applications-Utilities-Terminal.
2. Type ssh _____@elsrv3.cs.umass.edu with your UMail username in the blank.
3. When the server responds asking for password, type in your default (ELxxxaaa).

To connect via PuTTY on a Windows system:
1. Launch PuTTY.
2. In the Host Name box enter elsrv3.cs.umass.edu, and make sure Connection Type is ssh.
3. When the server responds asking for username, type it in.
4. When the server asks for password, type it in (initially the ELxxxaaa format).

When first connected to the server, the first action will be to change the default password. The process requires that the original password be re-entered (ELxxxaaa), and then the new password is to be entered twice. I cannot recover a lost or forgotten password, but I can reset it to the default. Once logged in to the server, everything is a list of UNIX commands, and the first few set up the UNIX nest for Web pages: ls -al to list files, mkdir public_html to create the default Web nest, chmod a+rx public_html to allow access to the nest from outside, and chmod a+rx . (don’t forget the dot) to allow access into the account from outside. Type logout to break the connection to the UNIX server. Mac users must then type logout again to close their local Terminal session.


#28: Wednesday, November 9, 2011 – UNIX. Getting in, changing directories and permissions. Once logged in, change into the public_html folder with cd public_html (the cd command means “change directory” and is always cd _____ with either a folder name in the blank or .. to close the current folder).

File permissions are always of the form rwxrwxrwx, where the individual rwx triplets represent, respectively, the user (you!), the group (the group of related accounts the user belongs to), and others (everyone else). The r means read permission (the file contents can be examined), w means write permission (the file can be changed or deleted), and x means execute permission (if a program the file can be run, and if a folder the file can be opened). Changing permissions is through the chmod (“change mode”) command, and always is of the form chmod _____ filename where the blank contains directives on how the permissions are to be changed. These directives may be relative or absolute.

Relative: u, g, or o (user, group, others), + or - (add or take away permission), r, w, or x (read, write, or execute). For example, to add read and execute permission for group and others, the pattern would be go+rx (group-others-add-read-execute). The pattern ugo may be abbreviated as a, and multiple patterns may be separated by commas. For example, to add read and execute permission to user, group, and others, but take away write permission for group and others on a file called public_html, the command would be chmod ugo+rx,go-w public_html, or alternately it could be specified as chmod a+rx,go-w public_html.
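To see what a relative directive does to a permission string, the rules above can be simulated. This is a sketch only: applyRelative is a made-up function that handles just the subset described here (u, g, o, a with + or - and r, w, x), and it works on a nine-character string rather than on a real file. The real chmod command accepts more forms than this.

```javascript
// Applies a relative directive such as "a+rx,go-w" to a nine-character
// permission string such as "rw-rw-rw-" and returns the new string.
function applyRelative(perms, directive) {
  const start = { u: 0, g: 3, o: 6 };    // where each rwx triplet begins
  const offset = { r: 0, w: 1, x: 2 };   // position inside a triplet
  const chars = perms.split("");
  for (const pattern of directive.split(",")) {
    const match = /^([ugoa]+)([+-])([rwx]+)$/.exec(pattern);
    if (match === null) {
      throw new Error("unrecognized pattern: " + pattern);
    }
    const classes = match[1].replace(/a/g, "ugo");   // a is short for ugo
    for (const c of classes) {
      for (const p of match[3]) {
        // + writes the letter in its slot, - writes a dash there.
        chars[start[c] + offset[p]] = match[2] === "+" ? p : "-";
      }
    }
  }
  return chars.join("");
}

console.log(applyRelative("rw-rw-rw-", "a+rx,go-w"));
console.log(applyRelative("rw-------", "go+rx"));
```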

Absolute: treat each rwx triplet as a binary number, where a letter (presence of the permission) is indicated by a 1 and a dash (absence of the permission) is indicated by a 0. Convert each triplet to decimal (octal, actually). For example, the pattern r-x would be thought of as the binary number 101, which has the value 5 (1×2² + 0×2¹ + 1×2⁰ = 1×4 + 0×2 + 1×1 = 4+0+1 = 5). Here are the patterns:

rwx = 111 = 7 (common for personal folders)
rw- = 110 = 6 (common for personal files)
r-x = 101 = 5 (common for public folders)
r-- = 100 = 4 (common for public files)
-wx = 011 = 3 (rare)
-w- = 010 = 2 (rare)
--x = 001 = 1 (rare)
--- = 000 = 0 (common for private files)

Thus, to set the permissions on public_html to rwxr-xr-x, the UNIX command to type in is chmod 755 public_html.
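The triplet-to-digit conversion can be sketched in a few lines. The function name toOctal is made up for this illustration; it follows the rule above by turning each letter into a 1 and each dash into a 0, then reading each group of three as a binary number.

```javascript
// Converts a permission string such as "rwxr-xr-x" into the three-digit
// mode used by the absolute form of chmod.
function toOctal(perms) {
  let digits = "";
  for (let i = 0; i < 9; i += 3) {
    const triplet = perms.slice(i, i + 3);
    // A letter counts as binary 1, a dash as binary 0.
    const binary = triplet.replace(/[rwx]/g, "1").replace(/-/g, "0");
    digits += parseInt(binary, 2);
  }
  return digits;
}

console.log(toOctal("rwxr-xr-x"));   // the public_html example
console.log(toOctal("rw-r--r--"));
```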

#29: Monday, November 14, 2011 – All about FTP. FTP stands for File Transfer Protocol, and is used for moving files from one computer to another over the Internet. Like Telnet, FTP came out of a time when traditional tools were not encrypted. Today, we use encrypted versions. There are many tools available, such as WinSCP for Windows and Fugu for the Mac, both of which require an installation process (and there is a beta version of Fugu for Mac Lion).

Text files on the PC contain both a carriage-return (CR) and a line-feed (LF) character at the end of each line of text (dating from a time when printing terminals needed to return the print carriage to the left and feed the paper up one line). Old-style Macs just used the CR character to separate lines, and UNIX (including modern Macs) uses just the LF to separate lines. FTP can convert text files to the proper form for the receiving system. Copying a file from Windows to UNIX in text mode strips out CR characters, making files smaller. Copying a file from UNIX to Windows puts them back in, making files larger. Doing this on non-text files results in garbage on the receiving system, however. This requires that files be transferred with no alterations, in binary mode.

It is possible to use WinSCP or Fugu to do all common file management tasks (setting permissions, changing names, deleting files, etc.), instead of using Telnet to connect to the server and then typing in UNIX commands.
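The text-mode conversion between Windows and UNIX line endings described above amounts to a pair of search-and-replace operations. The sketch below is an illustration with made-up function names (toUnix and toWindows), not the code inside any FTP program; in JavaScript strings, \r is the CR character and \n is the LF character.

```javascript
// Windows to UNIX: drop the CR from each CR-LF pair.
function toUnix(text) {
  return text.replace(/\r\n/g, "\n");
}

// UNIX to Windows: put a CR in front of each LF. Normalizing first
// keeps any existing CR-LF pairs from being doubled.
function toWindows(text) {
  return toUnix(text).replace(/\n/g, "\r\n");
}

const windowsText = "line one\r\nline two\r\n";
const unixText = toUnix(windowsText);

// The UNIX form is shorter by one character per line.
console.log(windowsText.length, unixText.length);
```

Applying either function to a non-text file (an image, for example) would alter bytes that merely happen to match CR or LF, which is why such files are sent in binary mode.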

To edit a Web page, there are three basic approaches:
1. Use WinSCP or Fugu to FTP-copy a file to the local PC/Mac, edit the file locally in Notepad or Text Editor, then use WinSCP or Fugu to FTP-copy the file back to the server.
2. Use WinSCP to “edit the file directly”: the WinSCP program has a built-in editor, and editing a remote file causes WinSCP to automatically FTP-copy the file to the PC before editing and FTP-copy the file back to the server when complete, hiding the file copying.
3. Telnet to the server with ssh or PuTTY, then use the UNIX editor emacs directly on the server to change the files.

#30: Monday, November 14, 2011 – More on FTP. Demo of ExpressPCB as example of embedded FTP. ExpressPCB is a program for designing printed circuit boards on a local PC. When the design is complete, the program securely FTPs the design and the credit card information to a remote server, the company builds the circuit boards according to the design, and then sends the completed boards via FedEx to the user within three days. This is a program in which FTP is a critical component, but it is not the program’s primary function. Modern FTP programs always have the local PC/Mac as one end of the transfer and a remote UNIX machine as the other. The alternative is to Telnet to a UNIX machine, and then FTP from there to a second remote machine (example using a server in Finland at garbo.uwasa.fi, requiring an anonymous log-in because I do not have an account on the machine in Finland).

#31: Wednesday, November 16, 2011 – Intro to Python as responder to Web forms. The following code is a Python program called echohtml.cgi that runs on the UNIX server, as a responder to Web forms (see lecture of October 26th):

    #!/usr/bin/python
    import cgi
    form = cgi.FieldStorage()
    print "Content-type: text/html\n"
    print ""
    print "<HTML>"
    print " <HEAD>"
    print " <TITLE>Received Data</TITLE>"
    print " </HEAD>"
    print ""
    print " <BODY>"
    print " <OL>"
    for key in form.keys():
        val = form[key].value
        print " <LI>", key, " = ", val
    print " </OL>"
    print " </BODY>"
    print "</HTML>"

The first line of the Python program (#!/usr/bin/python) is a directive to the UNIX system of where on the disk to find the appropriate interpreter to run the program. This line is required for any Python program that is to be run directly by name on a UNIX system, as CGI scripts are.

The next two lines (import cgi and form = cgi.FieldStorage()) are the interface to the Web data being received. Python programs have a lot of code already created and written by other people that “do things” that we don’t know how to do. This other code is stored in libraries which must be imported; the import cgi line says that we will need code to handle CGI scripting tasks and must import that code from a library called cgi into our program. The form = cgi.FieldStorage() line says to call a function called FieldStorage from the cgi library and assign its results to a local variable called form. The variable form contains all the information from the Web form as key-value pairs (for example, on an HTML page a checkbox named Check3 containing a checkmark will be received from the submitting Web form as Check3=checked, where Check3 is the key and checked is the value). So, variable form contains all the data submitted to the CGI script from the Web form as a list of key-value pairs. Most of the rest of the CGI script generates HTML code on-the-fly as Python print statements. The “heavy lifting” is from the section of code that says:

    for key in form.keys():
        val = form[key].value
        print " <LI>", key, " = ", val

This code steps through the list of keys in the variable form (form.keys()), assigning each in turn to variable key for the body of the loop. For each value of key, variable val is assigned from the form variable the corresponding value. The print statement prints out the key-value pair as a member of an HTML list. The important lesson about this simple program is that it generates HTML code on-the-fly – the HTML Web page is dynamic, not static. This is characteristic of CGI scripts. Depending on what data they receive, they will generate an HTML response that is appropriate for the data, and not the same page every time. The CGI script is a text file, but because it is also a program it must have execute permission in order to run. The permissions on this file should be rwxr-xr-x (everybody can look at the contents and run it as a program, but nobody can change the file except the user).
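The idea that the response depends on the data received can be tried without a Web server. The following is a minimal sketch in Python 3 syntax (the listings in these notes are Python 2), using a plain dictionary as a stand-in for the form variable; the function name render_form_page and the sample data are made up, and the html.escape call is an addition that is not in echohtml.cgi.

```python
import html

def render_form_page(fields):
    """Build an HTML page listing each key-value pair as a list item."""
    lines = [
        "<HTML>",
        " <HEAD>",
        "  <TITLE>Received Data</TITLE>",
        " </HEAD>",
        " <BODY>",
        "  <OL>",
    ]
    for key, val in fields.items():
        # Escape the submitted text so it cannot inject markup into the page.
        lines.append("   <LI>" + html.escape(key) + " = " + html.escape(val))
    lines += ["  </OL>", " </BODY>", "</HTML>"]
    return "\n".join(lines)

page_a = render_form_page({"Check3": "checked"})
page_b = render_form_page({"Name": "Ada", "Color": "<blue>"})
print(page_b)
```

Two different sets of submitted data produce two different pages from the same program, which is the sense in which the page is dynamic rather than static.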

#32: Friday, November 18, 2011 – Python on server. Here is a simple Python program that asks the user for a number and prints it:

    #!/usr/bin/python
    N = input("Enter a number --- ")
    print N

The first line is the “magic incantation” that starts the Python interpreter. The second line asks for a number and assigns the value to variable N. The third line prints the value of N. This file can be entered via emacs directly on the server. Its permissions must be set to rwxr-xr-x for it to run, and the program can be run by typing its name (test.py in this case) directly at the UNIX command line. (Note that creating the file on a PC using Notepad will work, but the file must be transferred to UNIX by WinSCP in text mode and not binary mode – one student found out the hard way that a file transferred in binary mode looks OK on the UNIX side but will not run unless the Python interpreter is manually started, as in python test.py instead of just test.py by itself. The extra CR characters in the file will prevent the program from running.)
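One way to see why the stray CR characters matter is to look at what the first line of the file actually contains. The sketch below (Python 3 syntax) is only an illustration: the helper interpreter_path is hypothetical, and it simply assumes that UNIX treats everything between #! and the LF as the interpreter's path.

```python
def interpreter_path(script: bytes) -> bytes:
    """Return the path named on the #! line, exactly as the bytes appear."""
    first_line = script.split(b"\n", 1)[0]   # UNIX ends the line at LF only
    if not first_line.startswith(b"#!"):
        raise ValueError("no #! line found")
    return first_line[2:]

unix_file = b"#!/usr/bin/python\nprint 1\n"
windows_file = b"#!/usr/bin/python\r\nprint 1\n"   # CR LF, copied in binary mode

print(repr(interpreter_path(unix_file)))      # b'/usr/bin/python'
print(repr(interpreter_path(windows_file)))   # b'/usr/bin/python\r'
```

In the second case the path ends in an invisible CR, and no interpreter of that name exists. Typing python test.py sidesteps the problem because the #! line is then not used to find the interpreter.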

Here is a factorial program in JavaScript, using the same techniques as those already shown.

    <HTML>
     <HEAD>
      <TITLE>Test of Factorial in JavaScript</TITLE>
      <SCRIPT TYPE="text/javascript">
      <!--
       function Factorial (N) {
          var F = 1 ;
          var I = 1 ;
          while (I <= N) {
             F = F * I ;
             I = I + 1 ;
          }
          return (F) ;
       }

       function Interactive () {
          var N = parseFloat(document.MyForm.Input1.value) ;
          document.MyForm.Answer.value = String(Factorial(N)) ;
       }
      //-->
      </SCRIPT>
     </HEAD>
     <BODY BGCOLOR="cyan">
      <FORM NAME="MyForm">
       <INPUT TYPE="text" NAME="Input1"><BR>
       <INPUT TYPE="text" NAME="Answer"><BR>
       <INPUT TYPE="button" VALUE="Compute!" onClick="Interactive();">
      </FORM>
     </BODY>
    </HTML>

The Factorial function is “clean” in the sense that it only computes factorials; the interface between it and the HTML form is through the Interactive function, called in the onClick event handler of the button. As you can see, the code is lengthy and complicated, as it needs to deal with HTML, forms, and JavaScript. Unlike JavaScript, Python is not free-format. Indentation matters.

Here is the source code for the equivalent Factorial.py program:

    #!/usr/bin/python
    def Factorial(N):
        F = 1
        I = 1
        while (I <= N):
            F = F * I
            I = I + 1
        return F

    def Interactive():
        N = input("Enter a number ")
        print Factorial(N)

    Interactive()

The body of the functions and of the while-loop are indented relative to the rest of the code. As a result, programs in Python tend to contain more, simpler statements than JavaScript programs (you cannot pack the code as you can in JavaScript), but each line tends to be cleaner (fewer odd characters such as curly-braces and semicolons, but note the required use of the colon at the end of def and while statements). In Python you do not need to connect the input-output to HTML objects such as text boxes.

A simpler version of the Factorial program does not require functions, as in:

    #!/usr/bin/python
    N = input("Enter a number ")
    F = 1
    I = 1
    while (I <= N):
        F = F * I
        I = I + 1
    print F

(Factorial.py versus JavaScript version). Demo of Lab #5 (factorial that sends email), plus exhortation to not misuse. The current assignment extends Factorial to automatically email its answer to a central mail drop. Here is the complete program for that assignment (replace my name and email address, underlined in the assignment, with yours).

    #!/usr/bin/python
    import smtplib

    N = input("Enter a number")
    F = 1
    I = 1
    while (I <= N):
        F = F * I
        I = I + 1

    From = "Bill Verts <[email protected]>"
    To = "[email protected]"
    Subject = "Factorial of " + str(N) + " from Bill Verts"
    Text = "Factorial of " + str(N) + " is " + str(F)
    Message = "From: " + From + "\r\n" + \
              "To: " + To + "\r\n" + \
              "Subject: " + Subject + "\r\n" + \
              Text
    print "Sending: " + Message

    Server = smtplib.SMTP("localhost")
    try:
        Code = Server.sendmail(From, [To], Message)
    finally:
        Server.quit()

    if Code:
        print "Error sending email"
    else:
        print "Email sent successfully"

The only new line at the beginning of the program is the import smtplib line, which accesses a library of routines for handling email. Many of the intermediate lines do nothing more than build up a set of strings into named variables. Those strings look like (and are) the requirements for an email message, including the From: line, the To: line, the Subject: line, etc. The final string is built into variable Message, which concatenates (glues together) the individual strings, but also inserts CR and LF characters (the \r and \n items, respectively) to separate the lines. The “heavy lifting” is done by three of the remaining lines:

    Server = smtplib.SMTP("localhost")
    Code = Server.sendmail(From, [To], Message)
    Server.quit()

These lines could be placed one after another as you see here, but only as long as everything works correctly. The first creates a local “server” object, the second actually sends the email, and the third closes down the server correctly. Wrapping the email-send code in a try-finally block guarantees that everything is shut down properly even if the email-send fails (even if an error occurs in the try section, the code in the finally section will run). Finally, the if Code: line uses the result of sending the email to inform the user if everything worked properly.

It should be pretty obvious that sending email in this fashion is an extremely powerful tool, and it can be easily misused. It is trivial to “pretend” to be someone you are not, and it is easy to send a spam message to a million addresses. DO NOT MISUSE THIS CODE. You are representatives of the University: If I catch anyone using this code for spamming, they will automatically fail the course. If anyone uses this code to harass other people, or to communicate with anyone under false pretenses, I will turn them in to the police. DON’T SCREW THIS UP.
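The try-finally guarantee can be checked without sending any real mail. In this sketch (Python 3 syntax) the class FakeServer is a hypothetical stand-in for the smtplib server object: its sendmail always fails, and the example.com addresses are placeholders.

```python
class FakeServer:
    """Hypothetical stand-in for a mail server connection; sends nothing."""
    def __init__(self):
        self.closed = False

    def sendmail(self, sender, recipients, message):
        raise RuntimeError("simulated send failure")

    def quit(self):
        self.closed = True

server = FakeServer()
error_seen = False
try:
    try:
        server.sendmail("a@example.com", ["b@example.com"], "hello")
    finally:
        server.quit()          # runs whether or not sendmail raised
except RuntimeError:
    error_seen = True          # the failure still propagates afterwards

print(server.closed, error_seen)   # True True
```

The outer try-except is there only so the demonstration can finish and report; the inner try-finally is the pattern used in the assignment.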

#33: Monday, November 21, 2011 – In the Factorial program written in JavaScript, the largest input number that returns an integer result is 21 (21! = 51090942171709440000). Larger numbers return values in scientific notation (22! ≈ 1.124×10^21), but only up to a limit of 170 (170! ≈ 7.25×10^306). In JavaScript, the factorial of any number greater than 170 returns the value “Infinity”. This is because all numbers in JavaScript are double-precision floating-point numbers (even those shown as integers), which have a dynamic range of approximately 10^±308 (and only about 15 significant figures).

In contrast, integers in Python are a separate data type from floating-point numbers, and unlike double-precision floating-point numbers, integers have arbitrary precision. The size of an integer is limited only by the available memory, at a commensurate reduction in processing speed (the larger the integer, the longer it takes to compute with it). Here is our first Python factorial program again:

    #!/usr/bin/python
    def Factorial(N):
        F = 1
        I = 1
        while (I <= N):
            F = F * I
            I = I + 1
        return F

    def Interactive():
        N = input("Enter a number")
        print Factorial(N)

    Interactive()

With this program it is possible to compute factorials of any size; the factorial of 200 is 788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000 (exactly).

We get this behavior because all variables (N, F, I) and constants (1) in the program are integers. Changing just one to floating-point is enough to force all dependent calculations to be done in floating-point as well. For example changing the single line F = 1 to F = 1.0 will do the job, as the dependent line F = F * I multiplies a floating-point number by an integer, forcing the calculation to be done in floating-point. If we do this, the limit of the calculations is now the same as in the JavaScript version, and we cannot compute any factorials larger than 170! (both Python and JavaScript use double-precision floating-point numbers).
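The effect of the starting value can be seen side by side. This is a small sketch in Python 3 syntax; the extra start parameter is an addition made for the demonstration and is not part of Factorial.py.

```python
import math

def factorial(n, start):
    """Multiply start by 1, 2, ..., n; the type of `start` decides the arithmetic."""
    f = start
    i = 1
    while i <= n:
        f = f * i
        i = i + 1
    return f

exact = factorial(200, 1)          # integer arithmetic: every digit is kept
approx_170 = factorial(170, 1.0)   # floating-point: still fits in a double
approx_171 = factorial(171, 1.0)   # too large for a double: becomes inf

print(len(str(exact)))             # number of digits in 200!
print(approx_170)
print(math.isinf(approx_171))      # True
```

Starting from the integer 1 keeps the whole calculation in arbitrary-precision integers; starting from 1.0 drags every multiplication into floating-point, with the same 170! ceiling as JavaScript.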

We must be very careful about integers and floating-point numbers in Python, particularly with division. Dividing 6/2 gives 3 as you would expect, but dividing 6/4 gives 1 and not 1.5 because the 6 and 4 are both integers. To get the fraction, one of the source numbers must be floating-point. This can be done as 6.0/4, 6/4.0, or 6.0/4.0 – in each case at least one number is floating-point, which forces all integers in the calculation to be converted to floating-point before the division. Similarly, 1+5=6 (everything is integer), but 1+5.0=6.0 (the 1 is converted to 1.0).
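The rules above describe the Python 2 interpreter used in this course. For comparison, here is the same arithmetic as a short sketch in Python 3, where the / operator always produces a floating-point result and a separate // operator performs the integer (floor) division; the variable names are arbitrary.

```python
# Python 3 semantics (the Python 2 listings in these notes behave differently):
whole = 6 // 4        # floor division on integers -> 1
true_div = 6 / 4      # "/" always gives a float in Python 3 -> 1.5
mixed = 6.0 / 4       # one float operand forces a float result -> 1.5
total = 1 + 5.0       # the 1 is converted to 1.0 -> 6.0

print(whole, true_div, mixed, total)   # 1 1.5 1.5 6.0
```

In either version, writing one operand as a floating-point number (6.0/4) gives the fraction.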

#34: Wednesday, November 23, 2011 – All about encryption. Single-key encryption (the same key is used for both encryption and decryption) can be good for large messages, but both sender and receiver need to have the same key. Code books during WWII said what key to use at a particular time; they became highly protected resources, highly prized by the other side. Here are some examples of single-key techniques:

1. Caesar cipher: rotate the alphabet by a fixed amount (rotate by 5 has A→F, B→G, C→H, …, X→C, Y→D, Z→E). The encryption key is the rotation factor; both sender and receiver need to know by how much the alphabet was rotated. This form of encryption is easy to break by brute force – simply try all 25 possible rotations and see which one makes sense.

2. Permutation cipher: scramble the letters (A→Q, B→E, C→R, D→M, …). The encryption key is the number of the permutation used, from 26! = 403291461126605635584000000 possibilities. Harder to break by brute force, but can be broken by applying statistical means for the target language (most letters in English generally follow the frequency pattern E, T, A, O, I, N, …).

3. XOR (exclusive-OR) cipher. Generate a random sequence of bits, and whenever the random bit is 1 flip the corresponding bit from the message from 0 to 1 or from 1 to 0, but whenever the random bit is 0 leave the corresponding message bit alone. This lays a statistical “noise field” over top of the message. Bits from a computer random-number generator are not truly random (statistically they appear to be random), but instead form a pseudorandom sequence depending on a starting seed (the sequence is repeatable). The encryption key is the starting seed for the random-number generator; both sender and receiver have to know the seed.
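Two of the single-key techniques above are short enough to try directly. The following is a sketch in Python 3 syntax, with made-up function names, a made-up message, and an arbitrary seed; Python's random module is used only because it is a repeatable pseudorandom generator, and it is not suitable for real secrecy.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def caesar(text, shift):
    """Rotate each letter A-Z by `shift` places; leave other characters alone."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

def xor_cipher(data, seed):
    """XOR each byte with a pseudorandom byte from a generator started at `seed`."""
    rng = random.Random(seed)          # same seed -> same "noise" sequence
    return bytes(b ^ rng.randrange(256) for b in data)

secret = caesar("ATTACK AT DAWN", 5)
print(secret)                          # FYYFHP FY IFBS

# Brute force: list all 25 candidate decryptions and look for readable text.
for shift in range(1, 26):
    print(shift, caesar(secret, -shift))

noisy = xor_cipher(b"ATTACK AT DAWN", seed=2011)
restored = xor_cipher(noisy, seed=2011)   # applying the same noise twice undoes it
print(restored)
```

The XOR version needs no separate decryption routine: flipping the same bits a second time restores the message, which is why sender and receiver only need to share the seed.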

In double-key encryption (also known as public key encryption) the key is formed in two pieces simultaneously. The math technique is to pick two very large prime numbers and multiply them together; asking if a large number is prime (divisible only by 1 and itself) is much easier than factoring numbers into their components. The two pieces of the key are based on the two prime numbers and their product. Knowing one key does not mean that you can figure out the other key unless you can factor the product back into the two primes; when it becomes practical to factor numbers of a certain size, you need only pick larger numbers and the method is secure once again.

Remember that the two pieces of the key are generated simultaneously. One is made public and the other is kept private. The two keys are interchangeable: you can encrypt with either one, and then decrypt with the other. To send a message to someone, use their public key for encryption; they will use their private key to decrypt. For someone to send a message to you, they will encrypt the message with your public key. Only you can decrypt the message because only you have the private key that matches. To sign a message, use your private key to encrypt the message; because everyone can decrypt the message with your public key, they know the message had to come from you. Generally, messages are encrypted twice: first with the sender’s private key (to sign it), and then with the receiver’s public key (to encrypt it). Only the receiver can get through the outer layer with their own private key, and then they use the sender’s public key to verify who it came from.

Messages sent this way are small numbers – but these numbers can then be used as keys in large-scale single-key encryption for big messages. When browsers connect to secure sites, browser and server first exchange public keys, then use public key encryption to exchange keys for single-key encryption, and finally use single-key encryption to communicate the ordering information (names, addresses, credit card numbers, etc.).

Steganography is not encryption, but a way to hide information in plain sight (the “photographic microdot” of old WWII movies). To do this digitally, you can for example hide each bit of the message in the low-order bit of the blue value of pixels in an image – doing so doesn’t change the picture in a noticeable way. Alternately, you can hide each bit of the message in the low-order bit of each sample in an audio file – this doesn’t change the sounds in a noticeable way. The message can be encrypted first, but doesn’t have to be.
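The two-prime scheme described in the double-key discussion above can be tried with toy numbers. The notes do not name a specific algorithm; the sketch below (Python 3.8 or later) assumes an RSA-style construction, with deliberately tiny primes and a helper apply_key that is made up for the illustration. It is not secure and is not how real software handles keys.

```python
# Toy key generation with deliberately tiny primes (real keys use primes
# hundreds of digits long).
p, q = 61, 53
n = p * q                    # the product, shared by both halves of the key
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, chosen coprime to phi
d = pow(e, -1, phi)          # private exponent: modular inverse of e

public_key = (e, n)
private_key = (d, n)

def apply_key(number, key):
    """Encrypt or decrypt a number smaller than n with one half of the key pair."""
    exponent, modulus = key
    return pow(number, exponent, modulus)

message = 65
# Confidentiality: encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(message, public_key)
recovered = apply_key(ciphertext, private_key)
# Signing: encrypt with the private key; anyone can check with the public key.
signature = apply_key(message, private_key)
verified = apply_key(signature, public_key)

print(ciphertext, recovered, signature, verified)
```

Computing d required knowing p and q separately (through phi); someone who sees only n and e would first have to factor n, which is the step that becomes impractical when the primes are large.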

#35: Monday, November 28, 2011 – In-class quiz #2. Here’s the quiz and its solution:

<1> 8 Points – In Python, the following variables X, Y, and Z have the values shown:

    X = 9
    Y = 5
    Z = 4.5

What are the values printed out by the following Python statements?

    A. print X+Y    14
    B. print X-Y    4
    C. print X*Y    45
    D. print X/Y    1 (integer division)
    E. print X+Z    13.5
    F. print X-Z    4.5
    G. print X*Z    40.5
    H. print X/Z    2.0 (floating point division)

<2> 4 Points – How many lines are being printed out by each of the following two sections of Python code? How does the indentation of the print statement affect your answers?

    I = 0                  I = 0
    while (I < 10):        while (I < 10):
        I = I + 1              I = I + 1
        print I            print I

The code fragment on the left prints ten separate lines (with the value of I ranging from 1 through 10), but the right-hand code fragment prints only one line (the value of I is 10 at that point). The indentation is the only difference between the two, where on the left the print statement is indented to be inside the loop along with the I = I + 1 statement, and on the right the print statement is outside the loop and is executed only when the loop has completely run its course.

<3> 8 Points – Complete the following Python code to create a Web page containing all numbers from 1 through 1000 and their squares, each in its own <LI>…</LI> tag as part of the unordered list:

    #!/usr/bin/python

    print "<HTML>"
    print " <HEAD>"
    print " <TITLE>My Spiffy Web Page</TITLE>"
    print " </HEAD>"
    print ""
    print " <BODY>"
    print " <UL>"
    I = 1
    while ( I <= 1000 ):
        print "<LI>", I, " squared is ", I*I, "</LI>"
        I = I + 1
    print " </UL>"
    print " </BODY>"
    print "</HTML>"

Mini-Topic: FavIcons (“favorite icons”) are the little custom pictures at the corners of Web pages. Here is my own home page, showing the icon associated with that page in Firefox next to the URL and in the tab.

These images are currently 16×16 pixels, with only 16 colors allowed. To edit one of these images, you need a special program (Windows Paint and Mac PaintBrush won’t work). There are a number of suitable programs on the Web, including my own icon editor suite. The suite contains twelve versions of the program, one for each of the common sizes (16×16, 32×32, 48×48, 64×64) and color depths (2, 16, 256 colors) of icon files currently in use today. The version suitable for FavIcons (16×16, 16 color) is shown below, with the icon from my Web page loaded:

The left side is the editor panel, where clicking on a pixel changes it to the selected color in the tools panel on the right. Regions may be flood-filled, horizontal and vertical lines may be drawn across the image, and pixels may be set to transparent. Infinite “undo” is supported. The icon is also shown in its actual size in the tool panel.

Loading a FavIcon into a Web page requires that two links be added to the HTML in the <HEAD>…</HEAD> section:

    <LINK REL="shortcut icon" HREF="http://_______/favicon.ico" />
    <LINK REL="apple-touch-icon" HREF="http://_______/favicon.ico" />

The differences between the two links are due to browser differences; in some cases the favicon.ico file must be at the root directory (inside public_html), but for many browsers the file can be in other folders. Browsers also tend to balk at showing the FavIcon unless the cache has been cleared and the browser re-started (and sometimes not even then).

#36: Wednesday, November 30, 2011 – Viruses and other Malware. A virus is a program written by people to intentionally cause disruption to others’ computer systems. Historically, viruses come in several major epochs:

1. File-Infector Viruses: programs that attach themselves to the end of MS-DOS .EXE programs (making them larger on the disk), patching a “jump” instruction to jump to their own code. When the program is given to someone else (on a floppy disk) and run, the virus code runs, infecting the new system. These are the easiest to detect, and the easiest to kill: the virus has a unique “signature” that identifies it to anti-virus programs. Common from mid 1980s through mid 1990s.

2. Boot-Sector Viruses: programs that insert themselves in track-zero-sector-zero of disks (floppy and hard). On old MS-DOS systems, a floppy disk containing the MS-DOS operating system is inserted into disk A: before power-up (on systems with hard disks, the operating system is on disk C:), and the permanent code in ROM knows to look in track-zero-sector-zero for the operating system startup code. By replacing the code in track-zero-sector-zero with the malware code, that code is automatically run when the computer is turned on. “Well behaved” viruses move the startup code to somewhere else on the disk and run it when they are done installing themselves, but other viruses were not so nice. Easy to detect, but harder to kill: sometimes the disk needs to be reformatted. Common from late 1980s through late 1990s.

3. Macro Viruses: Macros are programs written in a special programming language associated with an application program such as Microsoft Word or Excel; they automate complicated or frequently executed tasks. In creating Word and Excel for both PCs and Macs, Microsoft made certain that the macros executed the same way on both platforms. Thus, a macro program created to do harm will work the same way on both PCs and Macs. Some graphical email programs from the late 1990s were designed to detect the presence of an attachment and automatically load the attachment into its associated application program. If someone was sent an Excel file attachment, the email program would automatically load the file into Excel. Unfortunately, this means that the mere receipt of an email with a Word/Excel document containing a macro virus would be enough to infect the receiving system, be it a PC or Macintosh. This hole was quickly patched, and now email programs no longer auto-load attachments into their applications. Email programs now also automatically scan attachments with an antivirus scanner. Common from late 1990s through mid 2000s.

4. Buggy Email Exploits: Similarly, malware programs exploit bugs in email programs to gain control of their hosts. Microsoft Outlook Express has been a common vector. The macro virus called Melissa in the late 1990s took advantage of bugs in an email program to send itself to the first 50 addresses in the user’s address book; other viruses send themselves to everybody in the list. Common from late 1990s through today.

5. Social-Networking Exploits: While many programs (email programs, browsers running JavaScript, Java applets, etc.) still contain bugs that can be used to take over the system, it is harder and harder for malware to get a foothold in a system by itself. It needs help, and often gets that help directly from the user of the system through what is called “phishing”. Phishing is often done through emails formatted as Web pages, where the “click here” link leads to a page containing malware code configured to take over the system, or to a page where the user is fooled into entering private and personal information (name, bank account numbers, social security numbers, credit card numbers, etc.). Another variant is the Nigerian Spam Letter (not always from Nigeria), which is a money-laundering scam that promises millions of dollars for temporary use of someone’s bank account. Common long ago, still common today.

An old Nightline video shows the panic over the Michelangelo virus in March of 1992, back when the Web was in its infancy, Microsoft Windows 3.1 and MS-DOS were the common operating systems of the day, and few people knew how to handle malware. At the time, there were about 1000 different viruses; many more are known today. Virus “payloads” can be relatively benign (locking the keyboard) or extremely destructive (deleting all files). In November of 1988, Robert Morris, Jr., released a program now known as the Internet Worm (not a virus). It exploited (known) bugs in the UNIX sendmail program to send itself to other computers. Unfortunately, once each new system was infected, it kept replicating in that system so that all computer resources were used to run the program and nothing else. The Internet (still pre-Web) was essentially unusable for several days. Morris received probation and a fine. A few years earlier, astronomer Clifford Stoll at Berkeley was tracking a hacker from Germany who was using the Internet to break into computers in the U.S. looking for secrets to sell to the KGB. His exploits are documented in a 1989 book called “The Cuckoo’s Egg,” which reads like a real-life spy novel (as written by a hippie). The appendix to that book recounts how the Internet Worm worked, and how, because of his experiences tracking the hacker, Stoll for a time was a suspect in its creation.

#37: Friday, December 2, 2011 – What actions can be taken to avoid malware and phishing attacks?

1. Run antivirus programs. Well known programs include McAfee AntiVirus, Norton AntiVirus, Avast, SpyWare Doctor, etc. Some programs are free, including Ad-Aware, Spybot Search and Destroy, Malware Bytes, etc. Most are for the PC, but there are programs for the Mac as well (including iAntiVirus).

2. Run system “de-gunker” system cleanup programs. Not anti-virus programs, per se, but programs to delete files no longer needed. These include the contents of the trash/recycling bin, temporary files (both system and browser), cookies (including tracking cookies), leftover software from installing new programs, etc. One of the best known for the PC is CCleaner (crap cleaner). Often these programs have “secure delete” features to thoroughly erase files on the disk by writing random data over them, instead of simply “forgetting” where the file is located.

3. Clear cookies, cache, and history in your browser (and empty the trash). In a browser, the cache is where downloaded files are stored in case they are needed later – rather than download a file again, the browser can pull it from its cache much more quickly. Unfortunately, the more files are stored in the cache, the worse the system performance will get, and there is always the possibility that an “embarrassing” file is stored there. Clearing the cache on a regular basis keeps system performance up, gets rid of “stale” copies of a frequently changing Web page, and gets rid of problematic files. Similarly, cookies have legitimate uses so that Web sites can track a user’s preferences, but “tracking cookies” can share browsing habits among different sites. Clearing cookies regularly reduces problems with privacy, as does clearing the browser history. Many browsers have a “private browsing” mode that clears these items automatically.

4. Never click a “click here” button. No legitimate company will send emails saying “there’s a problem with your account; click here to visit our site.” Instead, all reputable companies will say “there’s a problem with your account; please go visit our site.” Never enter personal or private information into a Web page unless you manually activated that page AND it is a secure page. Never put passwords, ID numbers, credit card numbers, etc., in emails – they are NOT typically encrypted. If you receive a suspect email configured as a Web page (common these days), some email programs let you see the underlying HTML code – read the code to see if links go where they should (i.e., if the Web page shows a link to eBay, the underlying code should also show a link to <A HREF="http://www.ebay.com/"> and not to some bogus site such as <A HREF="http://www.hotsex.com/">, or worse, to an anonymous IP address).
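The advice in tip 4 about reading the underlying HTML can also be done mechanically. Here is a rough sketch using Python 3's standard html.parser and urllib.parse modules; the class LinkCollector, the function suspicious_links, and the sample email body (including its numeric address, taken from a range reserved for documentation) are all made up for the illustration, and a check like this is no substitute for caution.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect the HREF target of every <A> tag in a piece of HTML."""
    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":                      # the parser lower-cases tag names
            for name, value in attrs:
                if name == "href" and value:
                    self.targets.append(value)

def suspicious_links(html_text, expected_host):
    """Return the link targets whose host is not the expected one."""
    collector = LinkCollector()
    collector.feed(html_text)
    return [t for t in collector.targets
            if urlparse(t).hostname != expected_host]

email_body = ('<A HREF="http://www.ebay.com/">eBay</A> '
              '<A HREF="http://192.0.2.7/login">eBay account</A>')
print(suspicious_links(email_body, "www.ebay.com"))
# ['http://192.0.2.7/login']
```

Both links display text that mentions eBay, but only the first actually points there; the second goes to a bare IP address.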

Please make certain that your beloved and aged friends and relatives practice these tips as well. People who are largely unfamiliar with social engineering exploits are commonly fooled by “please click here” scams.

Mini-Topic: On-line polls. Everyone familiar with the Web has seen polls, but most people do not realize how worthless they are. Many polls are badly designed. For example, a poll may have answers Yes-No-Other (where Other is irrelevant), Yes-Yes-No or Yes-No-No (where one vote is split among two alternatives or where two questions are being asked in one poll), or may have answers that do not cover all possibilities. Polls that are open to everyone can be crashed by concerted efforts of a large number of people (look up the phrase: “to Pharyngulate a poll”). Polls that are not open, but instead require a registration log-in, often are targeted to a specific group of people with similar interests. In those cases, the poll answers will tend to agree with the attitudes of those registrants only. Here are a couple of examples of bad polls from http://fails.failblog.org/:

#38: Monday, December 5, 2011 – This day was mostly devoted to covering the requirements of assignment 6 and extra-credit assignment N. Assignment 6 uses a Web page in HTML to send <FORM> data to a server-based Python script (provided, but requiring modifications). Assignment N requires the creation of a client-side image map on a transparent .GIF image. While Windows Paint and Mac PaintBrush can be used to create the basic image and get most of the coordinates of the objects on screen, they cannot be used to generate transparent .GIF images. My own Bézier Madness program allows users to create and edit drawings, save those drawings as transparent .GIF images, and extract the coordinates of the objects (either from the saved .BEZ description file or by creating a .SVG image and examining its contents). The extra-extra credit portion of assignment N is to create a FavIcon (using my icon editor suite or an equivalent program).

#39: Wednesday, December 7, 2011 – This was a review of the big themes in the course. In this semester we covered essentially five different “languages” (some true computer languages, others languages in name only). Those languages include HTML, CSS, JavaScript, UNIX, and Python. The items on the client side include HTML, CSS, and JavaScript. In HTML, we covered an amalgam of HTML 3 and HTML 4, noting several common tags and attributes now considered deprecated (<CENTER>, BGCOLOR, etc.), but the Web is slowly migrating towards HTML 5. In CSS we covered just the basic ideas of CSS 1 (i.e., the “cascade”), but CSS 2 is available now and use of CSS 3 is growing. JavaScript is used in the browser to create HTML content on the fly (by writing HTML code directly into the current document) and to permit interaction with <FORM> data.

On the server side, we covered the basics of UNIX (the file structure, permissions, directories, and how to run programs), and the Python programming language. UNIX (and Linux) is commonly the operating system that runs Web servers today. Python is a general-purpose programming language, with libraries that can do many things (such as send emails), but it is also used to respond to Web pages sending <FORM> data by dynamically generating HTML responses. Python is not the only language pressed into service in these roles: Perl and PHP are two such, along with server-side JavaScript and many others.

#40: Friday, December 9, 2011 – Return Quiz 2, go over quiz answers, course evaluations. In preparation for the final exam, it should be emphasized that there will be questions on programming in JavaScript and Python, as well as HTML and CSS (of course). There will be at least one question that asks “What is written into the current document by the JavaScript code?” (similar to the midterm and quizzes) as opposed to “What shows up on screen?”. I encourage students to download and print out the sample code from the class site (HTML pages, JavaScript samples, etc.).