Agenda
• Questions
• Unix Survival Guide
• Document Creation (Word Processing and HTML)
• Document Retrieval
• Project Overview
Unix Survival Guide
• WAM account
• Directory structure (mkdir, cd, .., /)
• How much space is used (du, ls -l)
• Eliminating unneeded files (rm)
• Managing mail (pine, attachments)
• Moving files (mv, cp, ftp)
• Editing files (pico, more)
• Web anywhere (lynx)
Document Creation
• Editors
• Word Processors
• Desktop Publishing
• Structured Documents
• HTML/SGML/XML
Editors (Text Editing vs. Word Processing)
• Purpose
  – Create and modify ASCII text
• Examples
  – pico, axe, and emacs on WAM
• Advantages
  – Compatible with virtually everything (VT-100)
• Disadvantages
  – Limited format control, sometimes no mouse
Word Processors
• Purpose
  – Create documents intended for human readers
• Examples
  – Microsoft Word and WordPerfect in OWL
• Advantages
  – Good format control
  – WYSIWYG (“What You See is What You Get”)
• Disadvantages
  – No (universal) standard interchange format
Desktop Publishing
• Purpose
  – Produce documents for wide (paper) distribution
• Examples
  – Adobe PageMaker in the WAM labs
• Advantages
  – Allows very detailed layout control
• Disadvantages
  – Requires fairly extensive user expertise
Structured Documents
• Purpose
  – Specify the logical structure of documents
• Examples
  – email, HTML, LaTeX, SGML/XML
• Advantages
  – Allows easy reformatting for different displays
• Disadvantages
  – Hard to read unless “rendered” before viewing
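The "easy reformatting" advantage can be illustrated with a small sketch. It assumes a made-up in-memory document format (a list of role/text pairs) and two hypothetical renderers, `render_text` and `render_html`; none of these names come from the slides. The point is only that one logical structure can feed several displays.

```python
from html import escape

# A tiny "structured document": logical roles only, no layout decisions.
# The (role, text) format is an assumption made for this illustration.
document = [
    ("title", "Unix Survival Guide"),
    ("item", "Directory structure"),
    ("item", "Managing mail"),
]

def render_text(doc, width=40):
    """Render for a plain-text terminal: centered title, starred items."""
    lines = []
    for role, text in doc:
        if role == "title":
            lines.append(text.upper().center(width).rstrip())
        else:
            lines.append("* " + text)
    return "\n".join(lines)

def render_html(doc):
    """Render the same structure as HTML: <h1> for the title, <ul> for items."""
    out = []
    in_list = False
    for role, text in doc:
        if role == "item" and not in_list:
            out.append("<ul>")
            in_list = True
        if role != "item" and in_list:
            out.append("</ul>")
            in_list = False
        if role == "title":
            out.append(f"<h1>{escape(text)}</h1>")
        else:
            out.append(f"<li>{escape(text)}</li>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)

print(render_text(document))
print(render_html(document))
```

Neither renderer changes the document itself; only the presentation differs.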
HyperText Markup Language (HTML)
• Purpose
  – Structured document language for web pages
• Advantages
  – Adapts easily to different display capabilities
  – Widely available rendering software (browsers)
• Disadvantages
  – Direct control over layout is limited
  – The HTML “standard” is still evolving
First Steps in HTML
• Find a web page you like
• Select “Document Source” in “View” menu
• Compare HTML code with rendered version
  – Observe how to achieve each effect
• Select “Save As” in “File” menu
• FTP the file to ~/../pub/ on WAM
• Edit the file using pico
• View the result at http://www.wam.umd.edu/~userid/filename
HTML Document Structure
• Markup tags (open and close) bracket content
  <tag> … </tag>
• Title shows up in the Web browser’s frame
• Headers show up in the page itself
• For each link, specify the URL and link text
  <a href="URL">link text</a>
• Inline graphics can replace the link text
  <img src="oard.jpg">
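To see how a browser-like program reads this structure, here is a minimal sketch using Python's standard `html.parser` module. The sample page, its title, and the file `photo.html` are invented for illustration; `oard.jpg` and the URL pattern are taken from the slides. The `StructureReader` class is a hypothetical helper, not part of any browser.

```python
from html.parser import HTMLParser

# Hypothetical sample page; only oard.jpg and the URL pattern come from the slides.
PAGE = """<html>
<head><title>My Home Page</title></head>
<body>
<h1>Welcome</h1>
<a href="http://www.wam.umd.edu/~userid/filename">my file</a>
<a href="photo.html"><img src="oard.jpg"></a>
</body>
</html>"""

class StructureReader(HTMLParser):
    """Collect the title, the headers, and each link with its text."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags opened but not yet closed
        self.title = ""
        self.headers = []
        self.links = []      # (url, link text or image placeholder)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            # <img> has no closing tag; inside a link it replaces the link text.
            if "a" in self.open_tags and self.links:
                url, text = self.links[-1]
                self.links[-1] = (url, text + "[image: " + attrs.get("src", "") + "]")
            return
        self.open_tags.append(tag)
        if tag == "a":
            self.links.append((attrs.get("href", ""), ""))

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to the matching open tag.
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if not self.open_tags:
            return
        current = self.open_tags[-1]
        if current == "title":
            self.title += data
        elif current == "h1":
            self.headers.append(data)
        elif current == "a":
            url, text = self.links[-1]
            self.links[-1] = (url, text + data)

reader = StructureReader()
reader.feed(PAGE)
print("Title:", reader.title)
print("Headers:", reader.headers)
print("Links:", reader.links)
```

The title and the header end up in separate places, which mirrors the slide: one goes to the browser's frame, the other to the page body.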
Designing Web Pages
• Key design issues:
  – Content: What do you want to publish?
  – Style: How do you want to present it?
  – Syntax: How can you achieve that presentation?
• Sources of information
  – Online tutorials (Yahoo points to lots of these)
  – Technical materials (e.g., the HTML 3.0 spec)
Style Guidelines
• Design for generic browsers
  – And test on every version you wish to support
• Provide appropriate access points
  – User needs and navigation strategies differ
• Design useful navigational aids
  – A web search may lead to the middle of a site
• Include some indication of currency
  – Date of last update, “new” icons, etc.
HTML Editors
• Goal is to create web pages, not learn HTML!
• Several are available
  – In Explorer, “Edit-Page” for FrontPage Express
  – In Netscape, “File-Edit Page” for Composer
• You may still need to edit the HTML file
  – Some editors use browser-specific features
  – Some HTML features may be missing entirely
  – File names may be butchered by FTP
SGML/XML
• Generalized Markup Languages
  – SGML: Standard Generalized Markup Language (for paper documents)
  – XML: eXtensible Markup Language (for Web documents) (see W3C)
• These allow people to design
  – DTDs: Document Type Definitions
• A document also needs:
  – DSSSL: Document Style Semantics and Specification Language
Document Retrieval
• Making documents is often easier than finding them!
• Hypertext vs. Cataloging vs. Searching
  – Yahoo vs. AltaVista
• Lots of applications
  – Chasing down citations in papers you read
  – Web search engines
  – Managing your personal files
• Two basic approaches to searching
  – Explicit queries (“information retrieval”)
  – “Watch what I do” (“adaptive filtering”)
Ways of Searching for Text
• Controlled vocabulary
  – Manual indexing based on named concepts
• Free text
  – Characterize documents by the words they contain
• Social filtering
  – Exchange and interpret personal ratings
“Exact Match” Retrieval
• Find all documents with some characteristic
  – Indexed as “Presidents -- United States”
  – Containing the words “Clinton” and “Peso”
  – Read by my boss
• A set of documents is returned
  – Each is as likely to be useful as any other
  – Usually listed in date or alphabetical order
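A minimal sketch of exact-match retrieval over free text, assuming a three-document collection invented for this example and two hypothetical helpers, `build_index` and `exact_match`. A document is returned only if it contains every required word, and the result is an unranked set listed alphabetically.

```python
# Hypothetical three-document collection, made up for illustration.
documents = {
    "doc1": "Clinton approves loan package to support the peso",
    "doc2": "Clinton visits Maryland",
    "doc3": "Peso falls against the dollar",
}

def build_index(docs):
    """Map each lowercased word to the set of documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

def exact_match(index, required_words):
    """Return ids of documents containing every required word, alphabetically."""
    postings = [index.get(word.lower(), set()) for word in required_words]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

index = build_index(documents)
print(exact_match(index, ["Clinton", "Peso"]))  # ['doc1']
```

Documents that mention only one of the two words are excluded outright, which is the behavior ranked retrieval relaxes.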
Ranked Retrieval
• Put most useful documents near top of a list
  – Put possibly useful documents lower in the list
• No need to exclude any documents
  – Just list those least likely to be useful last
• Two basic techniques
  – Similarity-based
  – Probability-based
Similarity-Based Retrieval
• Assume “most useful” = most similar to query
• Lots of clues to meaning
  – Repeated words are good cues to meaning
  – Rarely used words make searches more selective
• Easily combined
  – Compute a “weight” for each term
  – Add up the weights for query terms in a document
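One common way to combine the two clues is a TF-IDF style weight: the count of a word in a document times the log of (number of documents / number of documents containing the word). The slides do not name a specific formula, so this weighting is an assumption, as are the sample documents, the query, and the helper names `build_weights` and `rank`.

```python
import math

# Hypothetical collection, made up for illustration.
documents = {
    "doc1": "peso crisis deepens as peso falls",
    "doc2": "clinton comments on the peso",
    "doc3": "clinton visits the university",
}

def build_weights(docs):
    """Compute a weight for each term in each document.

    weight = (times the word repeats in the document)
             * log(total documents / documents containing the word)
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = {}
    for words in tokenized.values():
        for word in set(words):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    weights = {}
    for doc_id, words in tokenized.items():
        weights[doc_id] = {
            word: words.count(word) * math.log(n_docs / doc_freq[word])
            for word in set(words)
        }
    return weights

def rank(weights, query):
    """Add up the weights of the query terms in each document; best first."""
    terms = query.lower().split()
    scores = {
        doc_id: sum(term_weights.get(term, 0.0) for term in terms)
        for doc_id, term_weights in weights.items()
    }
    # No document is excluded; zero-score documents simply sort last.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

weights = build_weights(documents)
for doc_id, score in rank(weights, "peso crisis"):
    print(doc_id, round(score, 3))
```

Here doc1 ranks first because "peso" repeats and "crisis" is rare, doc2 follows with a single "peso", and doc3 is still listed, just last.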
Project Overview
• Goal: Solve a practical problem
  – One which is fairly complex
• You choose the technology
  – Make a set of web pages (a web “site”)
  – Make a database (optional for Summer 690)
  – Do something else that is equally complex
    • Multimedia presentation, Java program, …
• Suggest two-person groups
Web Projects
• Have significant content! (see “What is a Book” web site under CLIS Dean’s Award)
• Multiple access points
  – Taxonomy, search engine, map, etc.
• Be creative (in a useful way)! For example:
  – Choose a novel application
  – Engage the user with an interactive approach
  – Adopt an innovative organization
  – Implement a creative layout
Database Projects (very ambitious for Summer 690)
• Your focus should be on scalability
  – What if the IRS decided to use your database?
• The user interface is important
  – Designed to be used without taking 690 first!
• Include enough content to allow testing
  – But focus on organization, not on content
• The same creativity issues as web projects