jaslin group SAN FRANCISCO [email protected]
California Coalition of Nurse Practitioners
Data Mining on the Internet
“finding it in Cyberspace”
James A. Sanders, CAE
jaslin group SAN FRANCISCO
WHAT WE WANT TO LEARN TODAY
What is available on the Internet
What are the best tools to obtain valuable products and services
How to configure a search on a browser
What are the best search sites
Why should I care?
ARE COMPUTERS MALE? or FEMALE?
Five reasons to believe computers are female...
1. No one but the Creator understands their internal logic.
2. The native language they use to communicate with other computers is incomprehensible to everyone else.
3. The message "Bad command or file name" is about as informative as, "If you don't know why I'm mad at you, then I'm certainly not going to tell you."
4. Even your smallest mistakes are stored in long-term memory for later retrieval.
5. As soon as you make a commitment to one, you find yourself spending half your paycheck on accessories for it.
5 REASONS TO BELIEVE COMPUTERS ARE MALE...
1. They have a lot of data, but are still clueless.
2. They are supposed to help you solve problems, but half the time they ARE the problem.
3. As soon as you commit to one you realize that, if you had waited a little longer, you could have obtained a better model.
4. In order to get their attention, you have to turn them on.
5. Big power surges knock them out for the rest of the night.
WHAT QUESTIONS DO YOU WANT ANSWERED TODAY?
WHAT YOU NEED TO GET ON THE INTERNET
COMPUTER
MODEM & PHONE LINE
SOFTWARE
ISP ACCOUNT
DOMAIN NAME
E-MAIL NAME
GOOD REFERENCE BOOKS
TIME!
WHAT'S IT GOOD FOR…?
E-mail
Basic research
Limited marketing
Event registration
File downloads
News and information
Publishing
Resources & reference library
Listservers
Newsgroups
WEB BROWSERS
The Software you need for the “Web”
NETSCAPE NAVIGATOR or MS INTERNET EXPLORER
IP ADDRESS
205.158.47.110
Simple…huh?
--Uncle InterNIC (Internet Network Information Center)
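A dotted address like the one on this slide is four numbers from 0 to 255, one for each byte of a single 32-bit value. The short Python sketch below uses the standard ipaddress module to show that; the variable names are illustrative only.

```python
import ipaddress

# The address shown on the slide.
addr = ipaddress.IPv4Address("205.158.47.110")

# Each dotted part is one byte (0-255) of a 32-bit number.
octets = [int(part) for part in str(addr).split(".")]

# Recombine the four bytes by hand and compare with the library's own value.
as_int = 0
for octet in octets:
    as_int = as_int * 256 + octet

print(octets)
print(as_int == int(addr))
```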
ANATOMY OF A URL (Uniform Resource Locator)
http://www.jaslin.com/security/index.html
http - site type
www - sub-domain
jaslin - unique domain name
com - high-level domain
security - disk dir on ISP
index.html - ISP file browser sees
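The same breakdown can be reproduced in a few lines of Python with the standard urllib.parse module. This is a minimal sketch: the labels are the ones used on the slide, and splitting the host name into exactly three pieces is an assumption that holds for this URL but not for every address.

```python
from urllib.parse import urlsplit
import posixpath

url = "http://www.jaslin.com/security/index.html"
parts = urlsplit(url)

# Host name pieces: sub-domain, unique domain name, high-level domain.
# Assumes a three-part host name such as www.jaslin.com.
sub_domain, domain_name, high_level = parts.hostname.split(".")

# Path pieces: the directory on the ISP's disk and the file the browser sees.
directory, filename = posixpath.split(parts.path)

anatomy = {
    "site type": parts.scheme,
    "sub-domain": sub_domain,
    "unique domain name": domain_name,
    "high-level domain": high_level,
    "disk dir on ISP": directory,
    "ISP file browser sees": filename,
}

for label, value in anatomy.items():
    print(f"{label:22} {value}")
```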
HIGH-LEVEL DOMAINS
.com COMMERCIAL
.edu EDUCATION
.gov GOVERNMENT
.net WEB HOST
.mil MILITARY
.org OTHER / NON-PROFIT
.uk United Kingdom
.fr France
FIVE MAJOR GROUPS OF THE INTERNET
Newsgroups and discussion lists
Files available by FTP
Bulletin boards and other services accessible using the telnet command
Services organized using gopher software
Material published using World Wide Web software
MEDIA TYPES ON THE INTERNET
TEXT
IMAGES
AUDIO
VIDEO
PERSONAL COMMENTS
PROGRAMS
Surfing is browsing without tools.
SEARCHING THE INTERNET
Releases the true value of the Internet!
Basic research for information
Use SEARCH ENGINES to find info
Spiders index the web continuously
Boolean logic - OR, AND & NOT
Use multiple search engines
Advanced vs. standard search
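To make "spiders index the web" concrete, here is a toy Python sketch of the kind of word index a search engine builds and how OR, AND and NOT become simple set operations on it. The three pages are made-up stand-ins for fetched documents, and the function name build_index is hypothetical; real engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical pages standing in for what a spider would fetch.
pages = {
    "page1": "nurse practitioners in California",
    "page2": "California search engines and web spiders",
    "page3": "nurse education on the web",
}

def build_index(pages):
    """Map each lower-cased word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

index = build_index(pages)

both = sorted(index["nurse"] & index["california"])   # nurse AND california
either = sorted(index["nurse"] | index["spiders"])    # nurse OR spiders
without = sorted(index["web"] - index["nurse"])       # web AND NOT nurse

print(both, either, without)
```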
WHAT DO YOU WANT TO FIND?
A keyword
A person
A URL
A phrase
A geographic location
A concept
A title
TYPES OF SEARCH ENGINES
Keyword: Alta Vista, HotBot, Webcrawler
Concept / hierarchical: Yahoo, Infoseek
Meta Search: Search.com
COMPONENTS OF SEARCHES
Standard Boolean
Advanced Boolean
Proximity Searching
Required Terms
Prohibited Terms
Wildcards
Case Sensitivity
SEARCH ENGINES VARY ACCORDING TO:
Size of the index
Frequency of updating the index
Search options
Speed of returning a result set
Result set presentation
Relevancy of the items included in a result set
Overall ease of use
Search “Rules of Thumb”
Enter precise search terms or phrases to limit the search.
Use the required / prohibited term operator.
Enter singular terms. Most search engines will find the substring to generalize a subject; use wildcards where allowed.
Do not use common, generic search terms... (book vs. "book binding")
Enter multiple spellings where appropriate... (Khaddafi Quadafy Kaddafi Qadaffi...)
Use Booleans and especially proximity & adjacency operators to increase the relevancy.
Be persistent and creative. It's a big web out there!
Searching for information on the Internet is more an ART than a SCIENCE. You should be prepared to spend time looking for something, and still come up empty.
Some Popular Web Search Engines
Alta Vista
Yahoo
Lycos
WebCrawler
InfoSeek
MetaSearch
Dogpile
Northern Light
AltaVista is the premier search engine on the web.
It has the largest, most inclusive indices
It allows searching of both the web and many Usenet Newsgroups
It provides both simple and advanced searches
Case Sensitivity
Search terms entered in lower-case letters are not case sensitive.
Capitalized terms (or accented letters) make the term case sensitive.
HotDog finds only the terms spelled exactly with that capitalization
hotdog finds all occurrences
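The case rule can be written as a tiny Python function. This is a sketch of the rule as stated here, not AltaVista's actual code; the name term_matches is hypothetical, and accented letters are not handled.

```python
def term_matches(term, word):
    """An all-lower-case term ignores case; a term containing
    any capital letter must match the word exactly."""
    if term == term.lower():
        return term == word.lower()
    return term == word

words = ["HotDog", "hotdog", "HOTDOG", "Hotdog"]

lower_hits = [w for w in words if term_matches("hotdog", w)]
exact_hits = [w for w in words if term_matches("HotDog", w)]

print(lower_hits)
print(exact_hits)
```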
Required / Excluded Words
Require a word - prepend it with a + symbol: +HotDog.
Exclude a word - prepend it with a - symbol: +"F. Scott Fitzgerald" -Gatsby.
+Lincoln -automobile
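A rough Python sketch of how + and - terms could filter documents follows. The helper names parse_query and matches and the three sample sentences are invented for illustration; matching here ignores case and looks for plain substrings, which is a simplification of what a real engine does.

```python
import shlex

def parse_query(query):
    """Split a query into required (+) and excluded (-) terms.
    shlex keeps a quoted phrase together as one term."""
    required, excluded = [], []
    for token in shlex.split(query):
        if token.startswith("+"):
            required.append(token[1:].lower())
        elif token.startswith("-"):
            excluded.append(token[1:].lower())
    return required, excluded

def matches(query, text):
    """True if every required term is present and no excluded term is."""
    required, excluded = parse_query(query)
    lowered = text.lower()
    return (all(term in lowered for term in required)
            and not any(term in lowered for term in excluded))

# Hypothetical documents.
docs = [
    "F. Scott Fitzgerald wrote Tender Is the Night.",
    "F. Scott Fitzgerald wrote The Great Gatsby.",
    "The Lincoln automobile was a luxury car.",
]

query = '+"F. Scott Fitzgerald" -Gatsby'
hits = [d for d in docs if matches(query, d)]
print(hits)
```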
Wildcard Characters
The asterisk (*) is AltaVista's wildcard character.
butt* will get:
butt
butts
butter
button
The asterisk cannot be used at the beginning or in the middle of words. It will substitute for up to 5 additional lower case letters.
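The wildcard rule maps neatly onto a regular expression. The Python sketch below is one way to express it with the standard re module, assuming the rule exactly as stated on the slide (trailing asterisk only, up to 5 extra lower case letters); the function name wildcard_to_regex is hypothetical.

```python
import re

def wildcard_to_regex(term):
    """Translate a term with an optional trailing asterisk into a regex."""
    ends_with_star = term.endswith("*")
    stem = term[:-1] if ends_with_star else term
    # The asterisk may not start a word or sit in the middle of one.
    if "*" in stem or (ends_with_star and not stem):
        raise ValueError("the asterisk is only allowed at the end of a word")
    pattern = re.escape(stem)
    if ends_with_star:
        pattern += r"[a-z]{0,5}"  # up to 5 additional lower case letters
    return re.compile(pattern)

pattern = wildcard_to_regex("butt*")
words = ["butt", "butts", "butter", "button", "butterscotch", "but"]
found = [w for w in words if pattern.fullmatch(w)]
print(found)
```

"butterscotch" is left out because it needs eight extra letters, more than the five the wildcard allows.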
Confidence Rankings
AltaVista will assign a confidence ranking to the hits it returns based on the following:
The query terms are found in the first few words of the document (especially the title of web pages).
The query terms are found in close proximity to one another in the document.
The document contains more of the search terms than other documents.
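To see how three signals like these might combine into one score, here is a toy Python sketch. It is not AltaVista's formula: the weights (3, 2, 1), the "first 10 words" cutoff, the 5-word window and the function name confidence are all assumptions made for illustration, and the sample texts contain no punctuation so that simple word splitting works.

```python
def confidence(query_terms, title, body, window=5):
    """Toy score from three signals: early position, proximity, coverage."""
    terms = [t.lower() for t in query_terms]
    title_words = title.lower().split()
    body_words = body.lower().split()

    # Signal 1: terms found in the title or the first few words.
    early = set(title_words + body_words[:10])
    early_hits = sum(1 for t in terms if t in early)

    # Signal 2: pairs of terms that occur close together in the body.
    positions = {t: [i for i, w in enumerate(body_words) if w == t]
                 for t in terms}
    close_pairs = 0
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if any(abs(p - q) <= window
                   for p in positions[a] for q in positions[b]):
                close_pairs += 1

    # Signal 3: how many of the search terms the document contains at all.
    present = sum(1 for t in terms if positions[t] or t in title_words)

    return 3 * early_hits + 2 * close_pairs + present

# Hypothetical documents.
score_a = confidence(["dogs", "care"], "Pet care for dogs",
                     "dogs need daily care and exercise")
score_b = confidence(["dogs", "care"], "City news",
                     "the council met today and later a report mentioned dogs "
                     "while another section much further on discussed health care")

print(score_a > score_b)
```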
SEARCH SYNTAX EXAMPLES
horses AND carriages
"Abraham Lincoln" AND "civil war"
("Abraham Lincoln") AND NOT ("civil war")
(Note: Do NOT use x NOT y, it must be x AND NOT y.)
"Thomas Middleton" OR "Beaumont and Fletcher"
(dogs OR cats) AND ("pet care")
"William Shakespeare" NEAR internet
(illegal AND immigrant) AND NOT (Mexico)
alien OR ufo
alien AND NOT ufo
football AND (rugby OR soccer)
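Two of these queries are spelled out below in Python to show how the operators and parentheses combine. This is only a sketch: the helper has and the three sample documents are invented, and matching is a case-insensitive substring test rather than a real engine's word matching.

```python
def has(text, phrase):
    """True if the phrase occurs in the text, ignoring case."""
    return phrase.lower() in text.lower()

# Hypothetical documents.
docs = {
    "a": "Abraham Lincoln led the Union during the Civil War.",
    "b": "Abraham Lincoln was born in Kentucky.",
    "c": "Football fans also follow rugby.",
}

# ("Abraham Lincoln") AND NOT ("civil war")
lincoln_not_war = [key for key, text in docs.items()
                   if has(text, "Abraham Lincoln") and not has(text, "civil war")]

# football AND (rugby OR soccer)
football_codes = [key for key, text in docs.items()
                  if has(text, "football")
                  and (has(text, "rugby") or has(text, "soccer"))]

print(lincoln_not_war)
print(football_codes)
```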
PROXIMITY & ADJACENCY EXAMPLES
Use NEAR/n, where n is the number of words apart the two search terms should be: Shakespeare NEAR/5 Internet.
If a range is not entered, NEAR will return hits on documents where the words are next to each other, in either order.
To control the specific order in which two words must appear next to each other, you may use the ADJ operator: reverse ADJ osmosis.
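The two operators can be sketched in Python as word-position checks. The function names near and adj are hypothetical, and reading "n words apart" as "word positions differ by at most n" is an assumption; an actual engine may count differently.

```python
def positions(words, term):
    """Indexes at which the term appears in the word list."""
    return [i for i, w in enumerate(words) if w == term.lower()]

def near(text, a, b, n=1):
    """True if a and b occur within n words of each other, in either order.
    With the default n=1 the words must be next to each other."""
    words = text.lower().split()
    return any(abs(i - j) <= n
               for i in positions(words, a) for j in positions(words, b))

def adj(text, a, b):
    """True only if a is immediately followed by b."""
    words = text.lower().split()
    return any(j - i == 1
               for i in positions(words, a) for j in positions(words, b))

print(near("Shakespeare texts on the Internet", "Shakespeare", "Internet", n=5))
print(near("Shakespeare texts on the Internet", "Shakespeare", "Internet"))
print(near("osmosis reverse", "reverse", "osmosis"))
print(adj("osmosis reverse", "reverse", "osmosis"))
print(adj("reverse osmosis filters", "reverse", "osmosis"))
```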
Yahoo is not a search engine, but strictly a hierarchically arranged subject index.
It has developed over a long time, with lots of editorial care, so the quality is very high.
Browsing Yahoo is the best way to surf for good sites when you don't know (or perhaps care) where exactly you are going.
It is also the best way to find good 'starter' sites, from which you can branch out to more specialized ones.
YAHOO RETURNS 3 TYPES OF INFO
Yahoo categories that match the search term
Actual matching end-sites
The Yahoo categories from which the various pages are indexed
IN YAHOO THE USER CAN CONTROL
Though you cannot create very sophisticated searches as with the search engines, you can control:
where to search - Usenet or Email
whether to OR or AND the search terms
search on substrings (find whole words from partial strings)
number of matches per page
METASEARCH ENGINES
Search engine of search engines
search.com
Dogpile
The Internet Sleuth
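The "search engine of search engines" idea can be sketched in Python as: send one query to several engines, then merge the answers and drop duplicates. The two engine functions below are made-up stand-ins that return fixed example.com addresses; no real service is contacted, and the name metasearch is hypothetical.

```python
def engine_a(query):
    # Stand-in for a real engine; returns a ranked list of URLs.
    return ["http://example.com/1", "http://example.com/2"]

def engine_b(query):
    # A second stand-in whose results partly overlap the first.
    return ["http://example.com/2", "http://example.com/3"]

def metasearch(query, engines):
    """Ask every engine, then merge the result lists without duplicates,
    keeping the order in which each URL was first seen."""
    merged, seen = [], set()
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

results = metasearch("nurse practitioners", [engine_a, engine_b])
print(results)
```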
BOTS AND INTELLIGENT AGENTS
Intelligent agents are software entities that assist people and act on their behalf
Strictly speaking, all bots are "autonomous": able to react to their environments and make decisions without prompting
Bona fide bots are programs with personality
When looking for people, you will usually be looking for one of the major information pieces:
Address
Phone number
E-mail address
Personal information
TODAY WE COVERED...
What is available on the Internet
What are the best tools to obtain valuable products and services
How to configure a search on a browser
What are the best search sites
Why should I care?
YOU ARE NOW…
MASTER OF CYBERSPACE!
Data Mining on the Internet
“finding it in Cyberspace”
"I think there's a world market for about 5 computers.”
Thomas J. Watson, Chairman of the Board, IBM (around 1948)
“There is no reason anyone would want a computer in their home.”
Ken Olsen, president, chairman and founder of Digital Equipment Corp., 1977