Search Engines
Objective
General Search Strategy
Medical Search:
Finding Title
Search for Articles
• Full Text
Clinical Queries
"Research is the process of going up alleys to see if they
are blind."
Devices
PC with VPN
Tablet with VPN, Proxy, Anonymizer…
Change settings to Classic, Desktop…
Chrome, Firefox
Cell phone (quick midnight search)
Dissertation
Study
Everyday
problems
Clinical
Challenges
HOW DO SEARCH ENGINES WORK?
"Spiders" or "robots" ("bots")
Sites with no links to other pages may be missed
Sends a copy to server => click links
Index most of the words (Tree)
Rank
Update rank
Search: Scan its index of sites and match
your keywords
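The index-and-match steps above can be pictured with a toy inverted index. This is only a sketch: the three pages and the names `PAGES`, `build_index` and `search` are made up for illustration, and real engines use far more elaborate structures.

```python
from collections import defaultdict

# Hypothetical mini "web": page name -> page text (illustrative data only).
PAGES = {
    "page1": "heart surgery recovery",
    "page2": "cardiac surgery risks",
    "page3": "heart disease diet",
}

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(PAGES)
print(sorted(search(index, "heart surgery")))  # ['page1']
```

Note that "cardiac surgery" and "heart surgery" hit different pages here: a plain word index knows nothing about synonyms.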
Ranking? (AI)
Location: Title, Keywords, Words that are
mentioned towards the beginning of a
document
Frequency: words that are repeated
several times
Word proximity
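A toy scoring function can illustrate the location and frequency factors. It is a sketch under loud assumptions: the weight of 3 for title hits is arbitrary, the documents are invented, and no real engine publishes a formula this simple.

```python
def score(title, body, keyword, title_weight=3.0):
    """Toy relevance score: a hit in the title counts more than a hit in the body.

    title_weight is an arbitrary illustrative value, not a real engine setting.
    """
    keyword = keyword.lower()
    title_hits = title.lower().split().count(keyword)
    body_hits = body.lower().split().count(keyword)
    return title_weight * title_hits + body_hits

# Invented (title, body) pairs.
docs = [
    ("Pain relief", "aspirin aspirin ibuprofen"),
    ("Headache", "rest and water"),
    ("Aspirin dosage", "aspirin is taken daily"),
]

ranked = sorted(docs, key=lambda d: score(d[0], d[1], "aspirin"), reverse=True)
for title, _ in ranked:
    print(title)
```

With these weights the page that names the keyword in its title ranks first, even though another page repeats the keyword more often in its body.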
Ranking?
The number of links that are pointing to
sites
Importance of the pages that link
Quality of links
Traffic
Your search history
Your location
ARE SEARCH ENGINES ALL THE
SAME?
Size
Speed
Content
Search options
Censorship
Index:
Every word
Only part of the document
ARE SEARCH ENGINES ALL THE
SAME?
Stemming?
"cardiac" =>? "heart" / "heart" =>? "cardiac"
Searching a portion of the web, captured in a
fixed index created at an earlier date (news)
Ranking Algorithm (AI, Fuzzy logic, Machine
learning )
2.5 M Servers?
WHAT WERE METASEARCH
ENGINES?
Do not crawl the web
Search the databases of multiple sets of
individual search engines simultaneously
A quick way of finding out which engines
are retrieving the best results
A fair picture of what's available across the
Web
search quickly and superficially
Meta Cons:
Don't offer the "salad bar" of search
options
Not enough query Google
Catch about 1% of search results
DO NOT USE METASEARCH
ENGINES
You are in a hurry
A quick overview
Are not having any luck pulling up documents in your search
Work best with simple searches
Not recommended:
quick and dirty
not thorough
unpredictable
METASEARCH ENGINES
Dogpile
Mamma
Clusty
Metacrawler
Copernic
PICO
SUBJECT DIRECTORIES
Human editors
Smaller
When you don't have a precise idea
Most effective for finding general information
To see what kind of information is available
Include a keyword search
The line between subject directories and search
engines is blurring
SUBJECT DIRECTORIES
Beaucoup
Looksmart
Open Directory Project
Excite
MSN directory
Netscape
Yahoo! Directory
Google (Rank)
MESH
Gateways
Collections of databases and informational
sites, arranged by subject
Assembled by specialists, usually
librarians
Academically-oriented pages on the Web
Looking for high quality information
Accuracy and content.
Subject-Specific Databases
("Vortals")
Devoted to a single subject, created by
professors, researchers, experts
One particular field
When looking for information on a specific
topic
Today, search engines, subject directories
and portals are pointing to these
INVISIBLE WEB (Deep Web)
Search engine spiders cannot index
Password-protected sites
Behind firewalls
Archived material
Certain databases
Peer-to-peer
Dark web (Tor, onion [-URL], ENCRYPTION)
Password
Point your browser directly at them
Determining Page Authenticity
Generally rely on the GOV and EDU hostnames
http://www.sc.edu/beaufort/library.html
NET, ORG, MIL, and COM => additional
verification
Reputable Web page
Last date page updated
Mail-to link for questions, comments
Name, address, telephone number, and email
address of page owner
Sources ?
Authority of the author(s) ?
Who is linking to the page? (link:)
link:arakmu.ac.ir -site:.ir
Links to other pages?
Last updated ?
Verified at other, similar sites?
Promotion, advertising, and serious
content ?
Stability of the pages
The page you cite today may be
altered or revised tomorrow, or it
might disappear completely.
keep a backup
Search Strategy
AI
Fuzzy logic
Neural networks (unsupervised learning)
Deep learning
Target population:
Average English-speaking Americans
(Most common passwords)
AI
Computers are not dumb anymore
Don’t expect exact results
Repeat search even if you are sure
Get a second opinion
Second Opinion
Filter bubble:
DuckDuckGo
US Digital Millennium Copyright Act:
yandex.ru
Search Essentials
Search Operators
STOP WORDS
a, about, an, and, are, as, at, be, by, from,
how, in, is, it, of, on, or, that, the, not, this,
to, we, what, when, where, which, with,
etc.
"to be or not to be", WHO
Search engines differ, change frequently
Caps
Punctuation marks: @, #, .., : , (Space)
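What an engine may do to a query before matching (ignore caps, punctuation and stop words) can be sketched as follows. The function name `normalize` is hypothetical, and the stop-word set simply copies the list above; actual engines differ and change frequently.

```python
import string

# Stop words copied from the list above.
STOP_WORDS = {
    "a", "about", "an", "and", "are", "as", "at", "be", "by", "from",
    "how", "in", "is", "it", "of", "on", "or", "that", "the", "not", "this",
    "to", "we", "what", "when", "where", "which", "with",
}

# Translation table that turns every punctuation mark into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def normalize(query):
    """Lowercase the query, replace punctuation with spaces, drop stop words."""
    words = query.lower().translate(PUNCT_TO_SPACE).split()
    return [w for w in words if w not in STOP_WORDS]

print(normalize("What are the side-effects of Brufen?"))  # ['side', 'effects', 'brufen']
print(normalize("to be or not to be"))                    # []
```

The second query loses every word, which is why a phrase made only of stop words needs double quotation marks.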
Start Search
Broad => Narrow
PICO
Add words one-by-one
Check terms in results
Modify keywords
Advanced Search
BOOLEAN LOGIC
Text parsing: splitting a sequence of
characters or values (text) into smaller
parts based on some rules
Left-to-right: 2/4*2
Unless:
1.Exceptions: 2^2^.5
2.Precedence: 2+2*4
3.Innermost ()
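The same parsing rules can be checked in any programming language. A minimal Python sketch follows; note that Python writes exponentiation as `**` rather than `^`.

```python
# Equal precedence: evaluated left to right, so this is (2 / 4) * 2.
left_to_right = 2 / 4 * 2

# Exception: exponentiation groups right to left, so this is 2 ** (2 ** 0.5).
right_to_left = 2 ** 2 ** 0.5

# Precedence: multiplication is done before addition.
precedence = 2 + 2 * 4

# Parentheses: the innermost group is evaluated first.
grouped = (2 + 2) * 4

print(left_to_right)                      # 1.0
print(right_to_left == 2 ** (2 ** 0.5))   # True
print(precedence)                         # 10
print(grouped)                            # 16
```

Search engines parse queries with rules of the same kind, which is why parentheses matter once AND and OR are mixed.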
BOOLEAN OPERATORS
AND, “+”?
Documents that contain every one of the
keywords
Restricts the search
Default in most engines
OR, “|” (Precedence?)
Either or both keywords
Expands the search
Keywords that are similar or synonymous
AND
OR
NOT, “-”
Your first keyword but not the second
dementia -alzheimers
Youtube: -youtube / -inurl:youtube
NESTING
> 2 keywords
More than one type of operator
(stricture OR stenosis) AND Pyloric
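Boolean operators behave like set operations on the documents that contain each keyword: OR is a union, AND an intersection, NOT a difference. The sketch below uses an invented index (keyword to document IDs) to show why the parentheses change the result.

```python
# Hypothetical index: keyword -> IDs of documents containing it (made-up data).
INDEX = {
    "stricture": {1, 2},
    "stenosis": {2, 3, 4},
    "pyloric": {2, 4, 5},
    "dementia": {6, 7},
    "alzheimers": {7},
}

# (stricture OR stenosis) AND pyloric
nested = (INDEX["stricture"] | INDEX["stenosis"]) & INDEX["pyloric"]

# stricture OR (stenosis AND pyloric): same words, different grouping
regrouped = INDEX["stricture"] | (INDEX["stenosis"] & INDEX["pyloric"])

# dementia NOT alzheimers
excluded = INDEX["dementia"] - INDEX["alzheimers"]

print(sorted(nested))     # [2, 4]
print(sorted(regrouped))  # [1, 2, 4]
print(sorted(excluded))   # [6]
```

Document 1 mentions "stricture" but not "pyloric"; it slips into the results only when the grouping is wrong.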
STEMMING
NOT
“……….” double quotation marks (" ")
Force all words in exact order.
Instead of “+……”
No synonym
No AI
No omitting (very different word ranks)
• (“Must include:” link)
All in Advanced Search
Truncat*
Stemming
When appropriate, search for words that
are similar to some or all of the terms
rat dietary needs
rat diet needs, food, feed, pellet…?
No need for OR ?
~
Word limit for Google Searches
Server Overload
2,048 characters
32 word limit:
Google search
Google images
10 word limit:
Google groups
Google news
Google Search Operators
Wildcard: * (one or more words)
hip * surgery =>
hip reconstruction surgery
hip dislocation surgery
hip fracture surgery
(Questions) coronary bypass was invented by *
vitamin * is good for *
"*" does not indicate a fraction or extension of a word:
flower * will not match flowerful
Stemming technology
Google Search Operators
define:
cache: cache:arak.mu.ac.ir
related: related:https://www.tripdatabase.com/ tripdatabase
filetype:
site: (Site search, Domain search)
inurl:
intitle:
allintitle:
Google Search Operators
intext: ≈ default (“……”)
allintext: “…..” “…..” “……”
in
link:
..
1..5 kg abdominal tumor
John Smith 1960..1985
Scalpel $10..$20
Other Google Services
Image Search: (View Image Extension)
Language Tools
Scholar
Books
https://academic.microsoft.com/home
PROXIMITY OPERATORS
NEAR: search for terms situated within a
specified distance of each other in any
order
colon NEAR tumor
ADJ (adjacent to): ADJ works as a phrase
but in any order.
endangered ADJ species
“endangered species”
“species endangered”
PROXIMITY OPERATORS
AROUND (X)
"breast cancer" AROUND(3) aspirin
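A proximity test can be sketched in a few lines. The function name `within` is hypothetical, and counting the distance as "words between the two terms" is an assumption; each engine defines the distance in its own way.

```python
def within(text, term_a, term_b, max_gap):
    """True if the two terms occur, in any order, with at most max_gap words between them."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # abs(i - j) - 1 is the number of words sitting between the two positions.
    return any(abs(i - j) - 1 <= max_gap for i in pos_a for j in pos_b)

sentence = "a tumor of the colon was removed"
print(within(sentence, "colon", "tumor", 3))  # True: two words in between
print(within(sentence, "colon", "tumor", 1))  # False
```

With `max_gap=0` the function behaves like ADJ: the terms must be adjacent, in either order.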
CREATING A SEARCH STRATEGY
STEP 1: STATE WHAT YOU WANT TO FIND
In one or two sentences, state what you
want to find
What are the gastrointestinal
side-effects of Brufen?
STEP 2: IDENTIFY KEYWORDS
Underline the main concepts in the
statement
What are the gastrointestinal
side-effects
of Brufen?
STEP 3: SELECT SYNONYMS
AND VARIANT WORD FORMS
Gastrointestinal: gastric, stomach, bowel,
intestine
Side-effects: “side effect”, “side-effect”
Brufen: Ibuprofen, Fenbid
Stemming?
STEP 4: COMBINE SYNONYMS, KEYWORDS,
AND VARIANT WORD FORMS
synonyms with Boolean OR (parentheses)
(Brufen OR Ibuprofen OR Fenbid)
asterisk symbol (*) to combine variant
word forms ?????
(Intestin* OR gastrointestin*)
Combine keywords with Boolean AND
(Brufen OR Ibuprofen OR Fenbid)
(Intestine OR gastrointestinal OR stomach)
side effect
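Steps 3 and 4 can be automated with a small helper. This is a sketch, not a standard tool: `build_query` is a made-up name, and it assumes the target engine accepts OR, AND, parentheses and quoted phrases.

```python
def build_query(concepts):
    """Join the synonyms of each concept with OR, then join the concepts with AND."""
    groups = []
    for synonyms in concepts:
        # Quote multi-word terms so they are searched as exact phrases.
        terms = [f'"{s}"' if " " in s else s for s in synonyms]
        if len(terms) == 1:
            groups.append(terms[0])
        else:
            groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)

concepts = [
    ["Brufen", "Ibuprofen", "Fenbid"],
    ["intestine", "gastrointestinal", "stomach"],
    ["side effect"],
]
print(build_query(concepts))
# (Brufen OR Ibuprofen OR Fenbid) AND (intestine OR gastrointestinal OR stomach) AND "side effect"
```

Keeping each concept in its own list makes it easy to add or drop a synonym and rebuild the query.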
Quick Tips
Truncation - use OR searches for variants
librar* = library OR libraries OR librarian
Be specific
Use nouns and objects as keywords
Put most important terms first
“…..” most important terms?
At least three keywords
Combine keywords into phrases
“acute abdominal pain”
Avoid common words
Anticipate the answers:
Imagine what the ideal page you would
like to access would look like. Think about
the words in its title and in the first couple of
sentences.
Type keywords and phrases in lower case
Always enclose OR statements in
parentheses
Use CAPS when typing Boolean
operators
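Where an engine does not support truncation, the librar* tip above amounts to expanding the pattern into an OR search by hand. A minimal sketch, assuming a small made-up word list (`VOCAB`) and a hypothetical helper `expand`:

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary; in practice the variants come from reading result pages.
VOCAB = ["library", "libraries", "librarian", "liberty", "libra"]

def expand(pattern, vocab):
    """Turn a truncated pattern such as 'librar*' into a parenthesized OR query."""
    matches = [word for word in vocab if fnmatchcase(word, pattern)]
    return "(" + " OR ".join(matches) + ")"

print(expand("librar*", VOCAB))  # (library OR libraries OR librarian)
```

The result is already enclosed in parentheses, so it can be combined with other keywords using AND.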
WHAT TO DO IF ...
YOUR SEARCH RETURNS A
"ZILLION" DOCUMENTS
Too few terms
Common words
Think of some synonyms (read pages)
Try adding more specific terms
TOO FEW DOCUMENTS
Searching in the wrong place
Your search is too narrow (PICO)
You didn't configure your search correctly
The information isn't on the visible Web
Try omitting some of your search terms
Another engine or specialty resource
Ask for help
Remember, you are smarter
than a computer. Use your
intelligence. Search engines
are fast, but dumb.
Pirate Ebooks
Old-school
WebSites with Security Vulnerabilities
-inurl:(htm|html|php) intitle:”index of” ”last
modified” ”parent directory” description size
(pdf|doc) “banned books″
MOBOTIX Webcams: control/userimage.html
P2P: (malware)
Random Websites
Now
Social Networks
Iran IP:
Free
Paid
Russian:
ebook3000, ebookee, avaxhome (!!!)
gen.lib.rus.ec