Introduction to Google API… By Pratheepan Raveendranathan


What is Google API?

The Google Web APIs service is a beta web program that enables developers to easily find and manipulate information on the web.

Google Web APIs are for developers and researchers interested in using Google as a resource in their applications.

The Google Web APIs service allows software developers to query more than 3 billion web documents directly from their own computer programs.

Google uses the SOAP and WSDL standards as the interface between the user's program and the Google API.

Programming environments such as Java, Perl, and Visual Studio .NET are compatible with the Google API.

(Definitions from http://www.google.com/apis/)

What can you do with the API?

Developers can:

  • issue search requests to Google's index of more than 3 billion web pages and receive results as structured data (estimated number of results, URLs, snippets, query time, etc.);
  • access information in the Google cache; and
  • check the spelling of words.

To start using the API, you need to:

  • download the API package from http://www.google.com/apis/;
  • create an account and get your license key;
  • install the kit in your UMD account; and
  • get SOAP::Lite.

However, SOAP::Lite is already installed on all the csdev machines, so you don't need to get it there. It is not installed on UB or Bulldog.

Contents of this package:

  • googleapi.jar - Java library for accessing the Google Web APIs service.
  • GoogleAPIDemo.java - Example program that uses googleapi.jar.
  • dotnet/ - Example .NET programs that use Google Web APIs.
  • APIs_Reference.html - Reference doc for the API. Describes semantics of all calls and fields.
  • Javadoc - Documentation for the example Java libraries.
  • Licenses - Licenses for Java code that is redistributed in this package.
  • GoogleSearch.wsdl - WSDL description for Google SOAP API.
  • soap-samples/

WSDL - Web Services Description Language

  • The standard format for describing a web service.
  • Expressed in XML, a WSDL definition describes how to access a web service and what operations it will perform.
  • GoogleSearch.wsdl is the most important file for using the API with Perl; in fact, it is the only file from the package that you need.

SOAP - Simple Object Access Protocol

  • SOAP stands for Simple Object Access Protocol.
  • SOAP is a communication protocol.
  • SOAP is for communication between applications.
  • SOAP is a format for sending messages.
  • SOAP is designed to communicate via the Internet.
  • SOAP is platform independent.
  • SOAP is language independent.
  • SOAP is based on XML.
  • SOAP will be developed as a W3C standard.

Google API for Perl

SOAP::Lite

SOAP::Lite for Perl is a collection of Perl modules that provides a simple and lightweight interface to SOAP, on both the client and the server side.

So How do I Query Google?

    #!/usr/local/bin/perl -w
    use strict;
    use SOAP::Lite;

    # Configuration
    my $key = "Your Key Goes Here";

    # Initialize with the local WSDL file
    my $service = SOAP::Lite->service('file:GoogleSearch.wsdl');

    my $query = "duluth";

Search contd…

    my $result = $service->doGoogleSearch(
        $key,      # key
        $query,    # search query
        0,         # start results
        10,        # max results
        "false",   # filter: boolean
        "",        # restrict (string)
        "false",   # safeSearch: boolean
        "",        # lr
        "",        # ie
        ""         # oe
    );

Name - Description

  • key - Provided by Google, this is required for you to access the Google service. Google uses the key for authentication and logging.
  • q - Query phrase.
  • start - Zero-based index of the first desired result.
  • maxResults - Number of results desired per query. The maximum value per query is 10. Note: If you do a query that doesn't have many matches, the actual number of results you get may be smaller than what you request.
  • filter - Activates or deactivates automatic results filtering, which hides very similar results and results that all come from the same Web host.
  • restrict - Restricts the search to a subset of the Google Web index, such as a country like "Ukraine" or a topic like "Linux".
  • safeSearch - A Boolean value which enables filtering of adult content in the search results.
  • lr - Language Restrict: restricts the search to documents within one or more languages.
  • ie - Input Encoding: this parameter has been deprecated and is ignored. All requests to the APIs should be made with UTF-8 encoding.
  • oe - Output Encoding: this parameter has been deprecated and is ignored. All requests to the APIs should be made with UTF-8 encoding.
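Because doGoogleSearch takes ten positional arguments, it is easy to put a value in the wrong slot. One way to guard against that is sketched below, assuming you are willing to wrap the call yourself: the helper search_args and the tables @PARAM_ORDER and %DEFAULTS are hypothetical names, not part of the Google API or SOAP::Lite. The helper fills in the defaults used in the Search.pl example, returns the values in the parameter order listed above, and rejects a maxResults above the documented maximum of 10. The SOAP call is left as a comment so the sketch runs without a license key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Positional order of doGoogleSearch's parameters (hypothetical table,
# following the order used in the Search.pl example).
my @PARAM_ORDER = qw(key q start maxResults filter restrict safeSearch lr ie oe);

# Defaults matching the values passed in the Search.pl example.
my %DEFAULTS = (
    start      => 0,
    maxResults => 10,
    filter     => "false",
    restrict   => "",
    safeSearch => "false",
    lr         => "",
    ie         => "",
    oe         => "",
);

# search_args(%named): hypothetical helper that merges named arguments over
# the defaults, validates them, and returns them in positional order.
sub search_args {
    my %args = (%DEFAULTS, @_);
    for my $required (qw(key q)) {
        die "search_args: '$required' is required\n"
            unless defined $args{$required};
    }
    die "search_args: maxResults must be between 1 and 10\n"
        if $args{maxResults} < 1 || $args{maxResults} > 10;
    for my $name (keys %args) {
        die "search_args: unknown parameter '$name'\n"
            unless grep { $_ eq $name } @PARAM_ORDER;
    }
    return map { $args{$_} } @PARAM_ORDER;
}

# Usage: build the argument list for a Linux-restricted search.
my @args = search_args(key => "Your Key Goes Here", q => "duluth", restrict => "linux");
print join(", ", map { "'$_'" } @args), "\n";
# my $result = $service->doGoogleSearch(@args);   # the real SOAP call
```

To page through more than 10 results, make several calls with start => 0, start => 10, start => 20, and so on, since each query returns at most 10 results.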

Now to Retrieve the Search Results

    # Print the first result, if there is one
    if ($result->{resultElements} && @{ $result->{resultElements} }) {
        print join "\n", "Found:",
            $result->{resultElements}->[0]->{title},
            $result->{resultElements}->[0]->{URL},
            $result->{resultElements}->[0]->{snippet} . "\n";
    }

    print "\n The search took $result->{searchTime}\n\n";
    print "The estimated Number of results for your query is: ",
        $result->{estimatedTotalResultsCount}, "\n\n";

What you need for your program

Search.pl Output

    Found:
    University of Minnesota <b>Duluth</b> Welcomes You
    http://www.d.umn.edu/
    The University of Minnesota <b>Duluth</b> Homepage: an overview of academic programs, campus<br> life, resources, news and events, with extensive links to other web sites <b>...</b>

    The search took 0.159791

    The estimated Number of results for your query is: 881000

Or, to get all elements:

    # Print every snippet
    foreach my $temp (@{ $result->{resultElements} }) {
        print $temp->{snippet}, "\n";
    }

    # Print every URL
    foreach my $temp (@{ $result->{resultElements} }) {
        print $temp->{URL}, "\n";
    }

    # Collect every title into an array
    my @title_array;
    foreach my $temp (@{ $result->{resultElements} }) {
        push @title_array, $temp->{title};
    }
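As the Search.pl output shows, titles and snippets come back with Google's HTML highlighting (<b>, <br>) still in them. A minimal sketch, assuming you want plain text: the hypothetical helpers strip_tags and collect_results walk resultElements once and return one hash per result with the tags removed. The regex-based tag stripping is a simplification that is adequate for these short snippets, not a general HTML parser. The $result structure is a hand-built stand-in, with values shortened from the Search.pl output, so the example runs without a license key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove simple HTML tags such as <b>, </b> and <br> from a string.
sub strip_tags {
    my ($text) = @_;
    return "" unless defined $text;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# collect_results($result): hypothetical helper returning one hashref per
# result element, with title and snippet converted to plain text.
# Returns an empty list if there are no result elements.
sub collect_results {
    my ($result) = @_;
    my @out;
    foreach my $element (@{ $result->{resultElements} || [] }) {
        push @out, {
            title   => strip_tags($element->{title}),
            URL     => $element->{URL},
            snippet => strip_tags($element->{snippet}),
        };
    }
    return @out;
}

# Stand-in for the structure returned by doGoogleSearch (not a live response).
my $result = {
    resultElements => [
        {
            title   => "University of Minnesota <b>Duluth</b> Welcomes You",
            URL     => "http://www.d.umn.edu/",
            snippet => "The University of Minnesota <b>Duluth</b> Homepage",
        },
    ],
};

foreach my $r (collect_results($result)) {
    print "$r->{title}\n$r->{URL}\n$r->{snippet}\n\n";
}
```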

How to get a spelling suggestion?

    #!/usr/local/bin/perl -w
    use strict;
    use SOAP::Lite;

    # Configuration
    my $key = "Your Key Goes Here";

    # Initialize with the local WSDL file
    my $service = SOAP::Lite->service('file:GoogleSearch.wsdl');

    # Read the word to check from the keyboard
    print "Enter a word: ";
    chomp(my $searchString = <STDIN>);

    my $correction = $service->doSpellingSuggestion($key, $searchString);

How do I get the results? Easy.

The variable $correction will contain the spelling suggestion if Google has one, or it will be empty if there is no suggestion.

So retrieving the result is as easy as:

    print "\n The suggested spelling for $searchString is $correction \n\n";
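Depending on how SOAP::Lite deserializes the reply, an "empty" suggestion may arrive as an empty string or as undef, and printing undef under -w produces an "uninitialized value" warning. A small sketch, assuming you want a readable message in both cases: describe_suggestion is a hypothetical helper, and the values passed to it below are examples rather than live API responses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# describe_suggestion($word, $correction): hypothetical helper that turns the
# value returned by doSpellingSuggestion into a message, treating both undef
# and the empty string as "no suggestion".
sub describe_suggestion {
    my ($word, $correction) = @_;
    if (defined $correction && length $correction) {
        return "The suggested spelling for $word is $correction";
    }
    return "Google has no spelling suggestion for $word";
}

print describe_suggestion("dulut", "duluth"), "\n";
print describe_suggestion("duluth", undef), "\n";
```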

Spelling output

    Enter a word: dulut

    The suggested spelling for dulut is duluth

How do I get a cached web page? Given a URL, Google will try to retrieve the web page from its "cache". The contents of that copy may therefore be somewhat old: they reflect the page as it was when Google's web crawlers last visited the site.

Example:

Example Contd…

    #!/usr/local/bin/perl -w
    use strict;
    use SOAP::Lite;

    # Configuration
    my $key = "Your Key Goes Here";

    # Initialize with the local WSDL file
    my $service = SOAP::Lite->service('file:GoogleSearch.wsdl');

    my $url = "http://www.d.umn.edu";

    my $cachedPage = $service->doGetCachedPage($key, $url);

How do I retrieve the results? This works the same way as the spelling suggestion: if the web page exists in the cache, you will have the whole web page HTML in the $cachedPage variable.

Otherwise, you will get a message from Google which says

"This web page has not been updated…blah…blah…blah"
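To look at the cached copy in a browser, one simple approach is to write $cachedPage to a file. A minimal sketch follows; save_page is a hypothetical helper, and the stand-in string replaces a real doGetCachedPage reply so the example runs offline. Writing with the :raw layer stores the string byte for byte, which is a cautious choice if you are unsure how SOAP::Lite has decoded the reply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# save_page($content, $path): hypothetical helper that writes the string
# returned by doGetCachedPage to a file, byte for byte, and returns the
# number of characters written.
sub save_page {
    my ($content, $path) = @_;
    open(my $fh, '>:raw', $path) or die "Cannot write $path: $!\n";
    print {$fh} $content;
    close($fh) or die "Cannot close $path: $!\n";
    return length $content;
}

# Stand-in for $cachedPage; a real program would use
#   my $cachedPage = $service->doGetCachedPage($key, $url);
my $cachedPage = "<html><body>Cached copy of http://www.d.umn.edu</body></html>";
my $bytes = save_page($cachedPage, "cached.html");
print "Wrote $bytes bytes to cached.html\n";
```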

Search with other options: Google has four topic restricts:

    Topic              <restrict> value
    US Government      unclesam
    Linux              linux
    Macintosh          mac
    FreeBSD            bsd

Search with Restrictions:

    $result = $service->doGoogleSearch(
        $key,      # key
        $query,    # search query
        0,         # start results
        10,        # max results
        "false",   # filter: boolean
        "linux",   # restrict (string)
        "false",   # safeSearch: boolean
        "",        # lr
        "",        # ie
        ""         # oe
    );
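Since the four topic restrict values are fixed strings, a lookup table keeps them in one place and catches unsupported topics before a query is sent. A minimal sketch; the hash %TOPIC_RESTRICT and the helper restrict_for are hypothetical names, and the values are the four listed in the topic table above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Topic restrict values from the topic table.
my %TOPIC_RESTRICT = (
    "US Government" => "unclesam",
    "Linux"         => "linux",
    "Macintosh"     => "mac",
    "FreeBSD"       => "bsd",
);

# restrict_for($topic): hypothetical helper returning the <restrict> value
# for a topic name, or dying if the topic is not one of the four above.
sub restrict_for {
    my ($topic) = @_;
    my $value = $TOPIC_RESTRICT{$topic};
    die "Unknown topic '$topic'; expected one of: "
        . join(", ", sort keys %TOPIC_RESTRICT) . "\n"
        unless defined $value;
    return $value;
}

my $restrict = restrict_for("Linux");
print "restrict = $restrict\n";
# $restrict then goes in the sixth (restrict) position of doGoogleSearch.
```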

Search with Language Restrictions

    $result = $service->doGoogleSearch(
        $key,       # key
        $query,     # search query
        0,          # start results
        10,         # max results
        "false",    # filter: boolean
        "",         # restrict (string)
        "false",    # safeSearch: boolean
        "lang_de",  # lr
        "",         # ie
        ""          # oe
    );

    print "\n The search took $result->{searchTime}\n\n";
    print "The estimated Number of results for your query is: ",
        $result->{estimatedTotalResultsCount}, "\n\n";

    # Print the first result, if there is one
    if ($result->{resultElements} && @{ $result->{resultElements} }) {
        print join "\n", "Found:",
            $result->{resultElements}->[0]->{title},
            $result->{resultElements}->[0]->{URL},
            $result->{resultElements}->[0]->{snippet} . "\n";
    }

lang_de = German

Search with Language Restrictions Contd…

    Please Enter Search Item: der sturm

    The search took 0.309039

    The estimated Number of results for your query is: 206000

    Found:
    SK <b>STURM</b> GRAZ - Willkommen beim Sk <b>Sturm</b>
    http://www.sksturm.at/
    Eintreten. Puntigamer das bierige Bier, Steiermark.com, Puma, Tipp3,<br> Autohaus Jakob Prügger, Graz - Hausmannstätten. © 2003 SkSturm <b>...</b>

Tips on Querying Google

Default Search: By default, Google only returns pages that include all of the terms in the query string.

Stop Words: Google ignores common words and characters such as "where" and "how", as well as certain single digits and single letters. Common words that are ignored are known as stop words. However, you can prevent Google from ignoring stop words by enclosing them in quotes, such as in the phrase "to be or not to be".

Special Characters: By default, all non-alphanumeric characters that are included in a search query are treated as word separators. The only exceptions are the following: double quote mark ("), plus sign (+), minus sign or hyphen (-), and ampersand (&). The ampersand character (&) is treated as another character in the query term in which it is included, while the remaining exception characters correspond to the search features listed below.

Special Query Terms: Google supports the use of several special query terms that allow the user or search administrator to access additional capabilities of the Google search engine.

(The same explanations can be found in the API Reference section of your Google API download.)

(The following special query table can be found in the API Reference section of your Google API download.)

Special Query Capability / Example Query / Description

Include Query Term
Example: Star Wars Episode +I
If a common word is essential to getting the results you want, you can include it by putting a "+" sign in front of it.

Exclude Query Term
Example: bass -music
You can exclude a word from your search by putting a minus sign ("-") immediately in front of the term you want to exclude from the search results.

Phrase Search
Example: "yellow pages"
Search for complete phrases by enclosing them in quotation marks or connecting them with hyphens. Words marked in this way will appear together in all results exactly as entered. Note: You may need to use a "+" to force inclusion of common words in a phrase.

Boolean OR Search
Example: vacation london OR paris
Google search supports the Boolean "OR" operator. To retrieve pages that include either word A or word B, use an uppercase OR between terms.

Site Restricted Search
Example: admission site:www.stanford.edu
If you know the specific web site you want to search but aren't sure where the information is located within that site, you can use Google to search only within that web site. Do this by entering your query followed by the string "site:" followed by the host name.

Date Restricted Search
Example: Star Wars daterange:2452122-2452234
If you want to limit your results to documents that were published within a specific date range, you can use the "daterange:" query term. It must be in the format daterange:<start_date>-<end_date>, where both dates are Julian day numbers; in the example, 2452122 is July 31, 2001 and 2452234 is November 20, 2001.
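Because daterange: expects Julian day numbers rather than calendar dates, the numbers have to be computed. A minimal sketch using the standard integer formula for converting a Gregorian calendar date to a Julian day number; julian_day and daterange are hypothetical helper names. As a check, it reproduces the two numbers in the Star Wars example, 2452122 (July 31, 2001) and 2452234 (November 20, 2001).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# julian_day($year, $month, $day): Julian day number of a Gregorian
# calendar date, using the standard integer-arithmetic formula.
sub julian_day {
    my ($year, $month, $day) = @_;
    my $adj = int((14 - $month) / 12);   # 1 for January/February, else 0
    my $y   = $year + 4800 - $adj;       # years counted from 4801 BC
    my $m   = $month + 12 * $adj - 3;    # months counted from March
    return $day + int((153 * $m + 2) / 5) + 365 * $y
         + int($y / 4) - int($y / 100) + int($y / 400) - 32045;
}

# daterange($from, $to): hypothetical helper building the query term from
# two [year, month, day] array refs.
sub daterange {
    my ($from, $to) = @_;
    return "daterange:" . julian_day(@$from) . "-" . julian_day(@$to);
}

my $query = "Star Wars " . daterange([2001, 7, 31], [2001, 11, 20]);
print "$query\n";    # prints: Star Wars daterange:2452122-2452234
```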

Title Search (term)
Example: intitle:Google search
If you prepend "intitle:" to a query term, Google search restricts the results to documents containing that word in the title. Note there can be no space between "intitle:" and the following word.

Title Search (all)
Example: allintitle: Google search
Starting a query with the term "allintitle:" restricts the results to those with all of the query words in the title.

URL Search (term)
Example: inurl:Google search
If you prepend "inurl:" to a query term, Google search restricts the results to documents containing that word in the result URL. Note there can be no space between "inurl:" and the following word.
To find multiple words in a result URL, use the "inurl:" operator for each word. Note: Putting "inurl:" in front of every word in your query is equivalent to putting "allinurl:" at the front of your query.

URL Search (all)
Example: allinurl: Google search
Starting a query with the term "allinurl:" restricts the results to those with all of the query words in the result URL.

Text Only Search (all)
Example: allintext: Google search
Starting a query with the term "allintext:" restricts the results to those with all of the query words in only the body text, ignoring link, URL, and title matches.

Links Only Search (all)
Example: allinlinks: Google search
Starting a query with the term "allinlinks:" restricts the results to those with all of the query words in the URL links on the page.

File Type Filtering
Example: Google filetype:doc OR filetype:pdf
The query prefix "filetype:" filters the results returned to include only documents with the extension specified immediately after. Note there can be no space between "filetype:" and the specified extension.

File Type Exclusion
Example: Google -filetype:doc -filetype:pdf
The query prefix "-filetype:" filters the results to exclude documents with the extension specified immediately after. Note there can be no space between "-filetype:" and the specified extension.

Web Document Info
Example: info:www.google.com
The query prefix "info:" returns a single result for the specified URL if it exists in the index.

Back Links
Example: link:www.google.com
The query prefix "link:" lists web pages that have links to the specified web page. Note there can be no space between "link:" and the web page URL.

Related Links
Example: related:www.google.com
The query prefix "related:" lists web pages that are similar to the specified web page. Note there can be no space between "related:" and the web page URL.

Cached Results Page
Example: cache:www.google.com web
The query prefix "cache:" returns the cached HTML version of the specified web document that the Google search crawled. Note there can be no space between "cache:" and the web page URL.
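Because the special query terms are just text inside the q argument, they can be assembled with ordinary string operations before calling doGoogleSearch. A minimal sketch; build_query and its option names are hypothetical, and it only covers a few of the operators described above (+, -, quoted phrases, OR, site:, filetype:). It never puts a space after a prefix's colon, which satisfies the "no space" rule by construction.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# build_query(%opts): hypothetical helper that assembles a query string
# from some of the special query terms in the table above.
#   terms    => [...]   plain words (all must match by default)
#   require  => [...]   words forced in with "+"
#   exclude  => [...]   words removed with "-"
#   phrase   => "..."   exact phrase, wrapped in double quotes
#   any_of   => [...]   alternatives joined with uppercase OR
#   site     => "..."   host name for a site: restriction
#   filetype => [...]   file extensions, joined with OR
sub build_query {
    my %opts = @_;
    my @parts;
    push @parts, @{ $opts{terms} || [] };
    push @parts, map { "+$_" } @{ $opts{require} || [] };
    push @parts, map { "-$_" } @{ $opts{exclude} || [] };
    push @parts, '"' . $opts{phrase} . '"' if defined $opts{phrase};
    push @parts, join(" OR ", @{ $opts{any_of} }) if $opts{any_of};
    push @parts, "site:$opts{site}" if defined $opts{site};
    push @parts, join(" OR ", map { "filetype:$_" } @{ $opts{filetype} })
        if $opts{filetype};
    return join(" ", @parts);
}

print build_query(terms => ["admission"], site => "www.stanford.edu"), "\n";
print build_query(terms => ["bass"], exclude => ["music"]), "\n";
print build_query(terms => ["Google"], filetype => ["doc", "pdf"]), "\n";
```

The three usage lines rebuild the table's own examples (admission site:www.stanford.edu, bass -music, Google filetype:doc OR filetype:pdf); the resulting string is passed as the query argument.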

Other Interesting Issues

Search for, say, "yahoo", and look at the estimated number of results. Wait a minute or so, then search for "yahoo" again and look at the estimated number of results. About 5 out of 10 times, the number will be different.

Conclusion… The API can be used as a means of retrieving "information" and "text" from the web.

Some interesting examples:

http://www.googleduel.com/original.php

http://douweosinga.com/projects/googlehacks

http://www.researchbuzz.org/archives/001418.shtml

http://cgi.sfu.ca/~gpeters/cgi-bin/pear/gender.php