DON'T LIKE YOUR GOOGLE SEARCH INTERFACE? MAKE YOUR OWN! C. Daniel Chase — @cdchase The University of Tennessee at Chattanooga #aim7 #heweb14

HighEdWeb 2014: Don't like your Google Search Interface? Make your Own!


DON'T LIKE YOUR GOOGLE SEARCH INTERFACE? MAKE YOUR OWN!

C. Daniel Chase — @cdchase

The University of Tennessee at Chattanooga

#aim7 #heweb14

PUZZLE PARTS

Comparisons
Search Form
Search Request Processing
Search API
Result Processing
Customizing Output
Integrating into Website
Page Not Found (404) Handling

COMPARISONS

Google Custom Search Engine
Google Site Search
Google Search Appliance (GSA)

GOOGLE CUSTOM SEARCH ENGINE

Free
Cannot customize results
Ads on results pages (can be disabled for non-profits)
Google branded

GOOGLE SITE SEARCH

Formerly Google Custom Search Business Edition
NOT free!
Licensed by number of searches
Indexes 13 file formats
Our search count: 934,000+/year, which means over $2,000 for a license
Larger license is for off-line engine
XML Results Query Reference

GOOGLE SEARCH APPLIANCE (GSA)

Hardware
Licensed by document count
Indexes over 220 file formats
Can index sites requiring authentication

UNMODIFIED GSA SEARCH INTERFACE

UNMODIFIED GSA SEARCH ENGINE RESULTS PAGE (SERP)

UT SYSTEM GSA SEARCH INTERFACE

UT SYSTEM GSA SEARCH ENGINE RESULTS PAGE

UTC CUSTOMIZED SEARCH ENGINE RESULTS PAGE

MAKING A SEARCH QUERY

if (isset($_POST['q']) && $_POST['q'] != '') {
    // Build the GSA request URL; the user's query is URL-encoded and appended last.
    $url = "http://google.tennessee.edu/search?"
        . "client=utk_frontend&"
        . "output=xml_no_dtd&"
        . "sort=date:D:L:d1&"
        . "entqr=3&"
        . "ie=UTF-8&"
        . "ud=1&"
        . "site=Chattanooga&"
        . "start=0&"
        . "q=" . urlencode(stripslashes($_POST['q']));

    // Plain-text copy of the query: tags stripped, HTML entities decoded.
    $q = html_entity_decode(strip_tags($_POST['q']));
}

http://google.tennessee.edu/search?client=utk_frontend&output=xml_no_dtd&sort=date:D:L:d1&entqr=3&ie=UTF-8&ud=1&site=Chattanooga&start=0&q=university%20web%20services
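The same URL can also be assembled from an array of parameters, which keeps the encoding in one place. The following is a minimal sketch, not the presenter's code: buildSearchUrl is a hypothetical helper name, and the host and parameter values are copied from the example above.

```php
<?php
// Hypothetical helper: builds the GSA search URL from a user query.
// Host and parameter values are copied from the slide's example.
function buildSearchUrl(string $query, int $start = 0): string
{
    $params = [
        'client' => 'utk_frontend',
        'output' => 'xml_no_dtd',
        'sort'   => 'date:D:L:d1',
        'entqr'  => 3,
        'ie'     => 'UTF-8',
        'ud'     => 1,
        'site'   => 'Chattanooga',
        'start'  => $start,
        'q'      => $query,
    ];

    // PHP_QUERY_RFC3986 encodes spaces as %20 (not +) and colons as %3A.
    return 'http://google.tennessee.edu/search?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo buildSearchUrl('university web services'), "\n";
```

Note that this encodes the colons in the sort value as %3A, so the resulting string differs cosmetically from the hand-built URL while carrying the same parameters.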

SEARCH QUERY - RESPONSE

<gsp ver="3.2">
  <tm>0.256643</tm>
  <q>university web services</q>
  <param name="client" value="utk_frontend" original_value="utk_frontend">
  <param name="output" value="xml_no_dtd" original_value="xml_no_dtd">
  <param name="sort" value="date:D:L:d1" original_value="date:D:L:d1">
  <param name="entqr" value="3" original_value="3">
  <param name="ie" value="UTF-8" original_value="UTF-8">
  <param name="ud" value="1" original_value="1">
  <param name="site" value="Chattanooga" original_value="Chattanooga">
  <param name="start" value="0" original_value="0">
  <param name="q" value="university web services" original_value="university+web+services">
  <param name="ulang" value="en" original_value="en">
  <param name="ip" value="150.182.252.13" original_value="150.182.252.13">
  <param name="access" value="p" original_value="p">
  <param name="entqrm" value="0" original_value="0">
  <param name="entsp" value="a__urlpattern_policy" original_value="a__urlpattern_policy">
  <param name="wc" value="200" original_value="200">
  <param name="wc_mc" value="1" original_value="1">
  <res sn="1" en="10">
    <m>1940</m>
    <fi>
    <wxt>
    <nb>
      <nu>/search?q=university+web+services&site=Chattanooga&lr=&ie=UTF-8&output=xml_no_dtd&client=utk_frontend&access=p&sort=date:D:L:d1&start=10&sa=N
    </nb>
    <r n="1">
      <u>http://www.utc.edu/university-web-services/</u>
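A response like this can be read with PHP's SimpleXML extension. The sketch below is an illustration under stated assumptions: parseResults is a hypothetical helper, the sample string is a trimmed, well-formed version of the excerpt above, and the element names follow the lowercase form shown on the slide. A live appliance may return uppercase element names, so check the actual response before relying on this.

```php
<?php
// Trimmed, well-formed sample based on the excerpt above. Element names
// follow the lowercase form on the slide; a live GSA may use uppercase
// names (an assumption to verify against your appliance).
$xml = <<<XML
<gsp ver="3.2">
  <tm>0.256643</tm>
  <q>university web services</q>
  <res sn="1" en="10">
    <m>1940</m>
    <r n="1">
      <u>http://www.utc.edu/university-web-services/</u>
    </r>
  </res>
</gsp>
XML;

// Hypothetical helper: pulls the query, the <m> count, and result URLs.
function parseResults(string $xml): array
{
    $parsed = ['query' => '', 'total' => 0, 'urls' => []];

    $gsp = simplexml_load_string($xml);
    if ($gsp === false) {
        return $parsed; // Not parseable: return the empty structure.
    }

    $parsed['query'] = (string) $gsp->q;

    // <res> is absent when a search returns nothing, so guard the access.
    if (isset($gsp->res)) {
        $parsed['total'] = (int) $gsp->res->m;
        foreach ($gsp->res->r as $result) {
            $parsed['urls'][] = (string) $result->u;
        }
    }

    return $parsed;
}

$parsed = parseResults($xml);
print_r($parsed);
```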

SEARCH API

Bookmark the Reference documentation!

https://support.google.com/gsa/answer/3890846?hl=en&ref_topic=2709671

More specifically, the Search Protocol Reference:

http://www.google.com/support/enterprise/static/gsa/docs/admin/72/gsa_doc_set/xml_reference/

REQUIRED SEARCH PARAMETERS

site
    Limits search results to the contents of the specified collection.
    Example: site=Chattanooga

client
    A string that indicates a valid front end and the policies defined for it, including KeyMatches, related queries, filters, remove URLs, and OneBox Modules.
    Example: client=utk_frontend

output
    Selects the format of the search results.
    Example: output=xml_no_dtd

q
    Search query as entered by the user.
    Example: q=university%20web%20services

PREFERENCE SEARCH PARAMETERS

sort
    Results can be sorted by relevance, date or metadata.
    Example: sort=date:D:L:d1

entqr
    This parameter sets the query expansion policy. 3 is Full: uses both standard and local synonym files.
    Example: entqr=3

ie
    Sets the character encoding that is used to interpret the query.
    Example: ie=UTF-8

ud
    Specifies whether results include ud tags. A ud tag contains internationalized domain name (IDN) encoding for a result URL.
    Example: ud=1

MORE SEARCH PARAMETERS

start
    Specifies the index number of the first entry in the result set that is to be returned. (Use with num.)
    Example: start=0
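Because start is a zero-based offset rather than a page number, a paginated results page has to convert between the two. A minimal sketch, assuming a hypothetical helper named startForPage and a page size of 10 (matching the sn="1" en="10" range in the sample response):

```php
<?php
// Hypothetical helper: converts a 1-based page number into the zero-based
// "start" offset. $perPage corresponds to the "num" parameter; 10 is an
// assumption based on the sn="1" en="10" range in the sample response.
function startForPage(int $page, int $perPage = 10): int
{
    // Clamp so that page 0 or a negative page still yields a valid offset.
    return max(0, ($page - 1) * $perPage);
}

echo startForPage(1), "\n"; // first page starts at offset 0
echo startForPage(2), "\n"; // second page starts at offset 10
```

This agrees with the next-page link in the sample response, which carries start=10.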

RESULT PROCESSING

We base our search result handling on a customized version of the template provided with the GSA, but you can build your own.

Remove SERP <head> content to wrap with your template.
Replace references to search in links and form action to point at your new page.
Review settings at the top of the GSA default XSL for configurable options.
Remove or fine-tune page top & bottom content.
Remove conflicting CSS.
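Pointing search links at your own page can be sketched in PHP as well. This is not the approach shown in the talk, which edits the GSA stylesheet; it only illustrates the idea for a link such as the next-page URL in the sample response. localizeSearchLink and the /search-results.php path are hypothetical, and the choice of which parameters to keep is an assumption.

```php
<?php
// Hypothetical helper: rewrites a GSA-relative link (such as the
// next-page link in the sample response) to point at a local page.
// '/search-results.php' is a placeholder path.
function localizeSearchLink(string $gsaLink, string $localPage = '/search-results.php'): string
{
    // Take everything after the first "?" as the query string.
    $pos = strpos($gsaLink, '?');
    $queryString = $pos === false ? '' : substr($gsaLink, $pos + 1);
    parse_str($queryString, $params);

    // Keep only what the local page needs (an assumption); the local
    // page adds client, output, and site itself when it calls the GSA.
    $keep = array_intersect_key($params, array_flip(['q', 'start', 'sort']));

    return $localPage . '?' . http_build_query($keep, '', '&', PHP_QUERY_RFC3986);
}

echo localizeSearchLink(
    '/search?q=university+web+services&site=Chattanooga&lr=&ie=UTF-8'
    . '&output=xml_no_dtd&client=utk_frontend&access=p&sort=date:D:L:d1&start=10&sa=N'
), "\n";
```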

DEFAULT HEADER

DEFAULT FOOTER

CUSTOMIZING OUTPUT

Start with built-in options:

Replaced the Google logo
Added the header used on the organization's web site
Changed search button text
Changed the advanced search anchor text...

Review output for other changes

Don't be afraid of "(do not customize)"

UTC SEARCH HEADER

UTC SEARCH FOOTER

INTEGRATING INTO WEBSITE

Every page should have a search form!
Customize page content to improve search (SEO)
Add standard description & keyword meta tags
Add custom meta tags

CUSTOM META TAGS
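If pages are generated from a PHP template, the standard and custom meta tags can come from one array. A minimal sketch: renderMetaTags is a hypothetical helper, the "department" tag name is an invented example of a custom tag, and the content values are placeholders rather than real page data.

```php
<?php
// Hypothetical helper: renders name/content pairs as <meta> tags,
// escaping both values so they are safe inside HTML attributes.
function renderMetaTags(array $tags): string
{
    $html = '';
    foreach ($tags as $name => $content) {
        $html .= sprintf(
            "<meta name=\"%s\" content=\"%s\">\n",
            htmlspecialchars((string) $name, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars((string) $content, ENT_QUOTES, 'UTF-8')
        );
    }
    return $html;
}

// Placeholder values; "department" is an invented custom tag name.
echo renderMetaTags([
    'description' => 'Placeholder page description.',
    'keywords'    => 'placeholder, keywords',
    'department'  => 'University Web Services',
]);
```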

PAGE NOT FOUND (404) HANDLING

Don't redirect directly to the search page!
Must send a 404 error to search engine crawlers
Be nice to people: do a search for them!
Historic page redirects
Parse the requested URL and use it to search!
The Trick: Plain HTML 404 page with JavaScript redirect

404 EXAMPLE

Old URL:

http://www.utc.edu/Administration/UniversityRelations/staff.php
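The talk does this step with JavaScript on a plain HTML 404 page. The PHP sketch below swaps in a server-side version purely to illustrate how a requested URL can be turned into search terms. searchTermsFromPath is a hypothetical helper, and the splitting rules (drop the extension, split on slashes, dashes and underscores, break up CamelCase) are assumptions, not the presenter's rules.

```php
<?php
// Hypothetical helper: turns a requested URL into search terms.
function searchTermsFromPath(string $requestUri): string
{
    // Work on the path only; ignore the host and any query string.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);

    // Drop a trailing file extension such as ".php".
    $path = preg_replace('/\.[A-Za-z0-9]+$/', '', $path);

    // Split on slashes, underscores, and dashes, skipping empty pieces.
    $words = preg_split('#[/_\-]+#', $path, -1, PREG_SPLIT_NO_EMPTY);

    // Break CamelCase: "UniversityRelations" becomes "University Relations".
    $words = preg_replace('/(?<=[a-z])(?=[A-Z])/', ' ', $words);

    return implode(' ', $words);
}

echo searchTermsFromPath(
    'http://www.utc.edu/Administration/UniversityRelations/staff.php'
), "\n";
```

The resulting string can be passed as the q parameter of the search request. The 404 status still has to be sent with the response so that crawlers see the error.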

QUESTIONS?

C. Daniel Chase — @cdchase

[email protected]

The University of Tennessee at Chattanooga

#aim7 #heweb14