AutoWeb_Guide_2.0a

  • 8/12/2019 AutoWeb_Guide_2.0a

    A Guide to AutoWeb

    Release 2.0

    Memex Technology Limited
    2 Redwood Court
    Peel Park
    East Kilbride G74 5PF
    Scotland UK

    Tel: +44 (0) 1355 233 804
    Fax: +44 (0) 1355 239 676

    Web: http://www.memex.com

    Copyright 2007 Memex Technology Limited. All rights reserved.

    This manual and the software described herein are the copyright of Memex Technology Limited and may not be copied or disclosed to a third party without the prior written permission of Memex. Whilst all possible care is taken in the preparation of this manual, Memex assumes no responsibility or liability for any errors or inaccuracies that may appear in this document. Memex reserves the right to make changes without notice both to this manual and to the software and hardware it describes.

    The software described in this document is furnished under licence and may only be used in accordance with the terms of such licence.

    The people, places, organisations, telephone numbers, vehicle identification numbers and other details referred to in the sample record data in this publication are entirely fictitious. These details have been created for demonstration purposes only and do not refer to any actual organisation, telephone number, vehicle, etc., or to any actual person, living or dead.

    The text of this document may include references to previous releases of the product, for example, in screenshots and procedural examples. Regardless of any versions that may be mentioned, this manual describes the current functionality provided by the release of the software identified on the title page.

    Trademarks

    Memex, Textract and Total Content Access are registered trademarks of Memex Technology Limited. Microsoft, PowerPoint and Windows are registered trademarks of Microsoft Corporation. Other product, brand and company names mentioned herein are trademarks or registered trademarks of their respective owners and should be treated as such.

    2.0a-5-IJ-AC-20070912-1.6

    Contents

    Scope
      Related documents
      Product names
    Introduction
      AutoWeb toolbar
      AutoWeb server
    Chapter 1 Installing the AutoWeb server
      Server components
      Installation prerequisites
        SFU requirements
      Installing the server components
        Installing using the auto-installer
        Using the auto-installer on Windows
        Using the auto-installer on Solaris or Linux
        Installing using the tar file
        Creating extra databases
      Setting up the AutoWeb configuration file
        The default spider.cfg file
        HTTrack options and robots.txt
      Upgrading to AutoWeb 2.0
        Unpack the installation package
        Updating the configuration database
        Run the upgrade scripts
    Chapter 2 Installing the AutoWeb client
      Installing the toolbar
        Configuring the toolbar
        Configuring the toolbar from the Windows registry
        How the toolbar works
      Memex Analyst forms
      Installation tasks
        Memex Intelligence Engine
        Memex Patriarch
        AutoWeb databases for Memex Patriarch
      Configuration tasks
        Modifying the spider.cfg file
        Linking to the WebConfig database
        Linking to the WebArchive database
      Setting up picklists
      Adding additional web archives
    Chapter 4 Using AutoWeb
      Selecting a Memex database
      Specifying keywords
      Indexing Web page text
      Indexing a Web page
      Viewing indexed pages
      Monitoring Web sites
      Specifying the sites you want to monitor
        Specifying sites - Memex Patriarch
        Specifying sites - Memex Analyst
        Fields on the configuration form
      How Web site monitoring works
      Stopping getsite.pl
        Extracting the Web page text
    Appendix A Known limitations
    Appendix B Troubleshooting
    Appendix C HTTrack options
    Appendix D Upgrading to AutoWeb 1.3
      Backing up your previous AutoWeb setup
      Installing AutoWeb 1.3
      Converting your AutoWeb data
        Setting up the conversion script
        Running the conversion script

    Scope

    This guide provides detailed installation and user instructions for release 2.0 of AutoWeb.

    The document contains:

    - An overview of the AutoWeb application
    - Installation and configuration instructions for the client and server components
    - Detailed user instructions
    - Information on known limitations
    - Instructions on how to upgrade from a previous release

    If you have any comments about this guide, please contact Memex Customer Support:

    [email protected]

    Related documents

    For further information about this release of AutoWeb, please read the AutoWeb Release Notes.

    Product names

    This manual contains references to other Memex products. The names of some of these products were changed recently for new releases of the software. The name changes are shown in the following table.

    Current name            Previous name                      Notes
    Memex Patriarch         Intelligence Manager               Memex Patriarch is a desktop client application, whereas Intelligence Manager comprises a desktop application plus various server components.
    Memex Analyst           Intelligence Analyst
    Memex Series VI         The Intelligence Manager bundle    Memex Series VI and the Intelligence Manager bundle are sets of compatible products.
    Memex Series VI Server  The Intelligence Manager server    The Memex Series VI Server comprises the Memex Intelligence Engine plus various server components that support the client applications.
                            components plus the Memex
                            Intelligence Engine

    This manual uses the name of the current release of the software unless specifically referring to an older release. Unless stated otherwise, details referring to a product by its current name also apply to releases of the products that used the previous name.

    Introduction

    AutoWeb provides an easy way to extract text from a Web site and transfer it to a Memex database.

    AutoWeb has two main components:

    - A toolbar that integrates into Internet Explorer and allows you to index individual pages directly from the browser.
    - A server-side process that you can either run manually or as part of a cron job.

    AutoWeb toolbar

    When you use the AutoWeb toolbar, you can choose to index all the text from a Web page or just index selected text. The toolbar also allows you to specify the Memex database where you want to index the Web page, and to enter keywords associated with the page.

    AutoWeb server

    The server-side process reads the contents of a configuration database containing information on which pages should be indexed. The process then mirrors (that is, stores a local copy of) each Web page and creates a record in a Memex database. The mirrored files are used for displaying the Web page in a browser. The database is used for retrieving a Web page based on a search query entered in Memex Patriarch or Memex Analyst.

    Whenever a page is indexed, either from the toolbar or from the server process, AutoWeb makes a copy of the page. This enables you to access historical copies of the pages you have indexed.

    Note: AutoWeb is designed to be integrated with Memex Patriarch and Memex Analyst, or Intelligence Manager and Intelligence Analyst if you are using older versions of these applications. You can use either application to view the configuration and index records and access the indexed Web pages.

    Chapter 1 Installing the AutoWeb server

    Server components

    This table lists the components that the AutoWeb server installation process installs.

    Name                                        Details
    bin/HTTrack                                 HTTrack is a utility that is used to mirror Web pages.
    bin/libhttrack.so.1                         Shared library for HTTrack (for Solaris).
    bin/lynx                                    Lynx is a text-based browser utility that is used to extract the text from Web pages.
    bin/lynx.cfg                                Configuration file for the Lynx utility.
    bin/getsite.pl                              This perl script is run as a cron job. It looks at the contents of the config.db database and indexes any sites that have been set up.
    bin/addtomemex.pl                           This perl script is called by any HTTrack process that is launched from getsite.pl. HTTrack calls this script every time it downloads a file. The script then decides what to do with the file and adds a record to a database if necessary.
    bin/addpagefile.pl                          This perl script is called by any HTTrack process that is launched from the file IndexPage.pl. HTTrack calls this script every time it downloads a file. The script then decides what to do with the file.
    cgi-bin/Bar.pl                              This is a cgi script for backwards compatibility with the original Memex toolbar (Version 1.0a). This controls what appears on that version of the toolbar and the actions that the toolbar buttons perform.
    cgi-bin/Databases.pl                        This is a cgi script that is used by the new Memex toolbar (Version 1.0b) to determine the list of databases.
    cgi-bin/IndexPage.pl                        This is a cgi script that is called whenever a user selects Index Selected Text or Index Page.
    config.db                                   The database that contains information on what sites getsite.pl should index.
    databases                                   This directory contains all the databases where the indexed pages are stored.
    dbconfigs                                   This directory contains the database configs.
    images/memexbar.bmp                         This bitmap is an image list for the toolbar.
    install                                     The install script for the server installation.
    mirror                                      This directory contains the mirrored Web pages.
    spider.cfg                                  This is the config file for AutoWeb.
    locales/EN.loc                              English locale file.
    perlmodules/Config/General.pm               Required perl module.
    perlmodules/Config/General/Extended.pm      Required perl module.
    perlmodules/Config/General/Interpolated.pm  Required perl module.
    perlmodules/File/Basename.pm                Required perl module.
    perlmodules/File/CheckTree.pm               Required perl module.
    perlmodules/File/Compare.pm                 Required perl module.
    perlmodules/File/Copy.pm                    Required perl module.
    perlmodules/File/DosGlob.pm                 Required perl module.
    perlmodules/File/Find.pm                    Required perl module.
    perlmodules/File/Path.pm                    Required perl module.
    perlmodules/File/Spec.pm                    Required perl module.
    perlmodules/File/stat.pm                    Required perl module.
    perlmodules/File/Spec/Functions.pm          Required perl module.
    perlmodules/File/Spec/Mac.pm                Required perl module.
    perlmodules/File/Spec/OS2.pm                Required perl module.
    perlmodules/File/Spec/Unix.pm               Required perl module.
    perlmodules/File/Spec/VMS.pm                Required perl module.
    perlmodules/File/Spec/Win32.pm              Required perl module.

    Installation prerequisites

    Before you can install the AutoWeb server, your system must contain:

    - One of the following operating systems:
      - Sun Solaris 10
      - Red Hat Enterprise Linux 4
      - Microsoft Windows Services for UNIX 3.5
    - Perl 5.0 or greater
    - Memex Intelligence Engine (MIE) 6.0
    - Apache 2 HTTP server.

      Apache must be configured to run as the Memex administrator user.

      To configure Apache 2 to run as the Memex administrator user:

      - Change to the directory where Apache's httpd.conf file is located. For example:

          cd /usr/local/apache2/conf

      - Edit the httpd.conf file with a plain text editor, such as vi.
      - Locate the section of the configuration file that specifies the user as whom the httpd service will run. For example, to force Apache 2 to run as the user mxadmin in the group mxadmins, add or change the User and Group lines:

          User mxadmin
          Group mxadmins

    - Apache's log files must be writable by the Memex administrator user (typically mxadmin or mxroot).

      To do this on Solaris or Linux:

      - su as root.
      - Change the ownership of the directory where Apache's log files reside. The location of the log files is specified in Apache's httpd.conf file. The directory and its contents should be owned by the Memex administrator user. For example:

          chown -R mxadmin:mxadmins /var/apache2/logs

      To do this on Windows SFU:

      - From an SFU command console, su as Administrator.
      - Change the ownership of the directory where Apache's log files reside. The location of the log files is specified in Apache's httpd.conf file. The directory and its contents should be owned by the Memex administrator user. For example:

          chown -R SERVERNAME+mxadmin:SERVERNAME+mxadmins /usr/local/apache2/logs

    To run the getsite.pl script as a cron job (see "Monitoring Web sites"), the Memex administrator account (usually mxadmin or mxroot) must have a home directory.
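    The Apache User and Group settings described above can be checked from a script before you run the installer. The following is a minimal sketch for a POSIX shell. The function name apache_run_as and its default path are illustrative assumptions and are not part of AutoWeb.

```shell
#!/bin/sh
# Print the User and Group directives found in an httpd.conf file as "user:group".
# Hypothetical helper: apache_run_as is not part of AutoWeb.
apache_run_as() {
    conf=${1:-/usr/local/apache2/conf/httpd.conf}
    # Only uncommented "User x" / "Group x" lines match.
    # If a directive is repeated, the last occurrence is reported.
    awk '$1 == "User" || $1 == "Group" { v[$1] = $2 }
         END { printf "%s:%s\n", v["User"], v["Group"] }' "$conf"
}
```

    For example, apache_run_as /usr/local/apache2/conf/httpd.conf should print mxadmin:mxadmins once the file has been edited as described.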

    SFU requirements

    If you are installing on SFU, you must first install the following software packages:

    Package name  Description
    httpd         Apache 2 HTTP Server
    lynx          Lynx Web browser for terminals
    zlib          Zlib data compression library

    These packages are available from the SFU Tools Warehouse Web site:

    http://www.interopsystems.com/tools/warehouse.htm

    To install these packages, first download and install the package installer that is available as a shell script from the same Web site. You can then issue simple commands from a shell console window that use the package installer to download and install the software packages and all their dependencies. For example, to install Apache 2, run the command:

        pkg_update -L httpd

    For more information, see the SFU Tools Warehouse Web site.

    Installing the server components

    The method for installing the AutoWeb server components varies depending on whether your MIE was installed as part of a Memex Series VI Server installation. If you are adding AutoWeb to a Memex Series VI system, use the auto-installer method described here. Otherwise, use the tar file method (see "Installing using the tar file").

    Installing using the auto-installer

    The auto-installer is available for Windows, Linux and Solaris. You must have a Memex Series VI Server setup to be able to use the AutoWeb auto-installer.

    Using the auto-installer on Windows

    1. Locate the autoweb_windows.exe file in Windows Explorer.

    2. Right-click this file and choose Run As.

    3. Select The following user and enter \Administrator.

    4. Enter the password for Administrator and click OK.

    5. Follow the setup instructions on screen:

       - Memex recommends leaving the destination directory as C:\SFU\opt\memex
       - In most cases you can leave the host name and port settings at their default values:

           Host name: localhost
           Port: 9001

       - Enter the name and password of an MIE superuser. To check the names of current MIE superusers, look at the values of the superusers element in the memexsvr.xml file (usually located in /opt/memex/etc).

    6. As instructed at the end of the auto-installation process, add an Include statement to Apache's httpd.conf file.

       For example, from an SFU shell, run the command:

           echo "Include /opt/memex/autoweb/config/apache2.conf" >> /usr/local/apache2/conf/httpd.conf

    7. Start, or restart, the Apache Web server:

           /usr/local/apache2/bin/apachectl restart

    Using the auto-installer on Solaris or Linux

    1. Log on to the server as the local root user.

    2. Locate the autoweb_linux.sh install script and run it by typing the command:

           sh autoweb_linux.sh

    3. Follow through the setup instructions on screen. (The default values are usually correct for each):

       - Memex recommends leaving the destination directory as /opt/memex
       - Enter the host name and port number of your Memex Series VI Server. The default values of localhost and 9001 are usually correct, but you can modify them. If you are installing AutoWeb on a server other than the one that hosts your Memex Series VI setup, you must also provide the port number for that server's MIE. Otherwise, enter the same value as you entered for the previous port number.
       - Enter the name and password of an MIE superuser. To check the names of current MIE superusers, look at the values of the superusers element in the memexsvr.xml file (usually located in /opt/memex/etc).

       Note: If any of the values you enter for the previous two steps are incorrect, the installer will display an error and prompt you to re-enter the correct values.

    4. As instructed at the end of the auto-installation process, add an Include statement to Apache's httpd.conf file. For example, run the command:

           echo "Include /opt/memex/autoweb/config/apache2.conf" >> /httpd.conf

       Prefix /httpd.conf with your Apache configuration directory, a path such as /etc/apache2.

    5. Start, or restart, the Apache Web server:

           /bin/apachectl restart

       Prefix /bin/apachectl with your Apache installation directory, a path such as /usr/apache2.
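    Because the echo command appends a new line every time it is run, re-running the step can leave duplicate Include statements in httpd.conf. The following sketch, written for a POSIX shell, appends the line only when it is missing and keeps a backup copy first. The function name add_include and the .bak suffix are assumptions made for illustration; they do not come from the AutoWeb installer.

```shell
#!/bin/sh
# Append an Include line to an Apache configuration file only if it is absent.
# Hypothetical helper: add_include is not part of AutoWeb.
add_include() {
    httpd_conf=$1
    include_line="Include ${2:-/opt/memex/autoweb/config/apache2.conf}"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if grep -qxF -e "$include_line" "$httpd_conf"; then
        echo "already present"
    else
        cp "$httpd_conf" "$httpd_conf.bak"   # keep a backup before editing
        echo "$include_line" >> "$httpd_conf"
        echo "added"
    fi
}
```

    Call it with the full path to your httpd.conf file, for example add_include /etc/apache2/httpd.conf, and then restart Apache as described in step 5.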

    Installing using the tar file

    This method of installation should only be used if your Memex server was set up manually and not with the Memex Series VI auto-installer. If you are unsure which type of Memex setup you have, contact [email protected].

    Note: You must install the AutoWeb server components as the Memex administrator account. For example, mxadmin or mxroot.

    11. Configure your web server so that the images subdirectory is visible as a Web subdirectory.

        To do this, add a line to Apache's httpd.conf file, such as:

            Alias /autoweb-images/ /opt/memex/autoweb/images/

        Note: The name that you give to this alias will have an impact on the imglst entry within the spider.cfg file.

    12. Add a cgi-bin directory to your Web server called /autoweb-bin/. This directory must be aliased to the cgi-bin subdirectory within the autoweb directory.

        To do this, add a line to Apache's httpd.conf file, such as:

            ScriptAlias /autoweb-bin/ /opt/memex/autoweb/cgi-bin/

    13. Make a note of the full URL location of this ScriptAlias.

        You enter this URL when configuring the AutoWeb client toolbar.

    14. Add a directory to your Web server that points to the cgi-bin subdirectory within the autoweb directory.

        To do this, add the following lines to Apache's httpd.conf file:

            AllowOverride None
            Options None
            Order allow,deny
            Allow from all

        Use the location of your AutoWeb installation, typically /opt/memex/autoweb, as the base of the directory path.

        You must also add another directory to your Web server for each of the mirror and images directories, similar to the one shown above for the cgi-bin directory. For example:

            AllowOverride None
            Options None
            Order allow,deny
            Allow from all

        and

            AllowOverride None
            Options None
            Order allow,deny
            Allow from all
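    The three directory entries in step 14 differ only in the subdirectory they name, so they can be generated rather than typed by hand. The sketch below is an assumption-laden illustration: the directives are the ones listed above, but the enclosing <Directory> lines are not visible in this copy of the guide and have been reconstructed in the usual Apache 2.0/2.2 form. The function name autoweb_directory_blocks is hypothetical. Review the output before adding it to httpd.conf.

```shell
#!/bin/sh
# Emit Apache 2.0/2.2-style access blocks for the AutoWeb subdirectories.
# Assumption: the directives sit inside <Directory> sections; those wrapper
# lines are reconstructed here and are not taken from the guide.
autoweb_directory_blocks() {
    base=${1:-/opt/memex/autoweb}
    for sub in cgi-bin mirror images; do
        printf '<Directory "%s/%s">\n' "$base" "$sub"
        printf '    AllowOverride None\n'
        printf '    Options None\n'
        printf '    Order allow,deny\n'
        printf '    Allow from all\n'
        printf '</Directory>\n'
    done
}
```

    Running autoweb_directory_blocks /opt/memex/autoweb prints one block each for the cgi-bin, mirror and images subdirectories.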

    Creating extra databases

    One sample database is created as part of the installation process. The sample database is called webarchive. The directory for AutoWeb databases is: /opt/memex/autoweb/databases.

    You can create extra databases by using the ns_create command followed by the mkphonetic command. For example:

        ns_create -c /opt/memex/autoweb/dbconfigs/config.archive -n 8192 /opt/memex/autoweb/databases/mynewdb
        mkphonetic /opt/memex/autoweb/databases/mynewdb

    See the Memex Intelligence Engine Administrator's Guide for more information on the ns_create and mkphonetic utilities.
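    If you create several databases, the two commands can be wrapped in a small function so that the paths stay consistent. This is only a sketch: the function name create_autoweb_db and the DRY_RUN variable are illustrative assumptions, and the ns_create arguments are copied unchanged from the example above.

```shell
#!/bin/sh
# Run ns_create and then mkphonetic for a new AutoWeb database.
# Hypothetical wrapper: create_autoweb_db and DRY_RUN are illustrative names.
create_autoweb_db() {
    name=$1
    base=${2:-/opt/memex/autoweb}
    run=""
    # With DRY_RUN set, the commands are printed instead of executed.
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi
    $run ns_create -c "$base/dbconfigs/config.archive" -n 8192 "$base/databases/$name" &&
    $run mkphonetic "$base/databases/$name"
}
```

    Set DRY_RUN=1 first to review the commands, then unset it and run create_autoweb_db mynewdb as the Memex administrator user.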

    Setting up the AutoWeb configuration file

    spider.cfg is the configuration file for AutoWeb. This table lists the entries that the configuration file must contain. The default spider.cfg file is shown in "The default spider.cfg file".

    Name         Details
    installpath  The installation directory of the AutoWeb server. This is set automatically by the install script.
                 For example: /opt/memex/autoweb
    locale       The language locale to use for the server responses to the Memex toolbar. This must be set to match one of the files in the locales directory in the installation path.
                 For example: EN
    mirrorurl    The URL for the mirror directory. This must contain the full domain name and the alias that you gave for the mirror directory.
                 For example: http://server.domain.com/autoweb-mirror
    httracklib   The path to the lib file for HTTrack.
                 For example: /opt/memex/autoweb/bin
    httrack      The path to the HTTrack executable.
                 For example: /opt/memex/autoweb/bin/httrack
    opts         The options that getsite.pl uses to call HTTrack (see "HTTrack options and robots.txt").
                 For example: -n -%e0
    stdopts      More options that getsite.pl uses to call HTTrack.
                 For example: -I0 -Qq --assume cfm=text/html,php=text/html -X0 -%F ""
    append       The path to the ns_append utility.
                 For example: /opt/memex/mie/bin/ns_append
    decode       The path to the decode utility.
                 For example: /opt/memex/mie/bin/decode
    configdb     The path to the config database for getsite.pl.
                 For example: /opt/memex/autoweb/config.db
    lynx         The path to the lynx utility and the parameters that must be passed.
                 For example: /opt/memex/autoweb/bin/lynx -cfg="/opt/memex/autoweb/bin/lynx.cfg"
    domain       The web server domain.
                 For example: server.domain.com
    imglst       The path that will be added to the domain to retrieve the image list for the toolbar. The first part of this must be the name that you gave to the alias for the /images directory.
                 For example: /autoweb-images/memexbar.bmp
    cgi-bin      The path that will be added to the domain to access the cgi-bin for AutoWeb. This must be the name of the alias that you gave for the cgi-bin directory.
                 For example: /autoweb-bin/
    pageopts     The options used in the call from IndexPage.pl to HTTrack.
                 For example: -%P0 -C0 -I0 -%Q -n -Qq -d --assume cfm=text/html,php=text/html -X0 -%F ""
    logfile      The location of the log file for AutoWeb. If this entry does not exist, no log file is created.
                 For example: /opt/memex/logs/crawlerlog.txt
    filtertypes  A list of the file types that AutoWeb will not write a record for.
                 For example: ra|ram|jpg|gif|pbm|mov|avi|wmv|css|pdf|ps|js|xml|rdf
    lockfile     The lock file that is used to prevent getsite.pl from running more than once.
                 For example: /tmp/autoweblock
    notrenamed   A list of the file types that HTTrack does not rename as html.
                 For example: html|htm|txt
    imbase       The installation directory of the Memex Patriarch software on the server. This entry is optional and is only necessary if you want to use AutoWeb from within Memex Patriarch.
                 This parameter should usually be set to: /opt/memex/im
    rollover     The number of days before the mirror directory is rolled over. Rolling over the mirror directory involves creating a new subdirectory in the location specified by the mirrorurl setting. If you leave this at the default of 7, a new mirror subdirectory is created every 7 days for storing Web pages in (2007001, 2007002 and so on).
                 To turn off this process, set the value to 0, although this is not recommended. The default and recommended value in the provided file is 7.

    Note: You use different configuration file variables to specify the HTTrack options, depending on how you are running AutoWeb:

    - If you are running the AutoWeb toolbar, use the pageopts variable to specify the HTTrack options.
    - If you are running AutoWeb as a cron job via getsite.pl, use the stdopts variable to specify the HTTrack options.

    The default spider.cfg file

        #Config file for Intelligence Mirror
        installpath /opt/memex/autoweb
        mirrorurl http://localhost/autoweb-mirror
        httracklib /opt/memex/autoweb/bin
        httrack /opt/memex/autoweb/bin/httrack
        opts -n -%e0 -A32000
        stdopts -I0 -Qq --assume cfm=text/html,php=text/html -X0 -%F ""
        append /opt/memex/mie/bin/ns_append
        decode /opt/memex/mie/bin/decode
        configdb /opt/memex/autoweb/config.db
        lynx /opt/memex/autoweb/bin/lynx -cfg="/opt/memex/autoweb/bin/lynx.cfg"
        domain localhost
        imglst /autoweb-images/memexbar.bmp
        cgi-bin /autoweb-bin/
        pageopts -%P0 -C0 -I0 -%Q -n -Qq -d --assume cfm=text/html,php=text/html -X0 -%F ""
        logfile /opt/memex/autoweb/crawlerlog.txt
        filtertypes ra|ram|jpg|gif|pbm|mov|avi|wmv|css|pdf|ps|js|xml|rdf
        lockfile /tmp/spiderlock
        notrenamed html|htm|txt
        locale EN
        imbase /opt/memex/im
        rollover 7
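    When troubleshooting, it can help to confirm which value AutoWeb will read for a given entry. The following sketch prints one entry from a spider.cfg-style file. It assumes that each entry is written as a name followed by its value on a single line and that lines starting with # are comments. The function name spider_cfg_get is hypothetical and is not supplied with AutoWeb.

```shell
#!/bin/sh
# Print the value of one entry from a spider.cfg-style file.
# Assumption: each entry is "name value" on one line; "#" starts a comment.
# Hypothetical helper: spider_cfg_get is not part of AutoWeb.
spider_cfg_get() {
    key=$1
    cfg=${2:-/opt/memex/autoweb/spider.cfg}
    awk -v key="$key" '
        $1 == key {
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # drop the name, keep the value
            print
            exit                              # report the first match only
        }' "$cfg"
}
```

    For example, spider_cfg_get rollover prints the rollover value from the default location, and spider_cfg_get opts /path/to/spider.cfg prints the HTTrack options used by getsite.pl.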

    HTTrack options and robots.txt

    A robots.txt file is stored in the root of most Web servers. This file alerts crawlers and web spiders, such as AutoWeb, as to which pages they should ignore when retrieving pages from the remote Web server.

    The original specification of this standard and the IETF draft are available from the following sites:

    http://www.robotstxt.org/wc/norobots.html

    http://www.robotstxt.org/wc/norobots-rfc.html

    Because robots.txt restricts the files that can be downloaded by web spiders, it has an impact on the AutoWeb server software and its ability to track and store Web pages.

    AutoWeb uses HTTrack software to retrieve remote Web pages. If required, you can configure HTTrack to either follow or ignore the directives in the robots.txt file. You do this by changing the opts setting in the spider.cfg file. For more information, see "Appendix C HTTrack options".
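    As an illustration of how the opts setting relates to robots.txt handling, the sketch below inspects an option string for HTTrack's -sN flag, which HTTrack documents as controlling whether robots.txt is followed (0 = never, 1 = sometimes, 2 = always). Whether Appendix C recommends this particular flag is not shown in this section, so treat the flag choice as an assumption. The function name robots_policy is hypothetical.

```shell
#!/bin/sh
# Report the robots.txt policy implied by an HTTrack option string.
# Assumption: the policy is set with HTTrack's -sN option; if no -sN flag is
# present, HTTrack applies its own default. Only the first -sN flag is reported.
robots_policy() {
    for opt in $1; do
        case $opt in
            -s0) echo "never follow robots.txt"; return ;;
            -s1) echo "sometimes follow robots.txt"; return ;;
            -s2) echo "always follow robots.txt"; return ;;
        esac
    done
    echo "HTTrack default"
}
```

    For the default opts value of -n -%e0 -A32000 the function reports the HTTrack default, because no -sN flag is present.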

    Upgrading to AutoWeb 2.0

    Follow these instructions to upgrade an AutoWeb 1.3 installation to AutoWeb 2.0. If you are upgrading from AutoWeb 1.0 or 1.1, you must first upgrade the configuration to AutoWeb 1.3 before you can apply the following steps. Instructions for upgrading to version 1.3 are given in "Appendix D Upgrading to AutoWeb 1.3".

    Unpack the installation package

    Unpack the AutoWeb 2.0 installation package in a temporary location. For example:

    tar -xvf mxwasvr--.tar

    Updating the configuration database

    Memex Analyst config.db database

    If you are using Memex Analyst for adding/editing configuration records for AutoWeb, two new fields must be added to the config file for the config.db database. The path to this file is typically /opt/memex/autoweb/config.db/config. Use a plain text editor, such as vi, to edit this file, adding the following two lines to the end of the file:

    field: 6 index xxindex ""
    field: 7 priority xxpriority ""

    Note If the field numbers 6 and 7 are currently used by other fields, use the next available highest numbers that are not currently in use.
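    If the same change has to be made on several servers, the edit can be scripted. The following is a minimal sketch, assuming that every field definition in the config file follows the same "field: number name xxname" pattern as the two lines above. The add_field helper and the temporary demo file are hypothetical; the sketch works on a scratch copy and does not check whether the field number is already taken.

```shell
# add_field FILE NUMBER NAME
# Appends a field definition unless a field with this name already exists.
add_field() {
    file=$1; num=$2; name=$3
    if grep -q "^field: *[0-9][0-9]* *$name " "$file"; then
        return 0    # already defined, leave the file untouched
    fi
    printf 'field: %s %s xx%s ""\n' "$num" "$name" "$name" >> "$file"
}

# Demonstration on a scratch file rather than the real config file.
demo=$(mktemp)
printf 'field: 5 notes xxnotes ""\n' > "$demo"
add_field "$demo" 6 index
add_field "$demo" 7 priority
add_field "$demo" 6 index    # repeated call changes nothing
cat "$demo"
```

    Before running anything like this against /opt/memex/autoweb/config.db/config, take a copy of the file and confirm that the field numbers are free, as described in the note above.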

    Memex Patriarch WebConfig database

    If you use Memex Patriarch for adding/editing configuration records for AutoWeb (that is, if config.db is a symbolic link to the Memex Patriarch WebConfig database), you must add index and priority fields to the WebConfig database definition. Do this within Memex Patriarch, using Entity Manager. See the Memex Patriarch online help for details of how to add new fields.

    The Memex Patriarch form for WebConfig records (and, optionally, the form for WebArchive records) should be replaced by the forms supplied in the im13autoweb/forms directory of the distribution. For example:

    cp im13autoweb/forms/WebConfig.form /opt/memex/im/CS/files/forms

    cp im13autoweb/forms/WebArchive.form /opt/memex/im/CS/files/forms

    Two new picklists should be added within List Management to the WebConfig database definition for the index and priority fields. Index should have the values YES and NO. Priority should have the values HIGH, MEDIUM and LOW.

    See the Memex Patriarch online help for details on creating picklists.

    Note These picklist files are supplied with the AutoWeb distribution in im13autoweb/picklists.

    Run the upgrade scripts

    Within the directory in which the AutoWeb 2.0 installation package was unpacked, enter the following command:

    sh upgrade-scripts

    Where is the installation directory of the existing AutoWeb 1.3 software. This is normally /opt/memex/autoweb.

    Chapter 2

    Installing the AutoWeb client

    Installing the toolbar

    To install the AutoWeb toolbar:

    1. In Windows Explorer, browse to the location of the supplied AutoWeb.exe file for the client application.

    2. Double-click AutoWeb.exe.

    This launches the AutoWeb InstallShield program.

    3. Click Yes to accept the license agreement.

    This displays the Choose Destination Location page.

    4. Browse to the location where you want to install the files, and click Next.

    The InstallShield program installs the AutoWeb files and displays a confirmation message when the installation is complete.

    5. Click Finish to acknowledge the message.

    Configuring the toolbar

    After installing the AutoWeb toolbar, you need to open Internet Explorer and make sure that the toolbar is now available.

    If the toolbar is not visible, choose View > Toolbars > AutoWeb. This adds the AutoWeb toolbar to Internet Explorer.

    The toolbar should look like this:

    To configure the AutoWeb toolbar:

    1. Click the arrow beside the AutoWeb button and choose Configuration from the drop-down list.

    This displays the Configuration dialog box.

    2. Enter the URL of the cgi-bin directory on the web server where the AutoWeb server software is installed. Typically, this is: http://server.domain/autoweb-bin/

    For example: http://achilles.memex.com/autoweb-bin/

    You can check this value by looking for the relevant ScriptAlias entry in Apache's httpd.conf file (or in the /opt/memex/autoweb/config/apache2.conf file for an installation with Memex Series VI Server).

    3. Click OK.

    This enables the AutoWeb toolbar. All the toolbar options will now be available.

    Configuring the toolbar from the Windows registry

    If you are installing the AutoWeb toolbar on a significant number of machines, or if you want to restrict user access to the Configuration option, you can configure the toolbar via a specific registry file, autoweb.reg. This file is supplied by Memex alongside the client installation file.

    You specify the following settings in the autoweb.reg file:

    URL

    The full URL of the cgi-bin directory on the web server where the AutoWeb server software is installed.

    ConfigDisabled

    A DWORD value in the registry. Set this to 1 (or any non-zero value) to disable the AutoWeb toolbar's Configuration menu option.

    For example, a typical autoweb.reg file looks like this:

    REGEDIT4

    [HKEY_LOCAL_MACHINE\SOFTWARE\Memex Technology Ltd\AutoWeb]

    "URL"="http://server.domain/autoweb-bin/"

    "ConfigDisabled"=dword:00000000

    Double-click this file to apply the changes to the Windows registry of the local computer.

    Note These settings apply to all user accounts on the computer. The changes are applied to Internet Explorer the next time it is started.

    To add a further level of security, you can place security permissions on these registry keys to prevent them being changed. This stops users from reconfiguring the toolbar themselves. For more information on setting permissions for registry keys, see your Microsoft Windows documentation.

    How the toolbar works

    Implementation

    The AutoWeb toolbar is implemented as a native DeskBand component for Internet Explorer using Visual C++. This requires the MXAutoWeb.dll file to be registered on each client machine. After the library is registered, users can display the toolbar by accessing Internet Explorer and selecting View > Toolbars > Memex AutoWeb Toolbar.

    Configuration

    The toolbar configuration is controlled by the following registry key:

    HKEY_LOCAL_MACHINE/Software/Memex Technology Ltd/AutoWeb

    This key holds the string value URL, which contains the base URL of the cgi-bin directory on the web server containing the CGI scripts.

    Processing index requests

    When a user clicks Index Page or Index Selected Text on the toolbar, AutoWeb sends an HTTP request to the IndexPage.pl Perl CGI script, located within the cgi-bin directory on the server.

    This request contains the following parameters:

    The Memex database where the indexed text will be stored

    The keywords to add to the database record

    The (selected) text from the page

    An indication as to whether the user is indexing the entire page or just selected text

    The Web page's URL

    IndexPage.pl then calls HTTrack for the URL (this call is run in the background). HTTrack attempts to create a mirror of that page.

    This call to HTTrack contains a parameter specifying whether each indexed file will contain a timestamp in the filename. HTTrack in turn calls addpagefile.pl, which compares the new indexed file with the most recent version on the local server.

    If the files are the same, the new version is deleted and replaced with a symbolic link to the most recent file.

    If the files are different, the new file becomes the most recent version and is used for any subsequent comparisons.
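    The compare-and-link behaviour described above can be sketched in a few lines of shell. This is an illustration only: addpagefile.pl is a Perl script whose comparison method is not documented here, so the byte-for-byte cmp check, the dedupe_snapshot helper and the file names are all assumptions.

```shell
# dedupe_snapshot NEW LATEST
# If NEW is byte-identical to LATEST, replace NEW with a symbolic link to LATEST.
dedupe_snapshot() {
    new=$1; latest=$2
    if cmp -s "$new" "$latest"; then
        rm -f "$new"
        ln -s "$latest" "$new"
        echo "unchanged"
    else
        # NEW would now be treated as the most recent version.
        echo "changed"
    fi
}

# Demonstration in a scratch directory with made-up snapshots.
work=$(mktemp -d)
printf 'hello\n' > "$work/page-1.html"
printf 'hello\n' > "$work/page-2.html"
printf 'hello again\n' > "$work/page-3.html"
r1=$(dedupe_snapshot "$work/page-2.html" "$work/page-1.html")
r2=$(dedupe_snapshot "$work/page-3.html" "$work/page-1.html")
echo "$r1 $r2"
```

    Linking identical snapshots rather than keeping copies means that repeated indexing of an unchanged page costs almost no additional disk space.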

    After completing the call to HTTrack, IndexPage.pl writes a record into the specified database containing:

    The original URL

    The URL of the mirror

    The keywords

    The (selected) text from the page

    Chapter 3

    Using AutoWeb with Memex Patriarch

    Note This chapter contains information on configuring AutoWeb to be used with Memex Patriarch on a Memex system that was manually installed. If your system is a Memex Series VI Server that was installed using the provided auto-installer (i.e. you use Memex Patriarch to administer your system), you can skip this chapter and continue reading Chapter 4 Using AutoWeb on page 31.

    AutoWeb is designed to integrate with Memex Patriarch and Memex Analyst. However, you must perform some extra installation and setup tasks to use AutoWeb within Memex Patriarch.

    Important You can use either Memex Patriarch or Memex Analyst for choosing the Web sites you want AutoWeb to monitor. However, you cannot configure AutoWeb from both applications. The steps described in this section enable configuration from within Memex Patriarch. This will disable configuration from within Memex Analyst. You will still be able to view the configuration records in Memex Analyst, but you will only be able to add or edit configuration records from Memex Patriarch.

    Installation tasks

    Memex Intelligence Engine

    For Memex Patriarch and AutoWeb to work together, MIE 6.0 must be installed on all the servers that will be used to host both Memex Patriarch and AutoWeb.

    Notes You do not need to place Memex Patriarch and AutoWeb on completely separate physical machines. A single MIE instance can host both the Memex Patriarch and AutoWeb databases.

    If your system uses multiple physical servers, all the physical machines must share the same secret file to allow for certificate authentication.

    For details on how to set up the MIE on your servers, read the MIE 6.0 Installation Guide.

    Memex Patriarch

    The Memex Patriarch server-side components can be installed in two ways:

    1. Using the Memex Series VI Server auto-installer

    2. Using the Perl-based installer

    The Perl-based installer provides a way to specify many of the configuration options during the installation process, whereas the auto-installer provides a quick way to install a pre-built installation.

    This section relates to Memex server installations done using the Perl-based installer. This installer will also be used to install the two AutoWeb databases for Memex Patriarch.

    For more information on the Perl-based installer, see the Memex Series VI Server Installation Guide: Part II Patriarch Components.

    AutoWeb databases for Memex Patriarch

    AutoWeb contains two Memex database definitions that you can use to install AutoWeb databases for Memex Patriarch. These databases allow you to search and control AutoWeb from inside Memex Patriarch.

    To enable these database definitions, copy the im13autoweb directory into the im-install directory (which was created when the Perl-based installer was used to install the Memex Patriarch server components). For example:

    cp -R /opt/memex/autoweb/im13autoweb /opt/memex/im/im-2.0a-105-vanilla-interix/im-install

    Important If you deleted the im-install directory after installing the Memex Series VI Server, you will no longer have the Perl-based installer. You need this to proceed with this installation procedure. Contact Memex Customer Services and request a copy of the tar file containing the Perl-based installer for the Memex Patriarch server components.

    The installer for the Memex Patriarch server components must be run on the physical machine that hosts the Memex configuration server. If AutoWeb is installed on a machine that is not the configuration server, you must copy the AutoWeb database definitions to the configuration server, by transferring the im13autoweb directory across the network to the physical machine that is hosting the configuration server.

    Before you can install the AutoWeb database definitions, you need the following information about your Memex Series VI Server setup:

    The host name and port number for the Memex Intelligence Engine that you will use to access the AutoWeb databases

    The prefix and name of the logical server that will host the AutoWeb databases

    You will add this information to the installer's setup.xml file to specify where the AutoWeb databases will be created.

    Editing the setup.xml file

    When you have copied the im13autoweb directory to the im-install directory, you must modify the setup.xml file within the im-install/im13autoweb directory. This file contains the database definitions for the two new AutoWeb databases: WebConfig and WebArchive. It also defines a new logical server named AutoWeb (prefix AW).

    If you want to create the AutoWeb databases on a remote server, you must edit the attributes for the host element, specifying the server where the new AutoWeb databases will be created. To do this, change the attributes to: host name="hostname" port="number".

    For example:

    Alternatively, to create the AutoWeb databases on the same physical machine as the Memex Patriarch configuration server, leave the host attribute as:

    Installing the AutoWeb databases

    After editing the setup.xml file, you must run the Perl-based installer for Memex Patriarch, to install the new AutoWeb databases and logical server.

    To install the AutoWeb databases and server:

    1. Change to the im-install directory on the configuration server. For example:

    cd /opt/memex/im/im-2.0a-105-vanilla-interix/im-install

    2. Run the following command:

    perl install.pl -c -i -m -x -p -f autoweb/im13autoweb

    Where:

    The value after -c is the prefix of the logical server used as the configuration server (usually CS).

    The value after -i is the directory where the Memex Patriarch server-side components are installed (usually /opt/memex/im).

    The value after -m is the directory where the MIE is installed (usually /opt/memex/mie).

    The value after -x is the path to the MIE configuration file (usually /opt/memex/etc/memexsvr.xml).

    The value after -p is the TCP port on which the local MIE listens for connections.

    For example:

    perl install.pl -c CS -i /opt/memex/im -m /opt/memex/mie -x /opt/memex/etc/memexsvr.xml -p 9001 -f autoweb/im13autoweb

    3. When the details of the installation are displayed, enter y to confirm that you want to continue with the installation.

    4. Enter the user name and password of the Memex Patriarch superuser.

    The script completes the installation.

    Configuration tasks

    To configure AutoWeb to work with Memex Patriarch, you must update AutoWeb to use the new entities that have been created.

    Modifying the spider.cfg file

    This task is mandatory if you want to use AutoWeb with Memex Patriarch.

    The spider.cfg file is located in the autoweb directory. The file contains the setting imbase. You must edit this setting to point to the directory where Memex Patriarch is installed.

    For example, if Memex Patriarch is installed in /opt/memex/im, you would change the spider.cfg file setting to:

    imbase /opt/memex/im

    Important You must modify spider.cfg before you make any of the other changes described in this section. If you do not make this change, AutoWeb will not be able to detect that it is inserting data into a Memex Patriarch database, and the resulting records will be inaccessible from the client software.

    Linking to the WebConfig database

    The getsite.pl script, which is used to index Web pages automatically, is configured using records in a legacy-format database called config.db. You can add and edit records in this database using Memex Analyst. However, to add or edit configuration records from within Memex Patriarch you must use the new Memex Patriarch WebConfig database that was created when you installed the im13autoweb setup.

    The config.db database is stored in the AutoWeb installation directory. For example, if AutoWeb has been installed in /opt/memex/autoweb, the path to this database is /opt/memex/autoweb/config.db.

    Important You can only configure AutoWeb from either Memex Patriarch or Memex Analyst. You cannot configure AutoWeb from both applications.

    Creating the symbolic link

    Creating a symbolic link from the config.db database to the new WebConfig database forces AutoWeb to use Memex Patriarch's WebConfig database.

    To create the symbolic link:

    1. As the Memex administrative user, move to the AutoWeb installation directory. For example:

    cd /opt/memex/autoweb

    2. Move the config.db database aside by entering the following command:

    mv config.db config.db.old

    3. Create a link to the WebConfig database by entering the following command:

    ln -s //databases/WebConfig config.db

    For example:

    ln -s /opt/memex/im/AW/databases/WebConfig config.db

    AutoWeb will now use the WebConfig database rather than the config.db database.
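    The move-and-link steps can be wrapped in a small script that refuses to overwrite an earlier backup. This is a sketch under stated assumptions: the link_database helper is hypothetical, and the demonstration runs in a scratch directory that only imitates the /opt/memex layout used in the examples above.

```shell
# link_database INSTALL_DIR TARGET LINK_NAME
# Runs in a subshell so that the caller's working directory is unchanged.
link_database() (
    install_dir=$1; target=$2; link_name=$3
    cd "$install_dir" || exit 1
    if [ -L "$link_name" ]; then
        echo "$link_name is already a symbolic link; nothing to do"
        exit 0
    fi
    if [ -e "$link_name" ]; then
        if [ -e "$link_name.old" ]; then
            echo "$link_name.old already exists; not overwriting it" >&2
            exit 1
        fi
        # Keep the legacy database under a .old name, as in step 2.
        mv "$link_name" "$link_name.old" || exit 1
    fi
    ln -s "$target" "$link_name"
)

# Demonstration in a scratch directory (paths are stand-ins only).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/autoweb/config.db" "$sandbox/im/AW/databases/WebConfig"
link_database "$sandbox/autoweb" "$sandbox/im/AW/databases/WebConfig" config.db
ls -l "$sandbox/autoweb"
```

    Keeping config.db.old in place is what makes it possible to return to the legacy database later.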

    Note If you are upgrading your AutoWeb setup from a previous version, you must make sure that a uniq_id file is stored in the WebConfig database's directory. You can do this manually, or by adding a record to the database in Memex Patriarch.

    For more information, consult the MIE Administrator's Guide.

    Reverting to the legacy database

    If, at a later date, you decide that you would prefer to use Memex Analyst for configuring Web site monitoring, you can reverse the above process, deleting the symbolic link and renaming the config.db.old file as config.db. However, after doing this, the configuration database will be empty and you will be left with a WebConfig database in Memex Patriarch that is no longer connected to AutoWeb.

    Linking to the WebArchive database

    By default, AutoWeb uses a legacy-format database called webarchive for indexing pages. This database is located in the /opt/memex/autoweb/databases directory. To configure AutoWeb from Memex Patriarch you must use the Memex Patriarch WebArchive database that was created when you installed the im13autoweb setup.

    To use the WebArchive database, you must create a symbolic link to force AutoWeb to use this database rather than the /opt/memex/autoweb/databases/webarchive database.

    Creating the symbolic link

    You create the symbolic link to the WebArchive database in the same way as you created the symbolic link to the WebConfig database.

    To create the symbolic link:

    1. Move to the databases subdirectory of the autoweb installation directory. For example:

    cd /opt/memex/autoweb/databases

    2. Create a link to the WebArchive database by entering the following command:

    ln -s //databases/WebArchive

    For example:

    ln -s /opt/memex/im/AW/databases/WebArchive webarchive

    Notes The AutoWeb toolbar will list the WebArchive database by the name of the symbolic link, usually webarchive.

    If you are upgrading your AutoWeb setup from a previous version, you must make sure that a uniq_id file is stored in the WebArchive database's directory. You can do this manually, or by adding a record to the database in Memex Patriarch.

    For more information on the uniq_id file, see the Memex Intelligence Engine Administrator's Guide.

    You will be able to use Memex Patriarch to view Web pages indexed from the AutoWeb toolbar by searching the WebArchive database on the AutoWeb logical server within Memex Patriarch.

    Setting up picklists

    This is an optional task.

    In Memex Patriarch, the WebConfig entity contains a single picklist field, database, which holds a list of all the databases in AutoWeb. This list is not automatically populated. You should update this list whenever you add a database to AutoWeb.

    For information on modifying picklists in Memex Patriarch, refer to the Memex Patriarch Online Help.

    Adding additional web archives

    The instructions in this chapter describe how to create a single AutoWeb archive database that is accessible from Memex Patriarch. However, you can use multiple databases to store Web pages indexed by AutoWeb.

    For each new database you want to use from AutoWeb, you must create a new logical server in Memex Patriarch, or use an existing logical server that does not contain an AutoWeb database.

    Create the new database (and logical server, if required) by using the Perl-based installer for Memex Patriarch server components. This example shows how to create a new WebArchive database on a new logical server called AutoWeb2, with the server prefix ZW:

    1. As the Memex administrative user, copy the supplied im13autoweb directory:

    cd /opt/memex/autoweb
    cp -R im13autoweb im13autoweb2

    2. Edit the setup.xml file within the new im13autoweb2 directory, removing the two include statements and changing the name and prefix attributes for the server element, ensuring you use a prefix that is not already used by an existing logical server.

    Note For more information on server prefixes see the topic Use the installer to add a logical server in the Memex Patriarch online help.

    For example:

    3. Change to the im-install directory and run the installer with the im13autoweb2 setup. For example:

    cd /opt/memex/im/im-2.0a-105-vanilla-interix/im-install

    perl install.pl -c CS -i /opt/memex/im -m /opt/memex/mie -x /opt/memex/etc/memexsvr.xml -p 9001 -f /opt/memex/autoweb/im13autoweb2

    4. Change to the databases subdirectory of the autoweb installation directory.

    5. Create a symbolic link to the new WebArchive database by entering the following command:

    ln -s /opt/memex/im/ZW/databases/WebArchive webarchive2

    Note Each archive database (or symbolic link) in the databases directory must have a unique name. For example: webarchive1, webarchive2, and so on.

    Chapter 4

    Using AutoWeb

    AutoWeb is a utility that allows you easily to add the text of a Web page to a Memex database. In addition to this, when you extract text from a Web page, AutoWeb creates a mirror of the Web page on a local server. You can then use Memex Analyst to view the records created from the Web page text and to view the mirrored copy of the Web page.

    Selecting a Memex database

    To specify where the Web page text will be stored, choose a database from the Select Database drop-down list.

    Specifying keywords

    To associate keywords with an indexed Web page, type the keywords into the Enter Keywords text box.

    Indexing Web page text

    To extract specific text from a Web page, highlight the text and then click the Index Selected Text button.

    When you click this button, AutoWeb also mirrors the entire Web page to the local server.

    Indexing a Web page

    To extract the text of an entire Web page, click the Index Page button.

    When you click this button, AutoWeb also mirrors the entire Web page to the local server.

    Viewing indexed pages

    You can use Memex Patriarch or Memex Analyst to retrieve the indexed records.

    The indexed record for each Web page contains:

    The URL of the original page

    The URL of the mirrored copy of the page

    The date and time that the page was indexed

    The text (or the selected text) from the page

    The keywords that are associated with the page

    If Memex Analyst has been set up to use the forms distributed with the AutoWeb toolbar, the result form displays the mirrored copy of the page when you view one of the records. The screenshot below shows an example of this.

    The following screenshot shows an example of creating a configuration record for the Memex Web site using Memex Patriarch.

    Enter values for the Name, URL and Database fields to specify what you want to index and where you want to store the indexed Web page data.

    Enter values for the other fields, as required. These fields are described in the table on page 35.

    Click Append to save the new record.

    Specifying sites Memex Analyst

    The following screenshot shows an example of creating an index record for the Memex Web site in Memex Analyst.

    Enter values for the Keywords, Site To Index and Database fields to specify what you want to index and where you want to store the indexed Web page data.

    Enter values for the other fields, as required. These fields are described in the table below.

    Save the new record.

    Fields on the configuration form

    The following table explains the fields on the configuration forms used within Memex Patriarch and Memex Analyst.

    Note The default configuration forms have the heading Index Request. This is part of the form design and can be changed, if required. The labelling of fields on the forms can also be changed as part of the form design. The first two columns in the following table show the labels as they appear in the default forms supplied for Memex Patriarch (Field MP) and Memex Analyst (Field MA).

    Field MP    Field MA    Details

    URN    (none)    This field is populated by Memex Patriarch when you save the record. The field is not included on the default form for Memex Analyst.

    Name    Keywords    Enter the name of the Web site you want to index. The name should be relevant to the site you want to index, as text entered here can be used as keywords when searching for it later.

    URL    Site To Index    Enter the full URL of the Web site you want to index. If you enter the URL of a Web site without specifying a particular Web page (for example, http://www.yourcompany.com), AutoWeb uses the home page of the site as the start page from which to index. You can index an area within a Web site by specifying a particular page on a site (for example, http://www.yourcompany.com/personnel/vacancies.html).

    Indexed    Index    This field allows indexing to be temporarily turned off by setting the field value to NO. To resume indexing, set the value to YES. The default value is YES, so Web sites for records with no value in this field (such as records from upgraded versions of AutoWeb) are indexed.

    Database    Database    This is the name of the database to which index records for the Web site are saved. The value is the name of the database as it appears on the file system, within the autoweb/databases directory. The webarchive database is the default available database created for saving new index records to.

    Priority    Priority    The value in this field allows indexing to be performed at different frequencies. This is achieved by running the getsite.pl script against a subset of records, based on the value of this field (as shown in the crontab listing on page 33). The AutoWeb auto-installer creates three Priority options to choose from. Choose your priority depending on how often you want the site to be indexed and updated.

    The frequency of updates is defined as follows:

    HIGH priority sites are indexed every hour

    MEDIUM priority sites are indexed every day

    LOW priority sites are indexed every week

    Note: These frequencies are defined in the Memex administrator user's crontab. See Monitoring Web sites on page 33 for more details.

    Options    Crawler Options    Use this field to pass specific options to the HTTrack Website Copier software. HTTrack is a third-party tool used by AutoWeb to copy Web pages. By specifying options you can overrule many aspects of AutoWeb's default behaviour.

    For full details of the many options for HTTrack see the online User's Guide at:

    http://www.httrack.com/html/fcguide.html

    The option that you are most likely to want to specify is the link depth. AutoWeb's default link depth is 2. This means that you will index all the pages that are linked to from the specified start page (e.g. the home page of a Web site) plus all the pages that are linked to from those primary link pages. On a large Web site, with pages that each contain many links, a link depth of 2 could result in hundreds of pages being indexed, and you may, therefore, want to reduce the link depth. On a small Web site, however, you might want to increase the link depth to 3 or 4.

    The option for setting link depth is:

    -%eN

    Where N is an integer typically between 0 and 4.

    Notes:

    You must be extremely careful when specifying options. If you enter invalid options, or the wrong option for the behaviour you intended, it can result in nothing being indexed, unexpected indexing results, or everything on the entire domain being indexed.

    If you do not set a value here, the link depth defaults to 2.

    Setting a high link depth value for a large Web site can quickly result in you using up a great deal of available disk space.

    By default, AutoWeb does not index pages that are located outside the domain on which the start page is located. This helps to restrict indexing to a single Web site. You can bypass this restriction by using the -e option. However, you should use this option with extreme caution as it can easily result in you indexing a vast number of pages from the internet at large.

    Link depth, by default, only extends to pages on or below the current directory level. For example, if you index http://www.memex.co.uk/AboutMemex/index.php with a link depth of 2, AutoWeb will index pages such as http://www.memex.co.uk/AboutMemex/Awards/index.php, as this page is located in a directory below the start page, but it will not index http://www.memex.co.uk/index.php, which is in a directory above the start page. You can use the -B option to allow AutoWeb to index up the directory structure as well as down it.

    HTTrack Website Copier is open source, third-party software. Memex is not responsible for any of the content on the www.httrack.com Web site.

    Notes    Notes    You can enter any text about the site or this particular record here for your own reference.

    How Web site monitoring works

    Web site monitoring is accomplished by running a Perl script called getsite.pl at regular intervals. This script performs the following actions:

    1. Decodes the configuration database.

    2. Parses the output to determine which Web sites to mirror and index.

    3. Calls HTTrack for each site that should be indexed.

    Note: If getsite.pl was run with a specific priority setting (e.g. HIGH), only a subset of the configuration records may produce calls to HTTrack.

    The HTTrack Website Copier program then creates a mirror of the site in the mirror directory of the server installation.

    Stopping getsite.pl

    If you have started getsite.pl and want to stop it, you must do so manually by killing its process and any httrack processes.

    To kill any getsite.pl and httrack processes:

    1. As root or the Memex administrator user, open a shell console.

    2. Type the following command:

    ps -eo pid,args|grep autoweb

    This lists the currently running processes whose details mention autoweb.

    For example:

    1545 grep autoweb
    3197 /opt/memex/autoweb/bin/httrack -V /opt/memex/autoweb/bin/addtomemex
    5371 /usr/contrib/perl -I/opt/memex/autoweb/perlmodules /opt/memex/autow
    5513 sh -c /opt/memex/autoweb/bin/httrack -V '/opt/memex/autoweb/bin/add

    3. Use the kill command with the relevant process ID number to stop each of the listed processes, apart from the one mentioning grep, which simply reports the search you ran.

    For example:

    kill 3197

    kill 5371

    kill 5513
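    Picking the process IDs out of the listing can also be scripted. The sketch below is one possible approach, not part of AutoWeb: the filter_autoweb_pids helper is hypothetical, and it is exercised here on a copy of the sample listing above rather than on live ps output.

```shell
# Reads "pid args" lines on standard input and prints the PIDs of lines that
# mention autoweb, skipping the grep command that produced the listing.
filter_autoweb_pids() {
    grep 'autoweb' | grep -v 'grep autoweb' | awk '{print $1}'
}

# Sample listing, trimmed from the example above.
sample=$(cat <<'EOF'
 1545 grep autoweb
 3197 /opt/memex/autoweb/bin/httrack -V /opt/memex/autoweb/bin/addtomemex
 5371 /usr/contrib/perl -I/opt/memex/autoweb/perlmodules /opt/memex/autow
 5513 sh -c /opt/memex/autoweb/bin/httrack -V /opt/memex/autoweb/bin/add
EOF
)

pids=$(printf '%s\n' "$sample" | filter_autoweb_pids)
echo $pids

# On a live system the same filter could be fed from ps, for example:
#   ps -eo pid,args | filter_autoweb_pids
# Review the list before passing any of the IDs to kill.
```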

    Extracting the Web page text

    For each page that HTTrack downloads, it calls the addtomemex.pl script.

    addtomemex.pl checks what type of file has been downloaded and whether the text can be extracted from the file. It then uses the Lynx text-based Web page browser to output a text-only version of the page, from which it extracts the text.

    When the text has been extracted successfully, addtomemex.pl writes a record to the specified Web archive database containing the following information:

    The keywords from the config record

    The original URL of the file

    The mirrored URL of the file

    The text from the page

    The date and time the page was mirrored

    Appendix AKnown limitations

    AutoWebcontainsthefollowinglimitations:

    Ifapagecontainsanycrossdomainframes,theindexselectionandindexpagebuttons

    willnotwork.Formoreinformation,seetheMicrosoftwebsite:ht t p: / / msdn. mi crosof t . com/ l i br ar y/ def aul t . asp?ur l =/ wor kshop/ aut hor / om/ xf r ame_scr i pt i ng_secur i t y. asp

    AutoWeb will not index URLs that are redirected. For example, if you are in the UK and you browse to www.memex.com you are redirected to www.memex.co.uk. As a result you cannot use AutoWeb to index http://www.memex.com. The workaround is to index a specific page below the redirected domain, for example, http://www.memex.com/AboutMemex/

    If a user attempts to index a page that has cross-domain frames, the following error message is displayed:

    Browser security restrictions prevent you from indexing this page

    When AutoWeb mirrors a Web page it does not automatically mirror documents linked to from that page. The depth of mirroring depends on the options specified in the configuration record. As a consequence, style sheets used by the page, or images that appear on the page, may not be mirrored.

    Memex strongly advises that you change the Internet security zone of the mirror to disable scripting. As files copied to the local mirror are on your local Intranet, they may have more security rights than is healthy. Select Tools > Internet Options > Security > Restricted Sites, click Sites, and add your mirror domain to the list.

    When indexing Web sites using getsite.pl, images are not timestamped. This means that if a Web page contains an image that changes (but keeps the same name), the old copy of the image will be overwritten. As a result, the earlier version of the page will reference the newer version of the image.

    Appendix B Troubleshooting

    If a Web site is not indexed or is not indexed in the way you expected:

    Check the known limitations listed in Appendix A.

    Make sure you are aware of the default indexing behaviour of AutoWeb and the various HTTrack options.

    See Appendix C on page 41 for a list of the default options, and the online User Guide for HTTrack Website Copier at http://www.httrack.com/html/fcguide.html for a complete list of available options.

    Check the messages in the log file. The path and name of this file are given as the value of the logfile parameter in the spider.cfg configuration file (/opt/memex/autoweb/spider.cfg).

    For example: /opt/memex/logs/crawlerlog.txt

    If you get the message "Already Running Resource temporarily unavailable" when you run the getsite.pl script, it indicates that the script has not finished indexing pages. This may be because the configuration records are causing it to index more pages than you had expected, or require. If this happens you should either wait for the script to complete, or kill the process (as described on page 37), and then check the configuration records before running getsite.pl again.

    If the getsite.pl script runs more frequently than expected, check the entries in the cron tab for the Memex administrator. The auto-installer for AutoWeb adds cron jobs for getsite.pl to the cron tab of the Memex administrator user. If the auto-installer was run more than once, the cron tab will contain duplicate cron jobs, which must be removed by editing the cron tab.
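
Duplicate entries can be hard to spot by eye in a long cron tab. The following is a minimal sketch of one way to list them, not a tool supplied with AutoWeb: the function name duplicate_cron_jobs is hypothetical, and it only reports entries that are repeated exactly, character for character.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with AutoWeb).
# Reads cron tab entries on standard input and prints one copy of every
# entry that appears more than once. Blank lines and comment lines are
# ignored so that they are never reported as duplicates.
duplicate_cron_jobs() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' | sort | uniq -d
}

# Example usage, run as the Memex administrator user:
#   crontab -l | duplicate_cron_jobs
# Any entry printed here appears at least twice; remove the extra copies
# with crontab -e.
```

The sketch only reports duplicates and leaves the cron tab unchanged, so the removal itself is still done by editing the cron tab as described above.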

    Appendix C HTTrack options

    HTTrack Website Copier is open source software that is used to mirror Web pages. Memex has altered the software slightly for use with AutoWeb.

    Note For more information about HTTrack, visit: http://www.httrack.com/ and http://www.httrack.com/html/fcguide.html.

    This table lists the options that AutoWeb uses by default.

    Option                                  Description

    -n                                      Get non-HTML files near an HTML file

    -%e2                                    Sets the external link depth to 2

    -A32000                                 Sets the maximum transfer rate in bytes/seconds

    -I0                                     Don't make an index page

    -Qq                                     No log and no questions

    --assume cfm=text/html,php=text/html    Assume that a type (cfm, php) is always linked with a mime type

    -X0                                     Do not purge old files after update

    -%F ""                                  Do not put a footer into the HTML pages

    -%P0                                    Do not do extended parsing

    -C0                                     Do not use a cache

    -%Q                                     Do not follow any hyperlinks from the page. This option has been added to HTTrack by Memex

    -d                                      Stay on the same principal domain

    This table lists other options that you can use, if necessary. To use either option, add it to the opts parameter in the spider.cfg file. If no option is set, the default behaviour is to follow the rules in robots.txt. See Setting up the AutoWeb configuration file on page 15.

    Option    Description

    -s0       When retrieving Web pages, do not follow the rules specified in robots.txt on the remote web server.

    -s2       Follow all of the robots.txt rules with the exception of Disallow: / as this will prevent the software from retrieving any pages from a Web site.

    Appendix D

    Upgrading to AutoWeb 1.3

    If you are currently using AutoWeb 1.0 or 1.1 you must upgrade to version 1.3 before you can upgrade to version 2.0. Once you have a 1.3 system you can upgrade to 2.0 by following the instructions on page 18.

    Upgrading from AutoWeb 1.0 or 1.1 to AutoWeb 1.3 is a two-stage process. First, you must back up your previous AutoWeb setup; then you need to install AutoWeb 1.3.

    Important You will need the installation package for version 1.3 of AutoWeb to complete this procedure.

    Backing up your previous AutoWeb setup

    Before beginning the upgrade, you should back up your existing AutoWeb configuration and databases. If the upgrade process encounters any problems, you can then revert to your known, valid setup.

    After the backup is complete, shut down the existing MIE and move the AutoWeb directories aside. For example, if you installed your previous version of AutoWeb in /opt/memex/autoweb you should move this whole directory to /opt/memex/autowebold.
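
The archive-then-move sequence for the AutoWeb directory can be scripted so that the move happens only if the archive was written successfully. The following is a minimal sketch under stated assumptions, not a Memex-supplied procedure: the function name move_autoweb_aside and the archive location are hypothetical, and the sketch covers only the AutoWeb directory itself. Any databases held elsewhere by the MIE need to be backed up separately, and the MIE must already be shut down.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with AutoWeb).
# $1 = existing AutoWeb directory
# $2 = name to move that directory to
# $3 = tar archive to create as a backup copy
move_autoweb_aside() {
    src=$1
    dest=$2
    archive=$3
    # Refuse to run if the source is missing or the target name is taken.
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    [ ! -e "$dest" ] || { echo "already exists: $dest" >&2; return 1; }
    # Archive first, so that a copy exists before anything is moved.
    tar -cf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    mv "$src" "$dest"
}

# Example usage with the directory names from the text above
# (the archive path is an assumption):
#   move_autoweb_aside /opt/memex/autoweb /opt/memex/autowebold /var/tmp/autoweb-backup.tar
```

Because the function stops if tar fails, the original directory is left in place whenever the archive could not be created.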

    Installing AutoWeb 1.3

    After backing up your existing AutoWeb setup, you must perform a new, clean installation of AutoWeb 1.3.

    Note You must install AutoWeb into the same directory as your previous version. For example: /opt/memex/autoweb.

    If you are using this product with Memex Patriarch, it is essential that you read Chapter 3 Using AutoWeb with Memex Patriarch on page 24. You must perform all the steps detailed there before you proceed with the conversion.

    Converting your AutoWeb data

    After installing AutoWeb 1.3, you must run a conversion script to convert the data from your previous setup and create any new databases that may be required.

    Setting up the conversion script

    The conversion script reads a configuration file, convert.conf, which is stored in the bin directory of the new AutoWeb installation. This file specifies the details of the AutoWeb databases that will be converted.

    Before running the conversion script, you must set the following options to reflect your AutoWeb setup:

    Option          Details

    MIEDecodeDir    The path to the MIE installation used by the previous version of AutoWeb

    MIEDir          The path to the new MIE installation

    MIEPort         The network port that the new MIE is listening on

    OldAutoWeb      The path to the previous AutoWeb setup

    NewAutoWeb      The path to the new AutoWeb installation

    IMBase          The installation directory for Memex Patriarch (if installed)

    TempDir         A directory to use for storing temporary files

    Verbosity       How detailed the AutoWeb output will be:
                    0  basic output
                    1  tracks processed databases
                    2  detailed output

    Running the conversion script

    After specifying the conversion options, you can run the conversion script.

    To run the conversion script:

    1. Move to the bin directory of your new AutoWeb installation.

    2. Run the following command:

    perl aw-convert.pl

    The script converts all the data from your previous AutoWeb setup and creates any new databases that are needed to match your previous setup.

    Note After running the conversion script you still need to open the spider.cfg file in the new AutoWeb installation directory and make sure the options are configured correctly.