Pivotal Greenplum® Text
Version 2.3.1
Documentation
Rev: 02
© 2018 Pivotal Software, Inc.
Table of Contents
Pivotal® Greenplum® Text 2.3.1 Documentation
Pivotal® GPText 2.3.1 Release Notes
Installing GPText
Upgrading GPText
Introduction to Pivotal GPText
Administering GPText
GPText High Availability
GPText Best Practices
Troubleshooting Hadoop Connection Problems
Using Pivotal GPText
Working With GPText Indexes
Querying GPText Indexes
Customizing GPText Indexes
Working With GPText External Indexes
GPText Function Reference
GPText Management Utilities
GPText and Solr Data Type Mappings
GPText Schema Tables
GPText Configuration Parameters
© Copyright Pivotal Software, Inc, 2013-2018, 2.3.1
Pivotal® Greenplum® Text 2.3.1 Documentation
GPText Documentation PDF
Pivotal GPText 2.3.1 Release Notes
Installing Pivotal GPText
Upgrading Pivotal GPText
Using Pivotal GPText
GPText References
Additional Resources
Pivotal Greenplum Database
Apache Solr Web Site
Apache MADlib
http://docs-gptext-staging.cfapps.io/archives/GPText-docs-231.pdf
http://docs-gptext-staging.cfapps.io/230/topics/
http://docs-gptext-staging.cfapps.io/230/topics/FuncRef_preface.html
http://gpdb.docs.pivotal.io
http://lucene.apache.org/solr/
http://madlib.apache.org/
Pivotal® GPText 2.3.1 Release Notes
This document contains release information for Pivotal GPText 2.3.1.
Released: May 2018
About Pivotal GPText
Pivotal GPText joins the Greenplum Database massively parallel-processing database server with Apache SolrCloud enterprise search and the Apache MADlib Analytics Library to provide large-scale analytics processing and business decision support. GPText includes free text search as well as support for text analysis.
GPText includes the following features:
The GPText database schema provides in-database access to Apache Solr indexing and searching
Build indexes with database data or external documents and search with the GPText API
Custom tokenizers for international text and social media text
A Universal Query Processor that accepts queries with mixed syntax from supported Solr query processors
Faceted search results
Term highlighting in results
Greater emphasis on high availability
The GPText management utility suite includes command-line utilities to perform the following tasks:
Start, stop, and monitor ZooKeeper and GPText nodes
Configure GPText nodes and indexes
Add and delete replicas for index shards
Back up and restore GPText indexes
Recover a GPText node
Expand the GPText cluster by adding GPText nodes
Prerequisites
Installing GPText also installs Apache SolrCloud and, optionally, Apache ZooKeeper.
Following are GPText installation prerequisites.
GPText runs on Red Hat Enterprise Linux 5.x, 6.x, and 7.x.
Install and configure your Greenplum Database system, version 4.3.6 or higher. See the Pivotal Greenplum Database Installation Guide at https://gpdb.docs.pivotal.io.
Install Java JRE 1.8.x and add the bin directory to the PATH on all hosts in the cluster. GPText is tested with Oracle Java 1.8 and OpenJDK 1.8.
Ensure that nc (netcat) is installed on all Greenplum cluster hosts (sudo yum install nc).
Installing lsof on all cluster hosts is recommended (sudo yum install lsof).
GPText cannot be installed onto a shared NFS mount.
GPText nodes can be installed on the Greenplum Database cluster hosts alongside the Greenplum segments or on additional, non-database hosts accessible on the Greenplum cluster network. All hosts participating in the GPText system must have the same operating system and configuration and have passwordless-ssh access for the gpadmin user. See the Pivotal Greenplum Database Installation Guide for instructions to configure hosts.
If you plan to place GPText nodes on the Greenplum Database segment hosts, ensure that you reserve memory for GPText use when you configure Greenplum Database. To determine the memory to set aside for GPText, multiply the number of GPText nodes to create on each Greenplum segment host by the JVM maximum size. Subtract this memory from the physical RAM when calculating the value for the Greenplum Database gp_vmem_protect_limit server configuration parameter. See the Greenplum Database server configuration parameter gp_vmem_protect_limit in the Greenplum Database Reference Guide for recommended memory calculation formulas or visit the GPDB Virtual Memory Calculator website (http://greenplum.org/calc/).
Apache Solr requires a ZooKeeper cluster with at minimum three nodes (five nodes recommended). You can install a “binding” ZooKeeper cluster with GPText on the Greenplum cluster hosts, or you can use an existing ZooKeeper cluster. When deployed alongside Greenplum Database segments, ZooKeeper performance can be affected under heavy database load. For best performance, install a ZooKeeper cluster on separate hosts with network connectivity to the Greenplum network.
New Features and Enhancements in GPText 2.3.1
The gptext-backup command-line utility can now back up GPText indexes to local GPText cluster storage as well as a directory on a shared drive. For local backups, backup metadata and the index configuration files are backed up to the Greenplum Database master data directory and index shards are backed up in the segment data directories on each host.
The gptext-backup utility has a new option to back up just the index configuration files from ZooKeeper, with no index data.
The gptext-restore utility is updated to restore backups created on local cluster storage.
The gptext-restore utility has a new option to restore only the configuration files from a backup. This option loads the configuration files into ZooKeeper and creates an empty GPText index.
New Features and Enhancements in GPText 2.3.0
Revised gptext-config Utility Syntax
The gptext-config command-line utility was revised to have a more user-friendly syntax.
A new list subcommand was added to gptext-config that you can use to list all of the configuration files for a specified GPText index.
$ gptext-config list -i
Index Documents in a Hadoop File System (hdfs) Document Source
GPText 2.3.0 enables you to add documents stored in an hdfs system to a GPText external index.
The new gptext-external command-line utility uploads Hadoop configuration and authentication files to a named configuration in ZooKeeper. The utility has subcommands upload, list, and delete to manage the configurations you have uploaded.
The new gptext.external_login() function logs into the hdfs system using the named configuration you have uploaded. You can log in to only one external document source at a time.
Use URLs of the form hdfs:// with the gptext.index() and gptext.index_external() functions to add documents to a GPText external index.
Use the new gptext.index_external_dir() function to add all documents in an hdfs directory to a GPText external index.
Log out of the hdfs external document source with the new gptext.external_logout() function.
See Authenticating with an External Document Source for steps to enable access to an hdfs document source.
Known Issues
Following are known issues in GPText. Workarounds are provided when available.
Wildcards in GPText Search Options
Solr does not return all fields when the fl Solr search option contains a wildcard that matches field names. For example, given a table with columns contenta and contentb, specifying fl=contenta,contentb,sum(1,1) correctly returns three fields. Specifying fl=cont*,sum(1,1) correctly returns contenta and contentb, but omits the pseudo-field sum(1,1).
Specifying a wildcard to match all fields (fl=*,sum(1,1)) also omits the pseudo-field.
Index Load Failure After Configuration File Error
If Solr fails to load an index because of a configuration file error, and then the index is dropped without first correcting the configuration file error, the index cannot be recreated until GPText is restarted. This can happen if you edit managed-schema or solrconfig.xml and introduce an XML syntax error or a typo in configuration values.
Workaround:
1. When an index fails to load, check the Solr log to find the cause.
2. If the cause is a configuration file error, such as invalid XML, use the gptext-config utility to edit the file and fix the error. Dropping the index without first correcting the error is not recommended.
3. If you have dropped an index that failed to load without first correcting the cause of the failure, you must restart GPText before you can recreate the index. Run gptext-start -r to restart GPText.
Startup Failure with Large Numbers of Indexes
When there is a large number of Solr cores, SolrCloud can fail to restart successfully, with error messages indicating failure to elect leaders for shards. This is a known Solr issue; see https://issues.apache.org/jira/browse/SOLR-5990 in the Apache Solr Jira for an example. Because of this issue, it is recommended to avoid designing GPText applications that create large numbers of indexes, shards, and replicas. The number of cores you can create before you observe this behavior is hardware dependent, so you should test to determine your system’s limits. You can create and successfully operate a larger number of indexes than can be restarted successfully later, so be sure to test restarting GPText to determine a practical limit.
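To get a feel for how quickly the core count grows, the following sketch multiplies indexes, shards, and replicas. It assumes that every shard replica is hosted as one Solr core and that all indexes share the same shard count and replication factor. The function name is hypothetical and is not part of GPText; the result is only a rough planning figure, not a limit.

```sh
# Hypothetical helper, not a GPText utility: estimate how many Solr cores
# a design creates, assuming one core per shard replica.
estimate_solr_cores() {
    indexes=$1    # number of GPText indexes
    shards=$2     # shards per index (one per Greenplum primary segment)
    replicas=$3   # replication factor per shard
    echo $(( indexes * shards * replicas ))
}

# Example: 50 indexes, 16 shards each, replication factor 2
estimate_solr_cores 50 16 2
```

Comparing this estimate before and after a design change can help decide when a restart test is worth repeating.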
Setting GPText Configuration Parameters Without First Setting custom_variable_classes
If the custom_variable_classes Greenplum Database server configuration parameter does not include the value “gptext”, attempting to set a GPText configuration parameter returns an error message, for example:
mydb-# set gptext.replication_factor=4;
WARNING: Please logon again to make GUC setting take effect. (GucValue.h:301)
WARNING: Please logon again to make GUC setting take effect. (GucValue.h:301)
ERROR: unrecognized configuration parameter "gptext.replication_factor"
In GPText 2.0, in addition to the error message, the value of the configuration parameter persisted in ZooKeeper is zero, replacing the previous value of the parameter.
mydb-# show gptext.replication_factor;
 gptext.replication_factor
----------------------------
 0
Beginning with GPText 2.1, the error message is still generated; however, the value saved in ZooKeeper is the value specified in the set command, 4 in the preceding example.
To prevent the error message, before setting any GPText configuration parameters, use the gpconfig command-line utility to set the custom_variable_classes configuration parameter:
$ gpconfig -c custom_variable_classes -v 'gptext'
Installing GPText
Prerequisites
The GPText installation includes the installation of Apache SolrCloud and, optionally, Apache ZooKeeper.
If you are installing a new GPText release into an existing GPText system, follow the instructions in Upgrading GPText instead.
Following are GPText installation prerequisites.
Install and configure your Greenplum Database system, version 4.3.6 or higher. See the Pivotal Greenplum Database Installation Guide at https://gpdb.docs.pivotal.io.
GPText runs on Red Hat Enterprise Linux or CentOS 5.x, 6.x, or 7.x.
GPText cannot be installed onto a shared NFS mount.
Install a JRE 1.8.x on all hosts in the cluster.
Ensure that nc (netcat) is installed on all Greenplum cluster hosts (yum install nc).
Installing lsof on all cluster hosts is recommended (sudo yum install lsof).
GPText nodes can be installed on the Greenplum Database cluster hosts alongside the Greenplum segments or on additional, non-database hosts accessible on the Greenplum cluster network. All hosts participating in the GPText system must have the same operating system and configuration and have passwordless-ssh access for the gpadmin user. See the Pivotal Greenplum Database Installation Guide for instructions to configure hosts.
If you plan to place GPText nodes on the Greenplum Database segment hosts, ensure that you reserve memory for GPText use when you configure Greenplum Database. To determine the memory to set aside for GPText, multiply the number of GPText nodes to create on each Greenplum segment host by the JVM maximum size. Subtract this memory from the physical RAM when calculating the value for the Greenplum Database gp_vmem_protect_limit server configuration parameter. See the Greenplum Database server configuration parameter gp_vmem_protect_limit in the Greenplum Database Reference Guide for recommended memory calculation formulas or visit the GPDB Virtual Memory Calculator website (http://greenplum.org/calc/).
Apache Solr requires a ZooKeeper cluster with at minimum three nodes. You can install a “binding” ZooKeeper cluster with GPText on the Greenplum cluster hosts, or you can use an existing ZooKeeper cluster. When deployed alongside Greenplum Database segments, ZooKeeper performance can be affected under heavy database load. For best performance, install a ZooKeeper cluster with at least three nodes (five nodes recommended) on separate hosts with network connectivity to the Greenplum network.
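The memory reservation described in the prerequisites can be worked through with a little shell arithmetic. This is a minimal sketch: the function name, the 64 GB host size, and the node count are illustrative assumptions, and the remaining figure is only the input to the gp_vmem_protect_limit formulas in the Greenplum Database Reference Guide, not the parameter value itself.

```sh
# Hypothetical helper, not a GPText utility: memory (in MB) to set aside
# for GPText on one segment host.
gptext_reserved_mb() {
    nodes_per_host=$1   # GPText nodes to create on the host
    jvm_max_mb=$2       # JVM maximum size (-Xmx) per node, in MB
    echo $(( nodes_per_host * jvm_max_mb ))
}

physical_ram_mb=65536                       # assumed 64 GB segment host
reserved_mb=$(gptext_reserved_mb 2 2048)    # two nodes with -Xmx2048M
# RAM left over to use in place of physical RAM in the
# gp_vmem_protect_limit calculation
echo "reserved=${reserved_mb} remaining=$(( physical_ram_mb - reserved_mb ))"
```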
Install the GPText Binary Distribution
1. On the Greenplum master host, extract the GPText distribution file, a compressed tar archive. For example:
cd /home/gpadmin
tar xvfz greenplum-text-release-rhel5_x86_64.tar.gz
The release directory contains an installation configuration file, gptext_install_config, and the GPText installation binary, which has a name similar to greenplum-text--.bin, for example, greenplum-text-2.2.0-rhel6_x86_64.bin.
2. If necessary, grant execute permission to the GPText binary. For example:
chmod +x /home/gpadmin/greenplum-text-2.1.0-rhel5_x86_64.bin
3. If you are installing GPText in a directory that is only accessible to root, for example /usr/local, perform these steps:
a. Create the installation directory as root and change the ownership to the gpadmin user.
b. To install to a directory where the user may or may not have write permissions:
Use gpssh to create a directory with the same file path on all hosts (mdw, smdw, and the segment hosts sdw1, sdw2, and so on). For example:
/usr/local/
As root, set the file permissions and owner. For example:
# chmod 775 /usr/local/
# chown gpadmin:gpadmin /usr/local/
4. Edit the gptext_install_config file to set parameters for the installation. See Set Installation Parameters for details.
5. Run the GPText installation binary as gpadmin on the master server:
./gptext-.bin -c
6. Accept the Pivotal license agreement.
Optional Two-Part GPText Installation
You can run the GPText installation in two parts by following these steps.
1. Prepare GPText installation directories as described in steps 1 through 3 in Install the GPText Binary Distribution.
2. Run the GPText installation binary as gpadmin on the master server:
./gptext-.bin-b
Note that the -c option is omitted.
3. Source the GPText environment script in the GPText installation directory:
source /greenplum-text_path.sh
4. Edit the gptext_install_config file to set parameters for the GPText installation. See Set Installation Parameters for details.
5. Deploy the GPText cluster with the following command:
gptext-deploy -c
Set Installation Parameters
A GPText configuration file named gptext_install_config contains parameters to configure the GPText installation. Edit the file and set the parameters as described in the following table.
Table 1. GPText installation parameters
Parameter: GPTEXT_HOSTS
Description: An array of host names on which to install GPText, or use the constant "ALLSEGHOSTS" to install GPText on all Greenplum Database segment hosts. GPText hosts must be passwordless ssh-accessible by the gpadmin user from all other hosts in the Greenplum Cluster.
The GPTEXT_HOSTS and DATA_DIRECTORY installation parameters determine the number of GPText nodes that are deployed.
The number of directories included in the DATA_DIRECTORY array is the number of GPText nodes that are created per host.
The GPTEXT_HOSTS parameter determines the number of hosts. If set to the constant "ALLSEGHOSTS", the number of GPText node hosts is the same as the number of Greenplum segment hosts. If GPTEXT_HOSTS is set to an array of host names, the length of the array is the number of GPText node hosts.
The maximum number of GPText nodes is the number of Greenplum Database primary segments. The best practice recommendation is to deploy fewer GPText nodes with more memory rather than to divide the memory available to GPText among the maximum number of GPText nodes allowed. For example, if there are eight primary segments per host in the Greenplum Database cluster, the maximum number of GPText nodes per host is eight, but you should test with two or four GPText nodes per host, adjusting the JAVA_OPTS installation parameter to divide the memory reserved for GPText among them.
Example: declare -a GPTEXT_HOSTS=(gptext_host1 gptext_host2 gptext_host3)
Example: GPTEXT_HOSTS="ALLSEGHOSTS"

Parameter: DATA_DIRECTORY
Description: An array of directory paths where GPText data directories are to be created. The number of directories in the array determines the number of GPText nodes that will be created on each physical host. If GPTEXT_HOSTS lists multiple interfaces per host, the GPText nodes are spread evenly across the interface addresses.
Example: declare -a DATA_DIRECTORY=(/data/primary /data/primary)

Parameter: JAVA_OPTS
Description: Sets the minimum and maximum memory each SolrCloud JVM can use.
Example: JAVA_OPTS="-Xms1024M -Xmx2048M"

Parameters: GPTEXT_PORT_BASE, GP_MAX_PORT_LIMIT
Description: Set a range of port numbers available to GPText nodes. GPText finds unused ports in the specified range.
Example: GPTEXT_PORT_BASE=18983
Example: GP_MAX_PORT_LIMIT=28983

Parameter: ZOO_CLUSTER
Description: Whether to deploy a GPText binding ZooKeeper cluster or use an existing ZooKeeper cluster. If set to "BINDING", the installation deploys a ZooKeeper cluster. To use an existing ZooKeeper cluster, set this parameter to a list of ZooKeeper nodes in the format "host1:port,host2:port,host3:port".
Example: ZOO_CLUSTER="BINDING"

Parameter: ZOO_HOSTS
Description: If ZOO_CLUSTER is set to "BINDING", this parameter is an array of the hosts where the ZooKeeper nodes are to be installed. The array must contain 3, 5, or 7 host names, for example ZOO_HOSTS=(sdw1 sdw2 sdw3 sdw4 sdw5). If you are using a single host for ZooKeeper, specify it multiple times, for example, ZOO_HOSTS=(sdw1 sdw1 sdw1).
Example: declare -a ZOO_HOSTS=(localhost localhost localhost localhost localhost)

Parameter: ZOO_DATA_DIR
Description: The ZooKeeper data directory, required when ZOO_CLUSTER is set to "BINDING".
Example: ZOO_DATA_DIR="/data/master/"

Parameter: ZOO_GPTXTNODE
Description: The node path in ZooKeeper for GPText. This parameter is required whether ZOO_CLUSTER is set to "BINDING" or a list of hosts.
Example: ZOO_GPTXTNODE="gptext"

Parameters: ZOO_PORT_BASE, ZOO_MAX_PORT_LIMIT
Description: A range of port numbers to use for the ZooKeeper cluster. Unused ports are allocated from within this range. The range must contain at least 4000 port numbers.
Example: ZOO_PORT_BASE=2188
Example: ZOO_MAX_PORT_LIMIT=12188

Parameter: GPTEXT_JAVA_HOME
Description: The home directory of the Java installation to run for ZooKeeper and Solr processes. If not set, the JRE specified in the PATH and JAVA_HOME environment variables will be used.
Example: GPTEXT_JAVA_HOME=/usr/java/jdk1.8.0_131
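Several of the rules in Table 1 are simple arithmetic and can be checked before running the installer. The sketch below is one way to do that, assuming you pass the counts and port numbers by hand. The function names are hypothetical, the GPText installer does not provide them, and the port check counts both ends of the range as usable.

```sh
# Hypothetical pre-flight checks, not part of the GPText installer.

# Total GPText nodes = number of GPText hosts x number of DATA_DIRECTORY entries.
gptext_node_count() {
    hosts=$1; data_dirs=$2
    echo $(( hosts * data_dirs ))
}

# ZOO_HOSTS must list 3, 5, or 7 host names.
valid_zoo_host_count() {
    case "$1" in
        3|5|7) return 0 ;;
        *)     return 1 ;;
    esac
}

# The ZooKeeper port range must contain at least 4000 port numbers
# (both endpoints counted).
valid_zoo_port_range() {
    base=$1; limit=$2
    [ $(( limit - base + 1 )) -ge 4000 ]
}

# Example: 4 hosts with 2 data directories each, 5 ZooKeeper hosts,
# and the port range from the table.
gptext_node_count 4 2
valid_zoo_host_count 5 && echo "ZOO_HOSTS count ok"
valid_zoo_port_range 2188 12188 && echo "ZooKeeper port range ok"
```

The node count from the first function should not exceed the number of Greenplum Database primary segments.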
Starting GPText
First, make sure the GPText command-line utilities are in your path by sourcing the Greenplum Database and GPText environment scripts. It is important to source the GPText environment script each time you source the Greenplum Database script. For example:
source /usr/local/greenplum-db-/greenplum_path.sh
source /usr/local/greenplum-text-/greenplum-text_path.sh
To use GPText in a database, you must first use the gptext-installsql management utility to install the GPText user-defined functions and other objects in the database:
gptext-installsql database [database2 ...]
The GPText objects are created in the gptext schema.
The ZooKeeper cluster must be running before you start GPText. If you installed a binding ZooKeeper cluster, start it with the zkManager command-line utility.
$ zkManager start
Start GPText with the gptext-start utility.
$ gptext-start
Configure Greenplum Database
GPText configuration parameters are saved in ZooKeeper. You can, however, view and set GPText configuration parameters in a Greenplum Database session using the SHOW and SET commands. This requires adding the GPText custom variable class to the Greenplum Database custom_variable_classes configuration parameter.
The custom_variable_classes configuration parameter is a comma-separated list of class names. It is unset by default. To see if any custom variable classes have already been configured, run this gpconfig command at the command line.
gpconfig -s custom_variable_classes
If no custom variable classes have been set, set the parameter with the following command.
[gpadmin@gpsne ~]$ gpconfig -c custom_variable_classes -v 'gptext'
20171029:12:29:11:028199 gpconfig:gpsne:gpadmin-[INFO]:-completed successfully
If other classes have been configured, add gptext to the existing list, separated by a comma.
Run gpstop -u to have Greenplum Database reload the configuration file.
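When other classes are already configured, the new value has to keep them and add gptext only once. The following sketch builds that value from the current setting. The function name and the other_class placeholder are hypothetical, and it assumes the current value contains no spaces around the commas.

```sh
# Hypothetical helper, not a Greenplum or GPText utility: build the new
# custom_variable_classes value from the current one.
add_gptext_class() {
    current=$1
    case ",$current," in
        *,gptext,*) echo "$current" ;;          # gptext already present
        ,,)         echo "gptext" ;;            # nothing configured yet
        *)          echo "$current,gptext" ;;   # append to the existing list
    esac
}

add_gptext_class ""              # prints: gptext
add_gptext_class "other_class"   # prints: other_class,gptext

# The result could then be passed to gpconfig, for example:
#   gpconfig -c custom_variable_classes -v "$(add_gptext_class 'other_class')"
```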
When you want to view or set GPText configuration parameters, first execute the gptext.version() function to load the GPText configuration parameters into the session.
=# SELECT gptext.version();
 version
--------------------------------
 Greenplum Text Analytics 2.1.2
(1 row)

=# SHOW gptext.idx_delim;
 gptext.idx_delim
------------------
 ,
(1 row)
See Setting GPText Configuration Parameters for more about GPText configuration parameters.
Uninstalling GPText
To uninstall GPText, run the gptext-uninstall utility. You must have superuser permissions on all databases with GPText schemas to run gptext-uninstall.
gptext-uninstall runs only if there is at least one database with a GPText schema.
Execute:
gptext-uninstall
Upgrading GPText
Upgrading a GPText system to a new GPText release installs the new GPText software release on all hosts in the Greenplum cluster and then upgrades the GPText system.
Upgrading GPText and Greenplum Database at the Same Time
If you are upgrading to new releases of Greenplum Database and GPText at the same time, follow these steps:
1. Complete the Greenplum Database upgrade first and ensure the database is operational.
2. Run the GPText gptext-migrator utility to migrate your current GPText system to the newly upgraded Greenplum Database system.
3. Ensure that the current version of GPText works with the new Greenplum Database version.
4. Proceed with the GPText upgrade.
Upgrading a GPText Release
Upgrading a GPText release is a two-part process: install the new software release on the Greenplum cluster hosts and then upgrade the existing GPText system. The GPText installer performs the first part, installing the new software. The gptext-upgrade utility performs the second part, upgrading the current GPText system to the new version.
The GPText installer detects an existing GPText system and, after installing the new software release, offers to run the gptext-upgrade utility for you. If you choose to upgrade the GPText system later, you can run the gptext-upgrade utility yourself.
All upgrade tasks are executed on the Greenplum master host as the gpadmin user. The gpadmin user must have write permission in the directory where the new GPText release is to be installed, /usr/local/greenplum-text-- by default.
The Greenplum Database, ZooKeeper, and GPText clusters must be running. The procedure stops and restarts GPText during the upgrade.
Follow these steps:
1. Download the new GPText release for your platform from Pivotal Network (http://network.pivotal.io).
2. Extract the release package.
$ tar xfz greenplum-text--.tar.gz
3. Make sure that ZooKeeper and GPText are running.
$ gptext-state
4. Run the GPText installer.
$ ./greenplum-text--.bin
5. The installer prompts you to accept the Pivotal license agreement and to choose and create the installation directory.
6. The installer verifies the environment to ensure that prerequisites are present, such as Python and Java. If any problems are discovered, the installer outputs an error message and stops. Correct the problem identified by the message and run the installer again.
7. After the new software has been installed on the Greenplum cluster, the installer looks for an existing GPText installation. If an existing GPText system is found, the installer asks if you wish to upgrade GPText directly.
If you answer yes, the installer runs the gptext-upgrade script. The gptext-upgrade utility validates the environment to ensure it can complete the upgrade, then executes the upgrade and restarts the GPText system. If any problems are discovered, gptext-upgrade outputs a message and quits. Fix the indicated problems and run the gptext-upgrade utility (at /bin/gptext-upgrade) to complete the GPText system upgrade.
When upgrading GPText, you do not specify an installation configuration file as you do for the initial GPText installation.
If you answer no, you must run the gptext-upgrade script after the installer completes. See the gptext-upgrade utility reference for instructions.
Important: If you answer no or if gptext-upgrade quits without upgrading your software, follow these steps to re-run gptext-upgrade at a later time:
a. Source the greenplum-text_path.sh script in the old GPText installation directory. For example:
$ source /usr/local/greenplum-text-/greenplum-text_path.sh
b. Run the gptext-upgrade command from the new GPText installation directory:
$ /usr/local/greenplum-text-/bin/gptext-upgrade
8. After the upgrade has completed, source the greenplum-text_path.sh script in the new GPText release directory and run gptext-state healthcheck to verify the GPText system:
$ source /usr/local/greenplum-text-/greenplum-text_path.sh
$ gptext-state healthcheck
Introduction to Pivotal GPText
Pivotal GPText enables processing mass quantities of raw text data (such as social media feeds or e-mail databases) into mission-critical information that guides business and project decisions. GPText joins the Greenplum Database massively parallel-processing database server with Apache SolrCloud enterprise search and the MADlib Analytics Library to provide large-scale analytics processing and business decision support. GPText includes free text search as well as support for text analysis. GPText supports business decision making by offering:
Multiple kinds of data: GPText supports both semi-structured and unstructured data searches, which broadens the kinds of information you can find.
Less schema dependence: GPText does not require static schemas to successfully locate information; schemas can change or be quite simple and still return targeted results.
Text analytics: GPText supports analysis of text data with machine learning algorithms. The MADlib analytics library is integrated with Greenplum Database and is available for use with GPText.
This chapter contains the following topics:
GPText System Architecture
GPText Sample Use Case
GPText Workflow
Text Analysis
GPText System Architecture
GPText combines a Greenplum Database cluster with an Apache SolrCloud cluster. Greenplum Database segments and GPText nodes can be deployed on the same hosts or on different hosts with network connectivity.
The following figure shows the process architecture of the combined Greenplum Database and Apache Solr clusters. The figure shows four cluster nodes with four Greenplum segments and four Solr instances deployed on each. An Apache ZooKeeper service manages the SolrCloud cluster. Because ZooKeeper is most efficient with an odd number of servers, ZooKeeper nodes are deployed on three of the four hosts. Greenplum Database users access SolrCloud services via GPText user-defined functions installed in Greenplum databases and command-line utilities.
The figure omits the Greenplum master host, secondary master, and mirror segments for the Greenplum primary segments.
The Greenplum segments, Solr instances, and ZooKeeper nodes may all be deployed on separate hosts on the same network, depending on application and performance requirements.
The following sections describe how GPText integrates SolrCloud with Greenplum Database and how the two clusters work together to provide parallel text search capabilities in Greenplum Database and maintain high availability.
Greenplum Database Cluster
A Greenplum Database cluster consists of the following components:
A master database instance, executing on a dedicated host, conventionally named mdw. (Not illustrated)
A secondary master instance, on a host conventionally named smdw, acting as a warm standby for the master instance. (Not illustrated)
An array of database primary segment instances and mirrors deployed on segment hosts, by convention sdw1 through sdwn. A segment instance is an independent Postgres database process managing a portion of the distributed data. Each segment has a mirror (not illustrated) on another host in the cluster to provide uninterrupted service in case of a segment or segment host failure. The number of primary segments per host is determined by the hardware configuration (the number and type of processor cores, the amount of physical RAM, local storage capacity, and network capacity) as well as availability and performance requirements.
The Greenplum master instance coordinates the work of the segment instances. Optimal performance of a Greenplum Database cluster requires that all segment hosts be configured identically with the same number of primary and mirror segments on each, and with the database data distributed evenly among the segment instances. The full capacity of the database cluster is utilized when every segment host performs an equal amount of work.
Apache SolrCloud
Apache Solr is a server providing access to Apache Lucene full-text indexes. Apache SolrCloud is a highly available, fault tolerant cluster of Apache Solr servers. The term GPText cluster is another way to refer to a SolrCloud cluster deployed by GPText for use with a Greenplum Database system.
A SolrCloud cluster consists of the following components:
An Apache ZooKeeper cluster to manage the SolrCloud cluster. SolrCloud uses ZooKeeper to manage server configuration and to coordinate the cluster’s activities. GPText can install a ZooKeeper cluster that is bound to the GPText cluster, or it can share an existing ZooKeeper cluster. If GPText installs the ZooKeeper cluster, it can be managed using GPText functions and utilities. The ZooKeeper cluster can be deployed on Greenplum Database cluster hosts or, for best performance, on separate hosts accessible to the Greenplum Database cluster.
Multiple SolrCloud server instances deployed on the Greenplum segment hosts or on other hosts on the same network. Each instance is a JVM process running Solr server. SolrCloud instances use local storage, which may be the same local storage volumes that store Greenplum Database data. The number of SolrCloud instances per host can be the same as the number of Greenplum primary segments per host, but this is not a requirement. The number of instances to execute per host is specified during GPText installation.
GPText provides document indexing and search capabilities for Greenplum Database by adding user-defined functions (UDFs) that access Solr APIs from within database queries.
GPText UDFs perform the following tasks:
create and manage GPText indexes
insert documents into indexes from database tables or, for GPText external indexes, from documents stored outside of Greenplum Database
search indexes
There are also GPText UDFs and command-line utilities to configure, monitor, and manage the SolrCloud cluster and to manage replicas, SolrCloud’s high-availability mechanism. (More on replicas in the next section.)
Parallelism in GPText Indexing and Searching
SolrCloud distributes document indexes in slices called shards. With GPText, the number of shards for an index is the same as the number of Greenplum segments, so each Greenplum segment operates on an equal portion of the index. Each shard is managed by a SolrCloud instance and the shards are distributed evenly among the SolrCloud instances. The SolrCloud instance and Greenplum segment are not required to be on the same host.
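The relationship between segments, shards, and SolrCloud instances can be sketched with two small calculations. The function names are hypothetical and the even spread is modeled as a ceiling division, which is an assumption about how an uneven remainder would land rather than documented SolrCloud behavior.

```sh
# Hypothetical helpers, not GPText utilities.

# Shards per index equals the number of Greenplum primary segments.
index_shard_count() {
    seg_hosts=$1; primaries_per_host=$2
    echo $(( seg_hosts * primaries_per_host ))
}

# Largest number of shards of one index that any SolrCloud instance holds
# when the shards are spread evenly (ceiling division).
max_shards_per_instance() {
    shards=$1; instances=$2
    echo $(( (shards + instances - 1) / instances ))
}

# Example matching the architecture figure: 4 hosts with 4 primary segments
# and 4 Solr instances each.
shards=$(index_shard_count 4 4)
max_shards_per_instance "$shards" 16
```

With fewer, larger SolrCloud instances (for example 8 instead of 16), the same index places two shards on each instance.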
High Availability for GPText Indexes
SolrCloud provides high availability by maintaining replicas of shards and providing automatic failover if a shard fails or becomes unavailable. One replica of each shard is the lead replica and any changes to it are applied to the other replicas. The replication factor, which determines the number of replicas to maintain for each shard, is set when the index is created. Replicas may also be added or dropped later using GPText UDFs or command-line utilities.
ZooKeeper determines the locations of shard replicas among the Solr nodes and hosts. When adding a replica using a GPText UDF or command-line utility, the new shard replica can be explicitly placed on a SolrCloud instance.
GPText Sample Use Case
Forensic financial analysts need to locate communications among corporate executives that point to financial malfeasance in their firm. The analysts use the following workflow:
1. Load the email records into a Greenplum database.
2. Create a Solr index of the email records.
3. Run queries that look for text strings and their authors.
4. Refine the queries until they pair a dummy company name with the top three or four executives corresponding about suspect offshore financial transactions. With this data, the analysts can focus the investigation on specific individuals rather than the thousands of authors in the initial data sample.
GPText Workflow
GPText works with Greenplum Database and Apache SolrCloud to store and index big data for information retrieval (query) purposes. High-level workflows include data loading and indexing, and data querying.
This topic describes the following information:
Data Loading and Indexing Workflow
Querying Data Workflow
Data Loading and Indexing Workflow
The following diagram shows the GPText workflow for loading and indexing data.
All client interaction with the system is through the Greenplum master instance.
1. Load data into your Greenplum Database system. Create a database table to hold data and then add the data to the table. Greenplum provides parallel data loading utilities and protocols that help to transform and load external data in various formats and from various sources. For details, see the Greenplum Database Administrator Guide, at http://gpdb.docs.pivotal.io.
2. CreateanemptyGPTextindex.Usethe gptext.create_index() user-definedfunction(UDF)tocreateanemptyGPTextindexforthetable.EachGreenplumsegmentwillmanageasliceoftheindex,calledashard.SolrCloudcreatesmultiplereplicasforeachshard,distributedamongtheSolrinstances,andchoosesaleadreplicafortheGreenplumsegmenttooperateupon.Solrmanagesreplicationbetweenthereplicas.
3. Populatetheindexwithdatafromthedatabasetable.Usethe gptext.index() UDFtoadddatatotheindex.ThisUDFworksbydispatchingaSQLquerytoexecuteoneachGreenplumsegment.ThesegmentsexecutethequeryandaddtheresultstotheirshardsusingSolrAPIs.
4. Commitchangestotheindex.CommitchangestotheGPTextindexbycallingthe gptext.commit_index() UDF.Untilthechangesarecommitted,queriesexecutedontheindexcannotaccessanydataaddedtotheindexwith gptext.index() .Ifneeded,uncommittedchangescanberolledback.SolrCloudreplicateschangescommittedtotheleadreplicatotheshards’non-leadreplicas.
Querying Data Workflow
The following diagram shows the high-level GPText query process workflow:
1. A user submits a SQL query designed to search the indexed data. A GPText search query is a SQL SELECT statement on a GPText search UDF that contains full-text search expressions.
2. The Greenplum master dispatches the query to the Greenplum segments.
3. Each segment executes the query, using the Solr API to search its index shard. SolrCloud executes the search query on the lead replica for the shard.
4. The Greenplum segments return the results of the search query to the Greenplum master.
5. The Greenplum master aggregates the results from all segments and returns them to the client.
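The five steps above follow a scatter-gather pattern, which the following sketch models with plain Python lists. The function names and sample documents are hypothetical, and the sequential loop stands in for work that the segments perform in parallel.

```python
def search_shard(shard_docs, term):
    """One segment searching its own shard (steps 3 and 4)."""
    return [(doc_id, text) for doc_id, text in shard_docs
            if term in text.lower()]


def master_query(shards, term):
    """The master dispatching the query and aggregating results (steps 2 and 5)."""
    results = []
    for shard_docs in shards:  # in the real system the segments run in parallel
        results.extend(search_shard(shard_docs, term))
    return sorted(results)


# Two shards, each holding the documents managed by one segment
shards = [
    [(1, "Greenplum segments"), (4, "Solr replicas")],
    [(2, "Solr shards"), (5, "ZooKeeper")],
]
hits = master_query(shards, "solr")
```

Each shard is searched independently, so the master only has to merge the partial result lists.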
Text Analysis
GPText enables analysis of Solr indexes with Apache MADlib, an open source library for scalable in-database analytics. MADlib provides data-parallel implementations of mathematical, statistical, and machine learning methods for structured and unstructured data. You can use GPText to perform a variety of MADlib analyses.
Learn more about Apache MADlib at http://madlib.apache.org. A gppkg package for MADlib is available on the Pivotal network at http://network.pivotal.io.
Administering GPText
GPText administration includes security considerations, monitoring Solr index statistics, managing and monitoring ZooKeeper, and troubleshooting.
Changing GPText Server Configuration Parameters
Configuration parameters used with GPText are built into GPText with default values. You can change the values for these parameters by setting the new values in a Greenplum Database session. The new values are stored in ZooKeeper. GPText indexes use the values of configuration parameters when they are created. Changing configuration parameters affects new indexes, but does not affect existing indexes.
See GPText Configuration Parameters for a complete list of configuration parameters.
A one-time Greenplum Database configuration change is needed for Greenplum Database to allow setting and displaying GPText configuration variables. As the gpadmin user, enter the following commands in a shell:
$ gpconfig -c custom_variable_classes -v 'gptext'
$ gpstop -u
Then connect to a database that contains the GPText schema and execute the gptext.version() function to expose the GPText configuration variables:
=# select * from gptext.version();
Change the values of GPText configuration variables using the SET command in a session with a database that contains the GPText schema. The following example sets values for three configuration parameters in a psql session:
=# set gptext.idx_buffer_size=10485760;
SET
=# set gptext.idx_delim='|';
SET
=# set gptext.extension_factor=5;
SET
You can view the current value of a configuration parameter that you have set using the SHOW command:
=# show gptext.idx_delim;
 gptext.idx_delim
------------------
 |
(1 row)
Security and GPText Indexes
GPText security is based on Greenplum Database security. Your privileges to execute GPText functions depend on your privileges for the database table that is the source for the index. For example, if you have SELECT privileges for a table in the Greenplum database, then you have SELECT privileges for an index generated from that table.
Executing GPText functions requires one of OWNER, SELECT, INSERT, UPDATE, or DELETE privileges, depending on the function. The OWNER is the person who created the table and has all privileges. See the Greenplum Database Administrator Guide for information about setting privileges.
ZooKeeper Administration
Apache ZooKeeper enables coordination between the Apache Solr and Pivotal GPText distributed processes through a shared namespace that resembles a file system. In ZooKeeper, a node (called a znode) can contain data, like a file, and can have child znodes, like a directory. ZooKeeper replicates data between multiple instances deployed as a cluster to provide a highly available, fault-tolerant service. Both Solr and GPText store configuration files and share status by writing data to ZooKeeper znodes. GPText stores information in the /gptext znode. The configuration files for a GPText index are in the /gptext/configs/ znode.
The number of ZooKeeper instances in the cluster determines how many ZooKeeper node failures the cluster can tolerate and still remain active. The service remains available as long as a clear majority of the non-failed nodes are able to communicate with each other. To tolerate a failure of n nodes, the cluster must have 2n+1 nodes. A cluster of five nodes, for example, can tolerate two failed nodes.
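The sizing rule above is simple arithmetic, sketched here with hypothetical helper names. The has_quorum check assumes that a majority means strictly more than half of the configured cluster size.

```python
def nodes_required(failures_to_tolerate):
    """Cluster size needed to survive n failed nodes: 2n + 1."""
    return 2 * failures_to_tolerate + 1


def failures_tolerated(cluster_size):
    """Largest number of failed nodes that still leaves a majority."""
    return (cluster_size - 1) // 2


def has_quorum(cluster_size, nodes_up):
    """True when strictly more than half of the cluster is up."""
    return nodes_up > cluster_size // 2
```

Note that failures_tolerated(4) equals failures_tolerated(3): adding a fourth node does not increase fault tolerance, which is one reason odd cluster sizes are preferred.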
ZooKeeper is very fast for read requests because it stores data in memory. If ZooKeeper begins to swap memory to disk, Solr and GPText performance will suffer and failures could occur, so it is critical to allocate sufficient memory to the ZooKeeper Java processes. To avoid ZooKeeper instances competing with Greenplum Database segments for memory, you should deploy the ZooKeeper instances and Greenplum Database segments on different hosts. The ZooKeeper and Greenplum Database hosts must be on the same network and accessible with passwordless SSH by the gpadmin user. You can use the Greenplum Database gpssh-exkeys utility to share SSH keys between ZooKeeper and Greenplum Database hosts.
You must start the ZooKeeper cluster before you start GPText. When you start GPText, the Solr nodes each load the replicas for indexes they manage. With large numbers of indexes, shards, and replicas, starting up the cluster can generate a very high, atypical load on ZooKeeper. It can take a long time to get all indexes loaded and some ZooKeeper requests may time out waiting for responses. Using the gptext-start --slow_start option starts indexes one at a time, providing a more ordered start-up and limiting the number of concurrent ZooKeeper requests.
The GPText command-line utility zkManager can be used to monitor the ZooKeeper cluster. If the ZooKeeper cluster is bound to GPText, you can also start and stop the cluster using zkManager.
Checking ZooKeeper Status
Use the zkManager utility from the command line to check the ZooKeeper cluster status. The utility lists the hosts, ports, latency, and follower/leader mode for each ZooKeeper instance. If a node is down, its mode is listed as Down.
To check the ZooKeeper cluster status, run the zkManager state command.
$ zkManager state
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper state process.
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state ...
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Host port Latency min/avg/max Mode
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-gpdb 2189 0/0/22 follower
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-gpdb 2190 0/0/29 leader
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-gpdb 2188 0/0/27 follower
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Done.
In a database session, you can use the gptext.zookeeper_hosts() function to list the ZooKeeper hosts.
=# SELECT * FROM gptext.zookeeper_hosts();
  host  | port
--------+------
 gpdb51 | 2188
 gpdb51 | 2189
 gpdb51 | 2190
(3 rows)
Starting and Stopping the ZooKeeper Cluster
If the ZooKeeper cluster was installed by the GPText installer, the zkManager utility can start or stop the ZooKeeper cluster. To start the cluster, run the zkManager start command.
$ zkManager start
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper start process
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-Starting Zookeeper:
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-Host Zookeeper Dir
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo0
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo1
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo2
20171016:16:14:48:017845 zkManager:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state ...
20171016:16:14:53:017845 zkManager:gpdb:gpadmin-[INFO]:-Done.
To stop ZooKeeper, run the zkManager stop command.
$ zkManager stop
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper stop process.
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-Stop Zookeeper:
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-Host Zookeeper Dir
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo0
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo1
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-gpdb /data/master/zoo2
20171016:16:14:09:016499 zkManager:gpdb:gpadmin-[INFO]:-Done.
See the zkManager reference for more information.
Checking SolrCloud Status
You can check the status of the SolrCloud cluster and indexes by running the gptext-state utility from the command line.
To check the state of the GPText nodes and each index, run the gptext-state utility with the -D (--details) option:
gptext-state -D
This command reports the status of the GPText nodes and status of each GPText index.
Run gptext-state list to view just the indexes.
The gptext-state healthcheck command checks the GPText configuration files, the index status, required disk space, user privileges, and index and database consistency. By default, the required disk space check passes if there is at least 20% disk free. You can set a different disk free threshold using the --disk_free option. For example:
[gpadmin@gpdb-sandbox ~]$ gptext-state healthcheck --disk_free=25
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Execute healthcheck on GPText cluster!
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText config files ...
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText index status ...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required disk space ...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required user privileges ...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for indexes and database consistency ...
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.
See the gptext-state utility reference for additional options.
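The disk space threshold described above amounts to a percentage comparison. The following sketch shows that comparison only; it is not how gptext-state implements the check. The function names are hypothetical, and the path "/" is an illustrative choice rather than the directory GPText inspects.

```python
import shutil


def percent_free(total_bytes, free_bytes):
    """Free space as a percentage of total space."""
    return 100.0 * free_bytes / total_bytes


def disk_check_passes(total_bytes, free_bytes, threshold_percent=20):
    """Pass when at least threshold_percent of the disk is free."""
    return percent_free(total_bytes, free_bytes) >= threshold_percent


# Apply the check to a local file system ("/" is only an example path)
usage = shutil.disk_usage("/")
current_ok = disk_check_passes(usage.total, usage.free)
```

Raising the threshold, as --disk_free=25 does in the example above, makes the check stricter: a disk with 24% free would pass at the default of 20 but fail at 25.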
Recovering GPText Nodes
Use the gptext-recover utility to recover down GPText nodes, for example after a failed Greenplum Database segment host is recovered.
With no arguments, the gptext-recover utility discovers down GPText nodes and restarts them.
With the -f (or --force) option, if a GPText node cannot be restarted and no shards are down, the node is deleted and created again on the same host. Missing replicas are added and the failed node and failed replicas are removed.
The -H (--new_hosts) option allows recreating down GPText nodes on new hosts that replace failed hosts. The down GPText nodes are deleted and recreated on the new hosts. The argument to the -H option is a comma-separated list of the new hosts that are to replace the failed hosts. The number of new hosts must match the number of failed hosts. If shards are down, the utility advises reindexing. If only some replicas are down, it recreates the replicas on the new hosts and updates gptext.conf.
The -r option recovers replicas, but does not attempt to recover any down nodes.
Note: Before recovering GPText nodes on newly added hosts, ensure that the following GPText prerequisites have been installed on the host:
Java 1.8
Python 2.6
The Linux lsof utility
Viewing Solr Index Statistics
You can view Solr index statistics by running the gptext-state utility from the command line.
To list all GPText indexes, enter the following command at the command line:
gptext-state list
A command line that retrieves all statistics for an index:
gptext-state --index wikipedia.public.articles
A command line that retrieves the number of documents in an index:
gptext-state --index wikipedia.public.articles --stats_columns=num_docs
A command line that retrieves num_docs and the index size:
gptext-state --index wikipedia.public.articles --stats_columns num_docs,size
Backing Up and Restoring GPText Indexes
With the gptext-backup management utility, you can back up a GPText index so that, if needed, you can quickly recover from a failure. The backup can be restored to the same GPText system or to another system with the same number of Greenplum Database segments.
The gptext-backup management utility backs up an index and its configuration files to either a shared file system, which must be mounted on and writable by each host in the Greenplum Database cluster, or to local storage on the Greenplum Database master and segment hosts.
Backing Up to a Shared File System
To back up on a shared file system, use the -p (--path) command-line option to specify the location of a directory on the mounted file system and the -n (--name) option to provide a name for the backup. Specify the index to back up with the -i (--index) option.
$ gptext-backup -i <index-name> -p <path> -n <backup-name>
The gptext-backup utility then checks that:
the GPText cluster is up
the shared file system is valid
the backup name specified with the -n option does not already exist in the directory specified with the -p option
The utility creates the new directory and then saves one copy of each index shard to that directory, along with the index's configuration files from ZooKeeper.
To save the configuration files only, with no data, add the -c (--backup_conf) command-line option.
To restore an index from a shared file system, use the gptext-restore management utility. The GPText system you restore to must be on a Greenplum Database cluster with the same number of segments. The database and schema for the index must be present.
The -i (--index) option specifies the name of the GPText index that will be restored. If the index exists, you must first drop it with the gptext.drop_index() user-defined function.
The -p (--path) option specifies the location of the directory containing the backup files, that is, the directory that gptext-backup created on the shared file system.
$ gptext-restore -i <index-name> -p <path>
You can add the -c option to restore only the configuration files to ZooKeeper and create an empty GPText index, without restoring any saved index data.
Backing Up to Local Storage
To back up to local storage on the Greenplum Database cluster, add the local keyword to the gptext-backup command line.
A local GPText backup has a unique name constructed by appending a timestamp to the index name. You do not use the -n option with local backups.
$ gptext-backup local -i <index-name>
On the master host, in the master data directory by default, the backup utility saves a JSON file with backup metadata and a directory containing the index's configuration files from ZooKeeper.
The utility backs up each index shard on the Greenplum Database segment host with the GPText node that manages the shard's lead replica. By default, the shard backup files are saved in a segment data directory.
The gptext-backup command output reports the locations of all backup files.
You can add the -p (--path) option to the gptext-backup command to specify a local directory where the backup will be saved. The directory must be present on every Greenplum Database host and must be writable by the gpadmin user.
$ gptext-backup local -i <index-name> -p <path>
The backup files will be saved in the specified directory on each host instead of in the Greenplum Database master and segment data directories.
To restore a backup saved to local storage, add the local keyword to the gptext-restore command line and specify the path to the backup directory on the master host.
$ gptext-restore local -p <path>
The <path> argument is the full path to the directory the gptext-backup command created on the master host, including the timestamp, for example $MASTER_DATA_DIRECTORY/demo.twitter.message_2018-05-08T15:32:21.397779.
See gptext-backup for syntax and examples for running gptext-backup. See gptext-restore for syntax and examples for running gptext-restore.
Expanding the GPText Cluster
The gptext-expand management utility adds GPText nodes to the cluster. There are two ways to add nodes:
Add GPText nodes to existing hosts in the cluster. This option increases the number of GPText nodes on each host.
Add GPText nodes to new hosts added by using the Greenplum Database gpexpand management utility to expand the Greenplum Database system.
Adding GPText Nodes to Existing Segment Hosts
To add nodes to existing segment hosts, run the gptext-expand utility with a command like the following:
gptext-expand -e -p /data1/nodes,/data2/nodes
This example adds two GPText nodes to each host.
The -e (--existing) option specifies that nodes are to be added to existing hosts.
The -p (--expand_paths) option provides a list of directories where the new nodes' data directories are to be created. These should be the same directories that contain the Greenplum Database segment data directories and existing GPText data directories. The number of directories in the list is the number of new nodes that are added.
A directory can be repeated in the directory list multiple times to increase the number of new GPText nodes to create. For example, if there is currently one GPText node per host in the /data1/nodes directory, you could add three nodes with a command like the following:
gptext-expand -e -p /data1/nodes,/data1/nodes... 
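The node counts in this example can be checked by tallying the directory list. This is a small sketch with a hypothetical helper name; it only counts entries and does not call gptext-expand.

```python
from collections import Counter


def nodes_per_directory(existing, expand_paths):
    """Tally GPText nodes per directory after an expansion.

    existing     -- dict of directory -> current node count on a host
    expand_paths -- the comma-separated value passed to the -p option
    """
    counts = Counter(existing)
    counts.update(expand_paths.split(","))  # each list entry adds one node
    return dict(counts)


result = nodes_per_directory(
    {"/data1/nodes": 1},
    "/data1/nodes,/data2/nodes,/data2/nodes",
)
```

Starting from one node in /data1/nodes, the three list entries bring each directory to two nodes, matching the description above.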
Adding GPText nodes affects new indexes, but not existing indexes. Replicas for new indexes will be distributed across all of the nodes, including both old nodes and the newly created nodes. Replicas for indexes that existed before running gptext-expand are not automatically moved. Rebalancing existing replicas requires reindexing.
Adding GPText Nodes to New Hosts
Check that the following GPText prerequisites are installed on each new host added to the Greenplum Database cluster:
Java 1.8
Python 2.6 or greater
Linux lsof utility
New hosts must be reachable by all hosts in the GPText cluster, including existing hosts and the new hosts you are adding.
After expanding the Greenplum Database cluster with the gpexpand management utility, call gptext-expand with the -H (--new_hosts) option and a list of the new hosts on which to install GPText:
gptext-expand -H newhost1,newhost2
The gptext-expand utility installs GPText binaries on the new hosts and then creates new GPText nodes on the new hosts.
Expanding a Greenplum Database cluster increases the number of segments, so the number of GPText index shards for existing indexes must be increased to equal the new number of segments. This requires reindexing all existing documents. Newly created indexes will automatically be distributed among the new shards.
Troubleshooting
GPText errors are of the following types:
Solr errors
gptext errors
Most of the Solr errors are self-explanatory.
gptext errors are caused by misuse of a function or utility. They provide a message that tells you when you have used an incorrect function or argument.
Monitoring Logs
You can examine the Greenplum Database and Solr logs for more information if errors occur. Greenplum Database logs reside in:
segment-directory/pg-log
Solr logs reside in:
/solr/logs
Determining Segment Status with gptext-state
Use the gptext-state utility to determine if any primary or mirror segments are down. See gptext-state in the GPText Management Utilities Reference.
GPText High Availability
The GPText high availability feature ensures that you can continue working with GPText indexes as long as each shard in the index has at least one working replica.
A GPText index has one shard for each Greenplum segment, so there is a one-to-one correspondence between Greenplum segments and GPText index shards. The shard managed by a Greenplum segment is an index of the documents that are managed by that segment.
The GPText high availability mechanism is to maintain multiple copies, or replicas, of the shard. The ZooKeeper service that manages SolrCloud chooses a GPText instance (SolrCloud node) for each replica to ensure even distribution and high availability. For each shard, one replica is elected leader and the Greenplum segment associated with the shard operates on this leader replica. The GPText instance managing the lead replica may or may not be on another Greenplum host, so indexing and searching operations are passed over the Greenplum cluster's interconnect network. SolrCloud replicates changes made to the leader replica to the remaining replicas.
The following figure illustrates the relationships between Greenplum segments and GPText index shards and replicas. The leader replica for each shard is shown in green and the followers are gray.
The number of replicas to create for each shard, the replication factor, is a SolrCloud property. By default, GPText starts SolrCloud with a replication factor of three. The replication factor for each individual index is the value of the SolrCloud replication factor when the index is created. Changing the replication factor does not alter the replication factor for existing indexes.
Greenplum Segment or Host Failure
If a Greenplum primary segment fails and its mirror is activated, GPText functions and utilities continue to access the leader replica. No intervention is needed.
If a host in the cluster fails, both Greenplum and GPText are affected. Mirrors for the Greenplum primary segments located on the failed host are activated on other hosts. SolrCloud elects a new leader replica for affected shards. Because Greenplum segment mirrors and GPText shard replicas are distributed throughout the cluster, a single host failure should not prevent the cluster from continuing to operate. The performance of database queries and indexing operations will be affected until the failed host is recovered and the cluster is brought back into balance.
ZooKeeper Cluster Availability
SolrCloud is dependent on a working, available ZooKeeper cluster. For ZooKeeper to be active, a majority of the ZooKeeper cluster nodes must be up and able to communicate with each other. A ZooKeeper cluster with three nodes can continue to operate if one of the nodes fails, since two is a majority of three. To tolerate two failed nodes, the cluster must have at least five nodes so that the number of working nodes remaining after the failure is a majority. To tolerate n node failures, then, a ZooKeeper cluster must have 2n+1 nodes. This is why ZooKeeper clusters usually have an odd number of nodes.
The best practice for a high-availability GPText cluster is a ZooKeeper cluster with five or seven nodes so that the cluster can tolerate two or three failed nodes.
Managing GPText Cluster Health
GPText document indexing and searching services remain available as long as each shard of an index has at least one working replica. To ensure availability in the event of a failure, it is important to monitor the status of the cluster and ensure that all of the index shard replicas are healthy. You can monitor the SolrCloud cluster and indexes using the SolrCloud Dashboard or using GPText functions and management utilities. Access the SolrCloud Dashboard with a web browser on any GPText instance with a URL such as http://sdw3:18983/solr. (The port numbers for GPText instances are set with the GPTEXT_PORT_BASE parameter in the installation parameters file at installation time.)
Refer to the Apache SolrCloud documentation for help using the SolrCloud Dashboard.
Monitoring the Cluster with GPText
The GPText gptext-state management utility allows you to query the state of the GPText cluster and indexes. You can also use gptext.index_status() to view the status of all indexes or a specified index.
To see the GPText cluster state, run the gptext-state command-line utility with the -d option to specify a database that has the GPText schema installed.
gptext-state -d mydb
The utility reports any GPText nodes that are down and lists the status of every GPText index. For each index, the database name, index name, and status are reported. The status column contains "Green", "Yellow", or "Red":
Green: all replicas for all shards are healthy
Yellow: all shards have at least one healthy replica but at least one replica is down
Red: no replicas are available for at least one index shard
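The three status values follow directly from the per-shard replica health. The sketch below encodes those rules with a hypothetical function and input shape; it does not query GPText.

```python
def index_status(shards):
    """Classify an index from its replica health.

    shards maps a shard name to a list of booleans, one per replica,
    where True means the replica is healthy.
    """
    # Red: at least one shard has no healthy replica at all
    if any(not any(replicas) for replicas in shards.values()):
        return "Red"
    # Yellow: every shard is served, but some replica is down
    if any(not all(replicas) for replicas in shards.values()):
        return "Yellow"
    # Green: every replica of every shard is healthy
    return "Green"
```

The Red test comes first because an index with one unavailable shard is Red even if other shards have only partial failures.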
To see the distribution of index shards and replicas in the GPText cluster, execute this SQL statement.
SELECT index_name, shard_name, replica_name, node_name FROM gptext.index_summary() ORDER BY node_name;
To list all GPText indexes, run the gptext-state list command.
gptext-state list -d mydb
The gptext-state healthcheck command checks the health of the cluster. The -f flag specifies the percentage of available disk space required to report a healthy cluster. The default is 10.
gptext-state healthcheck -f 20 -d mydb
See gptext-state in the Management Utilities reference for help with additional gptext-state options.
The gptext.index_status() user-defined function reports the status of all GPText indexes or a specified index.
SELECT * FROM gptext.index_status();
Specify an index name to report only the status of that index.
SELECT * FROM gptext.index_status('demo.twitter.message');
Adding and Dropping Replicas
The gptext-replica utility adds or drops a replica of a single index shard. Use the gptext.add_replica() and gptext.delete_replica() user-defined functions to perform the same tasks from within the database.
If a replica of a shard fails, use gptext-replica to add a new replica and then drop the failed replica to bring the index back to "Green" status.
gptext-replica add -i mydb.public.messages -s shard3
Here is the equivalent, using the gptext.add_replica() function:
SELECT * FROM gptext.add_replica('mydb.public.messages', 'shard3');
ZooKeeper determines where the replica will be located, but you can also specify the node where the replica is created:
gptext-replica add -i mydb.public.messages -s shard3 -n sdw3
In the gptext.add_replica() function, add the node name as a third argument.
To drop a replica, specify the name of the index, the name of the shard, and the name of the replica. You can find the name of the replica by calling gptext.index_status(index_name). The name is in the format core_noden. With the gptext-replica drop command, an optional -o flag specifies that the replica is to be deleted only if it is down.
gptext-replica drop -i mydb.public.messages -s shard3 -r core_node4 -o
Here is the equivalent of the above command using the gptext.delete_replica() user-defined function.
SELECT * FROM gptext.delete_replica('mydb.public.messages', 'shard3', 'core_node4', true);
GPText Best Practices
Each GPText/Apache Solr node is a Java Virtual Machine (JVM) process and is allocated memory at startup. The maximum amount of memory the JVM will use is set with the -Xmx parameter on the Java command line. Performance problems and out of memory failures can occur when the nodes have insufficient memory.
Other performance problems can result from resource contention between the Greenplum Database, Solr, and ZooKeeper clusters.
This topic discusses GPText use cases that stress Solr JVM memory in different ways and the best practices for preventing or alleviating performance problems from insufficient JVM memory and other causes.
Indexing Large Numbers of Documents
Indexing documents consumes memory in the Solr JVM. When the index is committed, parts of the memory are released, but some data remains in memory to support fast search. By default, Solr performs an automatic soft commit when 1,000,000 documents are indexed or 20 minutes (1,200,000 milliseconds) have passed. A soft commit pushes documents from memory to the index, freeing JVM memory. A soft commit also makes the documents visible in searches. A soft commit does not, however, make the index updates durable; it is still necessary to commit the index with the gptext.commit_index() user-defined function.
You can configure an index to perform a more frequent automatic soft commit by editing the solrconfig.xml file for the index:
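$ gptext-config edit -f solrconfig.xml -i <database>.<schema>.<table>
The <autoSoftCommit> element is a child of the <updateHandler> element. Edit the <maxDocs> and <maxTime> values to reduce the time between automatic commits. For example, the following settings perform an autocommit every 100,000 documents or 10 minutes.
<autoSoftCommit>
  <maxDocs>100000</maxDocs>
  <maxTime>600000</maxTime>
</autoSoftCommit>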
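The two thresholds act as an either/or trigger, which the following sketch expresses as a simplified model. The function name is hypothetical and the comparison is an approximation of Solr's behavior, not its implementation; the defaults are the values quoted above.

```python
def soft_commit_due(pending_docs, elapsed_ms,
                    max_docs=1_000_000, max_time_ms=1_200_000):
    """True when either automatic soft commit threshold has been reached.

    pending_docs -- documents indexed since the last commit
    elapsed_ms   -- milliseconds since the first uncommitted document
    """
    return pending_docs >= max_docs or elapsed_ms >= max_time_ms
```

With the example settings (100,000 documents or 600,000 milliseconds), a batch of 150,000 documents triggers a soft commit immediately, while under the defaults it would wait for the 20-minute timer.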
Indexing Very Large Documents
Indexing very large documents can use a large amount of JVM memory. To manage this, you can set the gptext.idx_buffer_size configuration parameter to reduce the size of the indexing buffer.
See Changing GPText Server Configuration Parameters for instructions to change configuration parameter values.
Determining the Number of GPText Nodes to Deploy
A GPText node is a Solr instance managed by GPText. The nodes can be deployed on the Greenplum Database cluster hosts or on separate hosts accessible to the Greenplum Database cluster. The number of nodes is configured during GPText installation.
The maximum recommended number of GPText nodes you can deploy is the number of Greenplum Database primary segments. However, the best practice recommendation is to deploy fewer GPText nodes with more memory rather than to divide the memory available to GPText among the maximum number of GPText nodes. Use the JAVA_OPTS installation parameter to set memory size for GPText nodes.
A single GPText node per host can easily handle several indexes. Each additional node consumes additional CPU and memory resources, so it is desirable to limit the number of nodes per host. For most GPText installations, a single GPText node per host is sufficient.
If the JVM has a very large amount of memory, however, garbage collection can cause long pauses while the JVM reorganizes memory. Also, the JVM employs a memory address optimization that cannot be used when JVM memory exceeds 32GB, so at more than 32GB, a GPText node loses capacity and performance. Therefore, no GPText node should have more than 32GB of memory.
For example, if you have 48GB memory available for GPText per host, you should deploy two GPText nodes with 24GB memory each. If you have 128GB available, you should deploy at least four JVMs, and more if garbage collection becomes a problem.
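The sizing examples above can be reproduced with a short calculation. This is a sketch under the assumption that memory is split evenly across the fewest nodes that keep each node at or below 32GB; the function name is hypothetical, and the result is a lower bound on the node count, not a tuning recommendation.

```python
import math

MAX_NODE_GB = 32  # per-node ceiling stated in the text above


def plan_nodes(gptext_memory_gb):
    """Return (minimum node count, memory per node in GB) for one host."""
    nodes = max(1, math.ceil(gptext_memory_gb / MAX_NODE_GB))
    return nodes, gptext_memory_gb / nodes
```

For 48GB this yields two nodes of 24GB each, and for 128GB it yields four nodes of 32GB each; more, smaller nodes may still be preferable if garbage collection pauses become a problem.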
Configure Maximum JVM Heap Size
Each Solr core file consumes JVM heap memory. Adding more indexes increases JVM swapping and garbage collection frequency so that it takes longer to create indexes and to load the core files when GPText is started. If you continue to create indexes without increasing the JVM heap, an out of memory error will eventually occur.
Monitor performance at startup and during index creation and increase the JVM size when you begin to see degraded performance. You can also use tools such as jconsole, included with the Java Development Kit, to monitor Java heap usage. If garbage collections are occurring too frequently and freeing too little memory, JVM heap should be increased.
The JVM size is initially configured during GPText installation by setting the JAVA_OPTS parameter in the installation configuration file. After installation, use the gptext-config jvm command to increase the JVM heap size. For example, this gptext-config jvm command sets the JVM maximum heap option to 4GB:
$ gptext-config jvm -o "-Xmx4096M"
Manage Indexing and Search Loads
With high indexing or search load, JVM garbage collection pauses can cause the Solr overseer queue to back up. For a heavily loaded GPText system, you can prevent some performance problems by scheduling document indexing for times when search activity is low.
Terms Queries and Out of Memory Errors
The gptext.terms() function retrieves term vectors from documents that match a query. An out of memory error may occur if the documents are large, or if the query matches a large number of documents on each node. Other factors can contribute to out of memory errors when running a gptext.terms() query, including the maximum memory available to the Solr nodes (the -Xmx value in JAVA_OPTS) and concurrent queries.
If you experience out of memory errors with gptext.terms(), you can set a lower value for the term_batch_size GPText configuration variable. The default value is 1000. For example, you could try running the failing query with term_batch_size set to 500. Lowering the value may prevent out of memory errors, but performance of terms queries can be affected.
See GPText Configuration Parameters for help setting GPText configuration parameters.
Configure File System Caching for ZooKeeper
Good Solr performance is dependent on fast response for ZooKeeper requests. ZooKeeper performs best when its database is cached so it does not have to go to disk for lookups. If you find that ZooKeeper JVMs have frequent disk accesses, look for ways to improve file caching or move ZooKeeper disks to faster storage.
The ZooKeeper zkClientTimeout parameter is the length of time a client can go without communicating with ZooKeeper before its session expires.
Troubleshooting Hadoop Connection Problems
This section describes Hadoop-related problems and potential solutions to these issues.
DataNode Access Errors
You may experience Hadoop access errors with GPText if any DataNodes in the Hadoop cluster reside in a multi-homed network. GPText uses an external IP address to access the HDFS NameNode. GPText encounters an error when the NameNode provides an internal IP address for a DataNode. In this situation, additional configuration is required to configure GPText to perform its own DNS resolution of DataNode hostnames.
Perform the following procedure to explicitly configure DNS resolution of DataNode hostnames:
1. Locate a local copy of the Hadoop authentication configuration directory that you previously uploaded to ZooKeeper. For example, if the directory is located at /home/gpadmin/auths/hdfs_conf:
$ cd /home/gpadmin/auths/hdfs_conf
$ ls
core-site.xml hdfs-site.xml user.txt
2. Open hdfs-site.xml in the editor of your choice. For example:
$ vi hdfs-site.xml
3. Add the following property block to the file, and then save the file and exit:
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
This property allows GPText hosts to perform their own DNS resolution of HDFS DataNode hostnames.
4. Re-upload the modified configuration to ZooKeeper. For example, if the hdfs_conf directory includes the authentication configuration files for a Hadoop cluster with hdfs_bill_auth:
$ cd ..
$ gptext-external upload -t hdfs -c hdfs_bill_auth -p hdfs_conf
5. Determine the hostname-to-IP address mapping for all DataNodes, and add the associated entries into the /etc/hosts file on all GPText client hosts.
Kerberos-Related Errors
The following problems are specific to Hadoop clusters secured with Kerberos.
Clock Skew
A login attempt to a Hadoop cluster secured with Kerberos will fail if clock skew between GPText client hosts and the Kerberos KDC host is too great. In this situation, you may see the following error in the Solr log:
java.io.IOException caused by a KrbException noting "Clock skew too great"
To resolve this situation, ensure that the clocks on the Kerberos KDC host and GPText client hosts are synchronized.
Timeout Errors
A login attempt to a Hadoop cluster secured with Kerberos may fail with timeout errors when the kdc and admin_server settings in the krb5.conf file are specified with a hostname, and the GPText client hosts cannot resolve the hostname. In this situation, you may see one of the following errors in the Solr log:
org.apache.solr.common.SolrException: Failed to login HDFS message caused by a java.io.IOException specifying javax.security.auth.login.LoginException: Receive timed out
java.nio.channels.UnresolvedAddressException with SocketIOWithTimeout referenced in the stack trace
In this situation, you may choose either of the following:
Update the Kerberos krb5.conf file to specify the kdc and admin_server settings using IP addresses. Or
Update all GPText hosts to perform their own DNS resolution of the Kerberos KDC server.
If you choose to update the krb5.conf file:
1. Locate a local copy of the Hadoop Kerberos authentication configuration directory that you previously uploaded to ZooKeeper. For example, if the directory is located at /home/gpadmin/auths/hdfs_kerb_conf:
$ cd /home/gpadmin/auths/hdfs_kerb_conf
$ ls
core-site.xml hdfs-site.xml keytab krb5.conf user.txt
2. Open krb5.conf in the editor of your choice. For example:
$ vi krb5.conf
3. Replace the KERBEROS block attributes with their equivalent IP addresses and then save the file and exit. For example:
[realms]
KERBEROS = {
kdc =
admin_server =
}
4. Re-upload the modified configuration to ZooKeeper. For example, if the directory named hdfs_kerb_conf includes the authentication configuration files for a Hadoop cluster configuration named hdfs_kerb_auth:
$ cd ..
$ gptext-external upload -t hdfs -c hdfs_kerb_auth -p hdfs_kerb_conf
Alternatively, if you choose to configure the GPText hosts to perform their own DNS resolution of the Kerberos KDC server, add an entry for the KDC hostname-to-IP address mapping to the /etc/hosts file on all GPText client hosts.
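Before choosing between the two options, it can help to see which kdc and admin_server settings are specified by hostname. The following sketch is an illustration, not a GPText utility: the function name is hypothetical, and it recognizes only dotted-quad IPv4 addresses.

```shell
#!/bin/sh
# Hypothetical helper: print the kdc and admin_server values in a krb5.conf
# file that are not dotted-quad IPv4 addresses. An optional :port suffix is
# ignored; IPv6 addresses are not recognized and are reported as hostnames.
hostname_kdc_settings() {
    awk -F= '
        /^[ \t]*(kdc|admin_server)[ \t]*=/ {
            value = $2
            gsub(/[ \t]/, "", value)
            sub(/:[0-9]+$/, "", value)
            if (value !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) print value
        }
    ' "$1"
}
```

Each hostname printed by `hostname_kdc_settings krb5.conf` either needs to be replaced with an IP address in krb5.conf or needs an /etc/hosts entry on every GPText client host.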
Using Pivotal GPText
Introduction to Pivotal GPText
Working With GPText Indexes
Querying GPText Indexes
Customizing GPText Indexes
Working with GPText External Indexes
Administering GPText
GPText High Availability
GPText Best Practices
Troubleshooting Hadoop Connection Problems
Working With GPText Indexes
Indexing prepares documents for text analysis and fast query processing. This topic shows you how to create GPText indexes and add documents from Greenplum Database tables to them, and how to maintain and customize indexes for your own applications.
For help indexing and searching documents stored outside of Greenplum Database see Working With GPText External Indexes.
Setting Up the Sample Database
The examples in this documentation work with a demo database containing three database tables, called wikipedia.articles, twitter.message, and store.products. If you want to run the examples yourself, follow the instructions in this section to set up the demo database.
1. Log in to the Greenplum Database master as the gpadmin user and create the demo database.
$ createdb demo
2. Open an interactive shell for executing queries in the demo database.
$ psql demo
3. Create the articles table in the wikipedia schema with the following statements.
CREATE SCHEMA wikipedia;
CREATE TABLE wikipedia.articles (
    id int8 primary key,
    date_time timestamptz,
    title text,
    content text,
    refs text
) DISTRIBUTED BY (id);
4. Create the message table in the twitter schema with the following statements.
CREATE SCHEMA twitter;
CREATE TABLE twitter.message (
    id bigint,
    message_id bigint,
    spam boolean,
    created_at timestamp without time zone,
    source text,
    retweeted boolean,
    favorited boolean,
    truncated boolean,
    in_reply_to_screen_name text,
    in_reply_to_user_id bigint,
    author_id bigint,
    author_name text,
    author_screen_name text,
    author_lang text,
    author_url text,
    author_description text,
    author_listed_count integer,
    author_statuses_count integer,
    author_followers_count integer,
    author_friends_count integer,
    author_created_at timestamp without time zone,
    author_location text,
    author_verified boolean,
    message_url text,
    message_text text
) DISTRIBUTED BY (id)
PARTITION BY RANGE (created_at)
( START (DATE '2011-08-01') INCLUSIVE
  END (DATE '2011-12-01') EXCLUSIVE
  EVERY (INTERVAL '1 month') );
CREATE INDEX id_idx ON twitter.message USING btree (id);
5. Create the store.products table with these statements.
CREATE SCHEMA store;
CREATE TABLE store.products (
    id bigint,
    title text,
    category varchar(32),
    brand varchar(32),
    price float
) DISTRIBUTED BY (id);
6. Download test data for the three tables here. Right-click the link, save the file, and then copy it to the gpadmin user’s home directory.
7. Extract the data files with this tar command.
$ tar xvfz gptext-demo-data.tgz
8. Load the wikipedia data into the wikipedia.articles table using the psql \COPY metacommand.
\COPY wikipedia.articles FROM '/home/gpadmin/demo/articles.csv' HEADER CSV;
The articles table now contains text from 23 Wikipedia articles.
9. Load the twitter data into the twitter.message table using the following psql \COPY metacommand.
\COPY twitter.message FROM '/home/gpadmin/demo/twitter.csv' CSV;
The message table now contains 1730 tweets from August to October, 2011.
10. Load the products data into the store.products table with the following psql \COPY metacommand.
\COPY store.products FROM '/home/gpadmin/demo/products.csv' HEADER CSV;
The products table now contains 50 rows. This table is used to demonstrate faceted search queries. See Creating Faceted Search Queries.
Setting up the GPText Command-line Environment
To work with GPText indexes, you must first set up your environment and add the GPText schema to the database containing the documents (Greenplum Database data) you want to index.
To set the environment, log in as the gpadmin user and source the Greenplum Database and GPText environment scripts. The Greenplum Database environment must be set before you source the GPText environment script. For example, if both Greenplum Database and GPText are installed in the /usr/local/ directory, enter these commands:
$ source /usr/local/greenplum-db-/greenplum_path.sh
$ source /usr/local/greenplum-text-/greenplum-text_path.sh
With the environment now set, you can access the GPText command-line utilities.
Adding the GPText Schema to a Database
Use the gptext-installsql utility to add the GPText schema to databases containing data you want to index with GPText. You perform this task one time for each database. In this example, the gptext schema is installed into the demo database.
$ gptext-installsql demo
The gptext schema provides user-defined types, tables, views, and functions for GPText. This schema is reserved for GPText. If you create any new objects in the gptext schema, they will be lost when you reinstall the schema or upgrade GPText.
Creating GPText Indexes and Indexing Data
http://docs-gptext-staging.cfapps.io/demo/gptext-demo-data.tgz
The general steps for creating a GPText index and indexing documents are:
1. Create an empty Solr index
2. Customize the index (optional)
3. Populate the index
4. Commit the index
After you complete these steps, you can create and execute a search query or implement machine learning algorithms. Searching GPText indexes is described in the Querying GPText Indexes topic.
The following steps are completed by executing SQL commands and GPText functions in the database. Refer to the GPText Function Reference for details about the GPText functions described in the following examples.
Create an empty GPText index
A GPText index is an Apache Solr collection containing documents added from a Greenplum Database table. There can be one GPText index per Greenplum Database table. Each row in the database table is a document that can be added to the GPText index.
If the database table is partitioned, there is one GPText index for all partitions. You must specify the root table name when creating the index and adding documents to it. GPText provides search semantics that enable searching partitions efficiently.
A GPText external index is a Solr index for documents that are located outside of Greenplum Database. GPText provides user-defined functions to create external indexes and insert documents into them. See Working with GPText External Indexes.
The gptext.create_index() function creates a new GPText index. This function has two signatures:
gptext.create_index(,,,[,])
or
gptext.create_index(,,,,,[,])
The first two arguments specify the schema and the database table that contains the source documents.
The ID column argument is the name of the table column that contains a unique identifier for each row. The ID column can be of type int4, int8, varchar, text, or uuid.
The default search column argument is the name of the table column that contains the content you want to search by default. For example, if you want to index and search just the content column, you can use the first signature and specify the content column name in this argument.
The final, optional argument is a Boolean. When true, the default, attempting to add a document with an ID that already exists in the index generates an error. If you set the argument to false, you can add documents with the same ID, but when you search the index all documents with the same ID are returned.
The following command creates an index for the twitter.message table, with the id column as the unique ID field and the message_text column for the default search column:
=# SELECT * FROM gptext.create_index('twitter', 'message', 'id', 'message_text');
To verify that the demo.twitter.message index was created, call gptext.index_status():
=# SELECT * FROM gptext.index_status('demo.twitter.message');
 content_id |      index_name      | shard_name | shard_state | replica_name | replica_state |                 core                 |     node_name     |         base_url         | is_leader | partitioned | external_index
------------+----------------------+------------+-------------+--------------+---------------+--------------------------------------+-------------------+--------------------------+-----------+-------------+----------------
          0 | demo.twitter.message | shard0     | active      | core_node2   | active        | demo.twitter.message_shard0_replica1 | gpdb51:18984_solr | http://gpdb51:18984/solr | f         | t           | f
          0 | demo.twitter.message | shard0     | active      | core_node3   | active        | demo.twitter.message_shard0_replica2 | gpdb51:18983_solr | http://gpdb51:18983/solr | t         | t           | f
          1 | demo.twitter.message | shard1     | active      | core_node1   | active        | demo.twitter.message_shard1_replica1 | gpdb51:18984_solr | http://gpdb51:18984/solr | t         | t           | f
          1 | demo.twitter.message | shard1     | active      | core_node4   | active        | demo.twitter.message_shard1_replica2 | gpdb51:18983_solr | http://gpdb51:18983/solr | f         | t           | f
(4 rows)
This example was executed on a Greenplum Database cluster with two primary segments. Two shards were created, one for each segment, and each shard has two replicas. The replicas are named core_node1 through core_node4.
You can also run the gptext-state -D command-line utility to verify the index was created. See the gptext-state reference for details.
The GPText index for the demo.twitter.message table is configured, by default, to index all columns in the twitter.message database table. You can write search queries that contain criteria using any column in the table.
If you want to index and search a subset of the table columns, you can use the second gptext.create_index() signature, specifying the columns to index in the third argument and the data types of those columns in the fourth argument. Both arguments are text arrays. The id column name and default search column name must be included in the arrays.
Use the second gptext.create_index() signature to create an index for the wikipedia.articles table. This index will allow you to search on the title, content, and refs columns. Note that the id column and default search column are still specified in separate arguments following the column and type arrays.
=# SELECT * FROM gptext.create_index('wikipedia', 'articles', '{id,title,content,refs}', '{long,text_intl,text_intl,text_intl}', 'id', 'content', true);
INFO:  Created index demo.wikipedia.articles
 create_index
--------------
 t
(1 row)
Because the date_time column was omitted from the column and type arrays, it will not be possible to search the wikipedia.articles index on date with the GPText search functions.
Customize the index (optional)
Creating a GPText index generates a set of configuration files for the index. Before you add documents to the index, you can customize the configuration files to change the way data is indexed and stored. You can customize an index later, after you have added documents to it, but you must then reindex the data to take advantage of your customizations.
One common customization is to remap data types for some database columns. In the managed-schema configuration file for an index, GPText maps the data types for each field from the Greenplum Database type to an equivalent Solr data type. GPText applies default mappings (see GPText and Solr Data Type Mappings), but your index may be more effective if you use a different mapping for some fields.
The demo.twitter.message table, for example, has a message_text text column that contains tweets. By default, GPText maps text columns to the Solr text_intl (international text) type. The GPText text_sm (social media text) type is a better mapping for a text column that contains social media idioms such as emoticons.
Follow these steps to remap the message_text field to the text_sm type.
1. Use the gptext-config utility to edit the managed-schema file for the demo.twitter.message index.
$ gptext-config edit -i demo.twitter.message -f managed-schema
The managed-schema file loads in a text editor (normally vi).
2. Find the <field> element for the message_text field.
3. Change the type attribute from text_intl to text_sm.
4. Save the file and exit the editor.
There are many other ways to customize a GPText index. For example, you can omit fields from the index by changing the indexed attribute of the <field> element to false, store the contents of the field in the index by changing the stored attribute to true, or use gptext-config to edit the stopwords.txt file to specify additional words to ignore when indexing.
See Customizing GPText Indexes to learn how data type mapping determines how Solr analyzes and indexes field contents and for more ways to customize GPText indexes.
Populate the index
To populate the index, use the table function gptext.index(), which has the following syntax:
SELECT * FROM gptext.index(TABLE(SELECT * FROM ), );
To index all rows in the twitter.message table, execute this command:
=# SELECT * FROM gptext.index(TABLE(SELECT * FROM twitter.message), 'demo.twitter.message');
 dbid | num_docs
------+----------
    2 |      892
    3 |      838
(2 rows)
This command indexes the rows in the wikipedia.articles table.
=# SELECT * FROM gptext.index(TABLE(SELECT * FROM wikipedia.articles), 'demo.wikipedia.articles');
 dbid | num_docs
------+----------
    3 |       11
    2 |       12
(2 rows)
The results of this command show that 23 documents from two segments were added to the index.
The first argument of the gptext.index() function is a “table-valued expression.” TABLE(SELECT * FROM wikipedia.articles) creates a table-valued expression from the articles table, using the table function TABLE.
You can choose the data to index or update by changing the inner select list in the query to select the rows you want to index. When adding new documents to an existing index, for example, specify a WHERE clause in the gptext.index() call to choose only the new rows to index.
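As an illustration of indexing only new rows, the following sketch prints the SQL for an incremental update so that it can be piped into psql. The function name is hypothetical, and the predicate used in the usage note is an assumed example rather than something this document prescribes.

```shell
#!/bin/sh
# Hypothetical helper: print SQL that indexes only the rows of a table that
# match a predicate, and then commits the index. Arguments are inserted
# without escaping, so pass trusted values only.
#   $1 = index name, $2 = schema-qualified table, $3 = WHERE predicate
incremental_index_sql() {
    printf "SELECT * FROM gptext.index(TABLE(SELECT * FROM %s WHERE %s), '%s');\n" "$2" "$3" "$1"
    printf "SELECT * FROM gptext.commit_index('%s');\n" "$1"
}
```

A possible invocation is `incremental_index_sql demo.twitter.message twitter.message "created_at >= '2011-10-01'" | psql demo`. Printing the statements first, without the pipe, makes it easy to review them before they run.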
The inner SELECT statement could also be a query on a different table with the same structure, or a result set constructed with an arbitrarily complex join, provided the columns specified in the gptext.create_index() function are present in the results. If you index data from a source other than the table used to create the index, be sure the distribution key for the result set matches the distribution key of the base table. The Greenplum Database SELECT statement has a SCATTER BY clause that you can use to specify the distribution key for the results from a query. See Specifying a distribution key with SCATTER BY for more about the distribution policy and GPText indexes.
Commit the index
After you create and populate an index, you commit the index using gptext.commit_index().
This example commits the documents added to the indexes in the previous example.
=# SELECT * FROM gptext.commit_index('demo.twitter.message');
 commit_index
--------------
 t
(1 row)
=# SELECT * FROM gptext.commit_index('demo.wikipedia.articles');
 commit_index
--------------
 t
(1 row)
The gptext.commit_index() function commits any new data added to or deleted from the index since the last commit.
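The create, populate, and commit steps can be collected into one reviewable script. The sketch below is illustrative: the function name is hypothetical, it uses the first gptext.create_index() signature, it skips the optional customization step, and it assumes the index name takes the database.schema.table form seen in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL for creating, populating, and
# committing a GPText index. Arguments are inserted without escaping.
#   $1 = database, $2 = schema, $3 = table,
#   $4 = ID column, $5 = default search column
build_index_sql() {
    index_name="$1.$2.$3"
    printf "SELECT * FROM gptext.create_index('%s', '%s', '%s', '%s');\n" "$2" "$3" "$4" "$5"
    printf "SELECT * FROM gptext.index(TABLE(SELECT * FROM %s.%s), '%s');\n" "$2" "$3" "$index_name"
    printf "SELECT * FROM gptext.commit_index('%s');\n" "$index_name"
}
```

For the sample database, `build_index_sql demo twitter message id message_text | psql demo` would run the same three statements shown in this topic.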
Managing GPText Indexes
GPText provides command-line utilities and functions you can use to perform these GPText management tasks:
Configuring an index
Optimizing an index
Specifying a distribution policy with SCATTER BY
Deleting from an index
Dropping an index
Adding a field to an index
Dropping a field from an index
Listing all indexes
Configuring an index
You can modify your indexing behavior globally by using the gptext-config utility to edit a set of index configuration files. The files you can edit with gptext-config are:
solrconfig.xml – Contains most of the parameters for configuring Solr itself (see http://wiki.apache.org/solr/SolrConfigXml).
managed-schema – Defines the analyzer chains that Solr uses for various different types of search fields (see Text Analyzer Chains).
stopwords.txt – Lists words you want to eliminate from the final index.
protwords.txt – Lists protected words that you do not want to be modified by the analyzer chain. For example, iPhone.
synonyms.txt – Lists words that you want replaced by synonyms in the analyzer chain.
elevate.xml – Moves specific words to the top of your final index.
emoticons.txt – Defines emoticons for the text_sm social media analyzer chain (see The emoticons.txt file).
You can also use gptext-config to move files.
Optimizing an index
The function gptext.optimize_index() takes an index name and a number of segments, and merges all segments of the index into that small number of segments for increased efficiency.
Example:
=# SELECT * FROM gptext.optimize_index('demo.wikipedia.articles', 10);
 optimize_index
----------------
 t
(1 row)
Specifying a distribution policy with SCATTER BY
The first parameter of gptext.index() is a table-valued expression, such as TABLE(SELECT * FROM wikipedia.articles). The query in this parameter must have the same distribution policy as the table you are indexing so that documents added to the index are associated with the correct Greenplum Database segments. Some queries, however, have no distribution policy or they have a different distribution policy. This could happen if the query is a join of two or more tables or a query on an intermediate (staging) table that is distributed differently than the base table for the index.
To specify a distribution policy for a query result set, the Greenplum Database SELECT statement has a “SCATTER BY” clause.
TABLE(SELECT * FROM wikipedia.articles SCATTER BY distrib_id)
where distrib_id is the same distribution key used to distribute the base table for the index.
Deleting from an index
You can delete from an index using a query with the function gptext.delete(), which takes an index name and a search query. This deletes from the index all documents that match the search query. To delete all documents, use the query '*'.
After a successful deletion, execute gptext.commit_index() to commit the change.
This example deletes all documents containing "toxin" in the default search field.
=# SELECT * FROM gptext.delete('demo.wikipedia.articles', 'toxin');
 delete
--------
 t
(1 row)
SELECT * FROM gptext.commit_index('demo.wikipedia.articles');
Example that deletes all documents from the index:
SELECT * FROM gptext.delete('demo.wikipedia.articles', '*:*');
Be sure to commit changes to the index after deleting documents.
SELECT * FROM gptext.commit_index('demo.wikipedia.articles');
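Because a deletion takes effect only after a commit, it can be convenient to generate both statements together. The following is a sketch under stated assumptions: the function name is hypothetical, and single quotes in the search query are doubled so that the query remains a valid SQL string literal.

```shell
#!/bin/sh
# Hypothetical helper: print SQL that deletes the documents matching a
# search query and then commits the index.
#   $1 = index name, $2 = search query
delete_and_commit_sql() {
    # Double any single quotes so the query is a valid SQL string literal.
    query=$(printf '%s' "$2" | sed "s/'/''/g")
    printf "SELECT * FROM gptext.delete('%s', '%s');\n" "$1" "$query"
    printf "SELECT * FROM gptext.commit_index('%s');\n" "$1"
}
```

For example, `delete_and_commit_sql demo.wikipedia.articles toxin | psql demo` would issue the delete and the commit from the first example in this section.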
Dropping an index
You can completely remove an index with the gptext.drop_index() function.
Example:
SELECT * FROM gptext.drop_index('demo.wikipedia.articles');
Adding a field to an index
You can add a field to an existing index using the gptext.add_field() function. For example, you can add a field to the index after a column is added to the underlying database table or you can add a field to index a column that was not specified when the index was created.
GPText maps the Greenplum Database field type to an equivalent Solr data type automatically. See GPText and Solr Data Type Mappings for a table of data type mappings.
CREATE TABLE wikipedia.myarticles (
    id int8 primary key,
    date_time timestamptz,
    title text,
    content text,
    refs text
) DISTRIBUTED BY (id);
SELECT * FROM gptext.create_index('wikipedia', 'myarticles', 'id', 'content', true);
... populate the index ...
SELECT * FROM gptext.commit_index('demo.wikipedia.myarticles');
ALTER TABLE wikipedia.myarticles ADD notes text;
SELECT * FROM gptext.add_field('demo.wikipedia.myarticles', 'notes', false, false);
SELECT * FROM gptext.reload_index('demo.wikipedia.myarticles');
Adding a field to a GPText index requires the base table to be available. If you drop the table after creating the index, you cannot add fields to the index.
Dropping a field from an index
You can drop a field from an existing index with the gptext.drop_field() function. After you have dropped fields, call gptext.reload_index() to reload the index.
Example:
SELECT * FROM gptext.drop_field('demo.wikipedia.myarticles', 'notes');
SELECT * FROM gptext.reload_index('demo.wikipedia.myarticles');
Listing all indexes
You can list all indexes in the GPText cluster using the gptext-state command-line utility. For example:
$ gptext-state -D
20170822:10:11:23:029752 gptext-state:gpsne:gpadmin-[INFO]:-Execute GPText state ...
20170822:10:11:23:029752 gptext-state:gpsne:gpadmin-[INFO]:-Check zookeeper cluster state ...
20170822:10:11:23:029752 gptext-state:gpsne:gpadmin-[INFO]:-Check GPText cluster status...
20170822:10:11:23:029752 gptext-state:gpsne:gpadmin-[INFO]:-Current GPText Version: 2.1.2
20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:-All nodes are up and running.
20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------
20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:-Index state details.
20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:------------------------------------------------
20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:-   database    index name                 state
20170822:10:11:24:029752 gptext-state:gpsne:gpadmin-[INFO]:-   wikipedia   demo.wikipedia.articles    Green
20170822:10:11:28:029752 gptext-state:gpsne:gpadmin-[INFO]:-Done.
Storing Field Content in an Index
Solr can store the contents of columns in the index so that results of a search on the index can include the column contents. This makes it unnecessary to join the search query results with the original table. You can even store the contents of database columns that are not indexed and return that content with search results. GPText returns the additional field content in a buffer added to the search results. Individual fields can be retrieved from this buffer using the gptext.gptext_retrieve_field(), gptext.gptext_retrieve_field_int(), and gptext.gptext_retrieve_field_float() functions.
One design pattern is to store content for all of a table’s columns in the GPText index so the database table can then be truncated or dropped. Additional documents can be added to the GPText index later by inserting them into the truncated table, or into a temporary table with the same structure, and then adding them to the index with the gptext.index() function.
To enable storing content in a GPText index, you must edit the managed-schema file for the index. The <field> element for each field has a stored att