Configuring Quill
Condor Week 2007
Greg Thain
Computer Sciences Department
University of Wisconsin-Madison
gthain@cs.wisc.edu
http://www.cs.wisc.edu/condor
www.cs.wisc.edu/condor
Typical Condor Pool (diagram): a Central Manager (master, collector, negotiator), a Submit-Only machine (master, schedd), and two Execute-Only machines (master, startd). Legend: ClassAd communication pathway; process spawned.
What is Quill?
A technology to store a read-only copy of the job queue and job history data in a relational database.
Why Quill?
› Offloads query overhead from the schedd: performance boost!
› Easier to build a web portal: RDBMS access is easier than SOAP/CLI
Job Queue Management (diagram): without Quill, the schedd and its job queue; with Quill, the schedd and its job queue plus quilld and a database.
Quill downsides
› Additional latency
› More complicated setup
› A handful of attributes are not in the DBMS
Quill and Quill++
› Quill in Condor since 6.7.11
› Quill++ (quillpp) coming soon:
• Support for all daemons
• Multiple schedds in one database
• Support for Oracle on some platforms
• Replaces Quill
› We’ll talk about both
Typical Quill’d Condor Pool (diagram): the Central Manager (master, collector, negotiator), Submit-Only (master, schedd), and Execute-Only (master, startd) machines of a typical pool, plus a Database machine running postgres; quill and a condor_q query path to the database are also shown.
Typical Quillpp’d Condor Pool (diagram): the same Central Manager, Submit-Only, and Execute-Only machines, plus a Database machine running postgres; quillpp appears on three of the machines, alongside quill and a condor_q query path to the database.
How to use the schema?
› We’ll talk about this in another talk: Quill Front End and Schema BoF
• Thursday 11am
Quill (not Quill++) Deployment
› One Quill daemon per schedd
› Quill daemons must be uniquely named
› Each Quill daemon uses a unique DB name
› Currently uses PostgreSQL
• Recommend PostgreSQL 8.2 or later: better disk management
Quill++ deployment
› One condor_quillpp per machine
› One condor_dbmsd per database
› Manual installation of schema
› One DB per pool
› Uses Postgres or Oracle
Condor’s Interface to Quill
› Modified two tools to use the DB:
• condor_q
• condor_history
A User Perspective: condor_q
› condor_q changes:
• When QUILL_ENABLED is true, condor_q queries the RDBMS
• -name takes a schedd name or a Quill name
• -avgqueuetime reports the average time in queue for all jobs
condor_q -direct
› -direct rdbms (default when QUILL_ENABLED = TRUE)
› -direct quilld (useful for firewall traversal)
› -direct schedd (100% up-to-date view)
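The three -direct targets trade freshness against schedd load and reachability. A minimal Python sketch of that choice, assuming condor_q is on the PATH; the helper names (`choose_direct_target`, `condor_q_argv`, `run_condor_q`) and the decision rule are hypothetical, written only to mirror the slide's guidance.

```python
from __future__ import annotations

import subprocess

# The three targets accepted by condor_q -direct, as listed on the slide.
DIRECT_TARGETS = ("rdbms", "quilld", "schedd")


def choose_direct_target(need_up_to_date: bool, behind_firewall: bool) -> str:
    """Hypothetical rule of thumb based on the slide's notes."""
    if need_up_to_date:
        return "schedd"  # 100% up-to-date, but the query load lands on the schedd
    if behind_firewall:
        return "quilld"  # useful for firewall traversal
    return "rdbms"       # default when Quill is enabled


def condor_q_argv(target: str, name: str | None = None) -> list[str]:
    """Build a condor_q command line; `name` is a schedd or Quill name."""
    if target not in DIRECT_TARGETS:
        raise ValueError(f"unknown -direct target: {target!r}")
    argv = ["condor_q"]
    if name is not None:
        argv += ["-name", name]
    return argv + ["-direct", target]


def run_condor_q(target: str, name: str | None = None) -> str:
    # Needs a working Condor install; raises CalledProcessError on failure.
    result = subprocess.run(condor_q_argv(target, name),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping the argument builder separate from the subprocess call makes the command line easy to inspect before anything touches the pool.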
A User Perspective: condor_history
› condor_history changes:
• -name takes a Quill name, to retrieve job histories from a remote Quill’s database
condor_history -direct
› There isn’t one (yet)
› To bypass the database, read the local history file: condor_history -f `condor_config_val HISTORY`
› No -direct quilld equivalent
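The backtick form is shell command substitution: first ask Condor where the history file lives, then point condor_history at it. The same two-step lookup in Python, as a sketch assuming both tools are on the PATH and that `-f` behaves as shown on the slide; the function names are hypothetical.

```python
from __future__ import annotations

import subprocess


def local_history_argv(history_path: str) -> list[str]:
    # Same as: condor_history -f `condor_config_val HISTORY`
    return ["condor_history", "-f", history_path]


def read_local_history() -> str:
    # Step 1: the HISTORY config knob gives the local history file path.
    path = subprocess.run(["condor_config_val", "HISTORY"],
                          capture_output=True, text=True, check=True).stdout.strip()
    # Step 2: read that file directly instead of querying the database.
    return subprocess.run(local_history_argv(path),
                          capture_output=True, text=True, check=True).stdout
```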
PostgreSQL Configuration
› Add two special user accounts: quillreader and quillwriter
• createuser quillreader --no-createdb --no-adduser --pwprompt
• createuser quillwriter --createdb --no-adduser --pwprompt
PostgreSQL Configuration (cont)
› Allow TCP/IP connections: edit file postgresql.conf
• Add listen_addresses = '*'
› Allow connections from specific hosts: edit file pg_hba.conf
• host all quillreader 128.105.0.0 255.255.0.0 password
• host all quillwriter 128.105.0.0 255.255.0.0 password
› Note: only use ‘password’ authentication at this time.
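When the pool spans a different subnet, the pg_hba.conf lines follow the same pattern. A small Python sketch that generates them for both Quill accounts using the standard `ipaddress` module; `pg_hba_lines` is a hypothetical helper, and the 128.105.0.0/255.255.0.0 network is just the slide's example. PostgreSQL also accepts the equivalent CIDR form (`128.105.0.0/16`) in a single address column.

```python
from __future__ import annotations

import ipaddress


def pg_hba_lines(network: str, users=("quillreader", "quillwriter"),
                 method: str = "password") -> list[str]:
    """Build pg_hba.conf 'host' lines for the Quill accounts.

    `network` may be 'address/netmask' or 'address/prefix'; the output uses
    separate address and mask columns, as on the slide.
    """
    net = ipaddress.IPv4Network(network)  # strict: rejects addresses with host bits set
    return [f"host all {user} {net.network_address} {net.netmask} {method}"
            for user in users]
```

Both `pg_hba_lines("128.105.0.0/255.255.0.0")` and `pg_hba_lines("128.105.0.0/16")` produce the two lines shown on the slide.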
Quill Configuration
› User quillwriter needs a password.
› Store it in:
• $(SPOOL)/.quillwritepassword (Quill)
• $(SPOOL)/.pgpass (Quill++); each .pgpass line has the form host:port:db:user:pass
› If Condor is running as root, ensure only the condor uid can read it
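For the Quill++ case, a .pgpass entry plus owner-only permissions can be written as below. This is a sketch: the host, port, and database values are borrowed from the example configuration elsewhere in this talk, the password is a placeholder, and `pgpass_line` / `write_private_file` are hypothetical helpers. It relies on two libpq rules: a `:` or `\` inside a field must be backslash-escaped, and on Unix a .pgpass file readable by group or others is ignored.

```python
from __future__ import annotations

import os


def _escape(field) -> str:
    # libpq: a ':' or '\' inside a .pgpass field must be backslash-escaped.
    return str(field).replace("\\", "\\\\").replace(":", "\\:")


def pgpass_line(host, port, db, user, password) -> str:
    """One .pgpass entry: host:port:db:user:pass."""
    return ":".join(_escape(f) for f in (host, port, db, user, password))


def write_private_file(path: str, text: str, uid: int | None = None) -> None:
    """Write `text` to `path` with mode 0600, optionally owned by `uid`."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.chmod(path, 0o600)  # also tightens a file that already existed
    if uid is not None:
        os.chown(path, uid, -1)  # e.g. the condor uid when Condor runs as root
```

Usage, with a placeholder password: `pgpass_line("merlin.cs.wisc.edu", 42999, "psilord_db", "quillwriter", "secret")` gives `merlin.cs.wisc.edu:42999:psilord_db:quillwriter:secret`.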
Quill Configuration (cont)
› Condor system-specific attributes in file condor_config.local:
QUILL = $(SBIN)/condor_quill
QUILL_LOG = $(LOG)/QuillLog
QUILL_ADDRESS_FILE = $(LOG)/.quill_address
DAEMON_LIST = …, QUILL
VALID_SPOOL_FILES = …, .quillwritepassword
DC_DAEMON_LIST = …, QUILL
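Forgetting one of the three list additions is an easy mistake. A rough Python check that a condor_config.local-style file appends the Quill entries; `parse_config` and `missing_quill_entries` are hypothetical helpers, and the parser is deliberately minimal (it does not expand `$(MACROS)`, handle continuation lines, or merge multiple config files the way Condor itself does).

```python
import re

# Entries the slide appends to each list-valued knob.
REQUIRED_ENTRIES = {
    "DAEMON_LIST": "QUILL",
    "DC_DAEMON_LIST": "QUILL",
    "VALID_SPOOL_FILES": ".quillwritepassword",
}


def parse_config(text: str) -> dict:
    """Minimal KEY = value reader; skips blank lines and lines starting with '#'."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def missing_quill_entries(config: dict) -> list:
    """Return (knob, entry) pairs absent from the parsed config."""
    missing = []
    for knob, entry in REQUIRED_ENTRIES.items():
        # List values may be separated by commas and/or whitespace.
        items = [x for x in re.split(r"[\s,]+", config.get(knob, "")) if x]
        if entry not in items:
            missing.append((knob, entry))
    return missing
```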
Quill Configuration (cont)
› Quill-specific attributes:
QUILL_ENABLED = TRUE
# The quill name must be unique across all
# quill daemons AND schedds
QUILL_NAME = …
QUILL_DB_NAME = psilord_db
QUILL_DB_IP_ADDR = merlin.cs.wisc.edu:42999
QUILL_POLLING_PERIOD = 10 (seconds)
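Because a Quill name must not collide with any other Quill daemon or any schedd, a pool-wide check before deployment can save a confusing debugging session. A Python sketch, assuming you have already collected the Quill and schedd names (how you gather them is outside the sketch); `parse_db_addr` and `name_collisions` are hypothetical helpers, and the example address is the one from the slide.

```python
def parse_db_addr(value: str) -> tuple:
    """Split a QUILL_DB_IP_ADDR-style 'host:port' value."""
    host, sep, port = value.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


def name_collisions(quill_names, schedd_names) -> list:
    """Names used more than once across quill daemons and schedds."""
    seen, dups = set(), set()
    for name in [*quill_names, *schedd_names]:
        if name in seen:
            dups.add(name)
        seen.add(name)
    return sorted(dups)
```

For example, `parse_db_addr("merlin.cs.wisc.edu:42999")` returns `("merlin.cs.wisc.edu", 42999)`.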
Quill Configuration (cont)
› QUILL_HISTORY_CLEANING_INTERVAL = 24 (hours)
› QUILL_HISTORY_DURATION = 30 (days)
› QUILL_MANAGE_VACUUM = FALSE
› QUILL_IS_REMOTELY_QUERYABLE = TRUE
› QUILL_DB_QUERY_PASSWD = xxx
Schema management
› Quill automatically loads the schema
• Upgrades itself automatically
› Quill++ requires manual loading:
• psql -U quillwriter < common_createddl.sql
• psql -U quillwriter < pgsql_createddl.sql
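The two psql runs must happen in order, and a failure in the first should stop the second. A Python sketch of that sequence, assuming psql is on the PATH and both DDL files are in the working directory; it uses `psql -f` plus `-v ON_ERROR_STOP=1` (so psql exits non-zero on the first SQL error) in place of the shell redirect, and the optional database name is a hypothetical parameter the slide does not specify.

```python
import subprocess

# Load order from the slide: common DDL first, then the PostgreSQL-specific DDL.
DDL_FILES = ("common_createddl.sql", "pgsql_createddl.sql")


def schema_load_commands(user="quillwriter", dbname=None, files=DDL_FILES):
    """psql invocations equivalent to `psql -U quillwriter < file`, in order."""
    cmds = []
    for path in files:
        cmd = ["psql", "-v", "ON_ERROR_STOP=1", "-U", user]
        if dbname is not None:
            cmd += ["-d", dbname]  # hypothetical: the slide does not name a database
        cmds.append(cmd + ["-f", path])
    return cmds


def load_schema(**kwargs):
    # check=True raises CalledProcessError, so a failing file stops the sequence.
    for cmd in schema_load_commands(**kwargs):
        subprocess.run(cmd, check=True)
```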
Conversion to Quill++
› Conversion only matters for history
› Conversion is one-way only!
› Two steps:
• Dump Quill history tables to a file with condor_dump_history
• Load Quill++ history tables from that file with condor_load_history
Data Management
› Constrain database size
• History truncation
• Quill++: other tables, too
• Postgres index management
• Oracle cleans itself
› Be careful of long queries, especially with Quill
Data Management: Quill
› HISTORY_CLEANING_INTERVAL: in hours (24 hours)
› HISTORY_DURATION: how long to keep history, in days (7 days)
› QUILL_SHOULD_REINDEX: Boolean (false)
› QUILL_MANAGE_VACUUM: Boolean (false)
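The two history knobs interact as a schedule plus a cutoff. A Python sketch of that arithmetic, under the assumption (not stated on the slide) that each cleaning pass removes history entries older than HISTORY_DURATION days measured from their entry time; `next_cleaning` and `rows_to_purge` are hypothetical helpers, not Quill code.

```python
from datetime import datetime, timedelta


def next_cleaning(last_run: datetime, interval_hours: int = 24) -> datetime:
    """When the next history-cleaning pass is due (HISTORY_CLEANING_INTERVAL)."""
    return last_run + timedelta(hours=interval_hours)


def rows_to_purge(entry_times, now: datetime, duration_days: int = 7):
    """Entries older than HISTORY_DURATION days at `now` (assumed semantics)."""
    cutoff = now - timedelta(days=duration_days)
    return [t for t in entry_times if t < cutoff]
```

With the values above, a pass at midnight on 1 May would drop anything entered before 24 April and schedule the next pass for midnight on 2 May.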
Data Management: Quill++
› condor_dbmsd does all the work
• QUILL_DBSIZE_LIMIT (20 GB)
– Emails a warning when 75% is hit
• DATABASE_PURGE_INTERVAL (in seconds; 24 hours)
• DATABASE_REINDEX_INTERVAL (in seconds; 24 hours)
• QUILL_DB_TYPE (oracle, pgsql)
• QUILL_RESOURCE_HISTORY_DURATION (7 days)
• QUILL_JOB_HISTORY_DURATION (10 years!)
• QUILL_RUN_HISTORY_DURATION (7 days)
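The numbers above reduce to a few thresholds. A Python sketch of them: the 75% warning point of the size limit, the 24-hour intervals expressed in seconds, and the retention windows. It assumes the size limit is in gigabytes, counted here as 1024**3 bytes, and approximates "10 years" as 3650 days; `should_warn` and `RETENTION` are hypothetical names, not condor_dbmsd internals.

```python
from datetime import timedelta

GIB = 1024 ** 3  # assumption: size limit in gigabytes, taken as 1024**3 bytes


def should_warn(db_bytes: int, limit_gb: float = 20, fraction: float = 0.75) -> bool:
    """True once the database reaches 75% of QUILL_DBSIZE_LIMIT."""
    return db_bytes >= fraction * limit_gb * GIB


# Both intervals are given in seconds; 24 hours each on the slide.
DATABASE_PURGE_INTERVAL = 24 * 60 * 60
DATABASE_REINDEX_INTERVAL = 24 * 60 * 60

# Retention values from the slide; "10 years" approximated as 3650 days.
RETENTION = {
    "QUILL_RESOURCE_HISTORY_DURATION": timedelta(days=7),
    "QUILL_JOB_HISTORY_DURATION": timedelta(days=3650),
    "QUILL_RUN_HISTORY_DURATION": timedelta(days=7),
}
```

With the 20 GB limit, the warning threshold works out to 15 GB.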
Thank you!
› Want more information?
› BOF “Databases in Condor”