Teuthology
Presented [email protected]
image credit: http://www.flickr.com/photos/peterblapps/3250800528/
Ceph as in
Cephalopoda, Mollusca, Invertebrae
Teuthology, Malacology
Not your grandmother's software stack
We tried Autotest
... and quickly discovered its limitations
Currently at 15 independent patches, 24 files changed, 575 insertions(+), 19 deletions(-)
Realized Autotest's architecture is working against us.
We still use it for its packaged "client side" tests, but not its multi-machine features.
Multi-machine control
Python + Paramiko (SSH) + gevent = orchestra
Real-time
Interactive
Central controller
Full SSH protocol (channels!)
Not Chef
Not Fabric
cluster = Cluster(...)
cluster.run(...)
cluster.only('x86').run(...)
cluster.exclude('x86').run(...)
http://github.com/tv42/orchestra
Teuthology is a test runner
Run tasks on targets as told to by roles.
Automatically:
Set up
Monitor health
Run test(s)
Archive results
Archive logs, core dumps, etc.
Clean up
http://github.com/tv42/teuthology
Read the README
Run tasks on targets as told to by roles.
targets:
- [email protected]
- [email protected]
- [email protected]
You need to have SSH working, without passphrases.
You need passphraseless sudo on the remote host.
YAML format: lists, dicts, strings, numbers.
Run tasks on targets as told to by roles.
roles:
- [mon.0, mds.0, osd.0]
- [mon.1, osd.1]
- [mon.2, client.0]
targets:
- [email protected]
- [email protected]
- ubuntu@sepiaZZ...
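As a rough illustration of how the two lists relate, assuming the Nth entry of roles is served by the Nth entry of targets (the README is the authority on this), the pairing can be sketched in plain Python. The host names are placeholders and assign_roles is a hypothetical helper, not part of teuthology.

```python
# Hypothetical sketch: pair each role group with a target by position.
# Assumes the Nth roles entry runs on the Nth target; host names are made up.
roles = [
    ['mon.0', 'mds.0', 'osd.0'],
    ['mon.1', 'osd.1'],
    ['mon.2', 'client.0'],
]
targets = [
    'ubuntu@host-a.example.com',
    'ubuntu@host-b.example.com',
    'ubuntu@host-c.example.com',
]


def assign_roles(roles, targets):
    """Return a dict mapping each target to the list of roles it hosts."""
    if len(roles) > len(targets):
        raise ValueError('not enough targets for the requested roles')
    return dict(zip(targets, roles))


assignment = assign_roles(roles, targets)
for target, hosted in assignment.items():
    print(target, '->', ', '.join(hosted))
```

Under this assumption, a config with more role groups than targets cannot be scheduled, which is why the sketch refuses it up front.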
Run tasks on targets as told to by roles.
tasks:
- ceph:
- kclient: [client.0]
- autotest:
    client.0: [dbench]
Interactive mode
tasks:
- interactive:
INFO:teuthology.run_tasks:Running task interactive...
Ceph test interactive mode, use ctx to interact with the cluster, press control-D to exit...
>>> 1+1
2
>>>
Interactive mode
>>> ctx.cluster.only('osd.0').run(args=['uptime'])
INFO:orchestra.run.out: 13:05:38 up 42 days, 23:17, 0 users, load average: 0.12, 0.09, 0.07
[<orchestra.run.RemoteProcess object at 0x28bd110>]
One RemoteProcess per command run.
Using just one Remote first
>>> (remote,) = ctx.cluster.only('osd.0').remotes.keys()
>>> proc = remote.run(args=['echo', '*'])
INFO:orchestra.run.out:*
>>> proc
<orchestra.run.RemoteProcess ...>
>>> proc.command
"echo '*'"
Shell quoting done for you.
Works like ctx.cluster.run.
Just one RemoteProcess, not a list.
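To see why passing args as a list matters, here is a small stand-in using the standard library's shlex.quote. It reproduces the "echo '*'" string shown above, but it is only an illustration: orchestra's own quoting code may work differently, and quote_args is a hypothetical name.

```python
import shlex


def quote_args(args):
    """Join an argument list into one shell-safe command string."""
    return ' '.join(shlex.quote(arg) for arg in args)


print(quote_args(['echo', '*']))          # the glob is quoted, not expanded
print(quote_args(['ls', 'my file.txt']))  # spaces are protected too
```

Because each list element is quoted separately, the remote shell sees exactly the arguments you passed, with no globbing or word splitting.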
Failing processes
>>> remote.run(args=['bork'])
INFO:orchestra.run.err:bash: bork: command not found
...
CommandFailedError: Command failed with status 127: 'bork'
>>> proc = remote.run(args=['bork'],
...                   check_status=False)
INFO:orchestra.run.err:bash: bork: command not found
>>> proc.exitstatus
127
Concurrency
>>> proc = remote.run(args=['uptime'], wait=False)
>>> proc
<orchestra.run.RemoteProcess object at 0x28bd1d0>
>>> proc.exitstatus
<gevent.event.AsyncResult object at 0x28c2a10>
Concurrency
>>> proc.exitstatus
<gevent.event.AsyncResult object at 0x28c2a10>
>>> import time; time.sleep(0)
INFO:orchestra.run.out: 13:16:48 up 42 days, 23:28, 0 users, load average: 0.35, 0.15, 0.08
>>> proc.exitstatus
<gevent.event.AsyncResult object at 0x28c2a10>
>>> proc.exitstatus.get()
0
Capturing stdout/stderr
>>> from orchestra import run
>>> proc = remote.run(args=['uname', '-m'],
...                   wait=False, stdout=run.PIPE)
>>> proc.exitstatus
<gevent.event.AsyncResult object at 0x28c2dd0>
>>> proc.exitstatus.ready() # just for debug
False
>>> proc.stdout.read()
'x86_64\n'
>>> proc.exitstatus.get()
0
Deadlocks you must avoid:
stdout vs stderr
stdout/err vs stdin
stdout/err vs exit
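The same class of deadlock exists with ordinary local pipes, so it can be shown without a cluster. The sketch below uses the standard library's subprocess module as an analogy only. It is not orchestra code, and the child program is made up for the demonstration.

```python
import subprocess
import sys

# Child process that writes more than a pipe buffer's worth to stderr first,
# then to stdout.
child_code = (
    "import sys\n"
    "sys.stderr.write('e' * 200000)\n"
    "sys.stdout.write('o' * 200000)\n"
)

proc = subprocess.Popen(
    [sys.executable, '-c', child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# Risky alternatives: calling proc.stdout.read() first (the child is stuck
# writing stderr, so stdout never reaches EOF) or proc.wait() before reading
# (the child never exits while its pipes are full).
# communicate() drains both pipes concurrently and then reaps the child.
out, err = proc.communicate()
print(len(out), len(err), proc.returncode)
```

The rule of thumb is the same in both settings: whenever you capture a stream, something must keep reading it until the process is done.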
Using Cluster
>>> processes = ctx.cluster.run(
...     args=['uname', '-m'],
...     wait=False,
...     stdout=run.PIPE)
>>> processes
[<orchestra.run.RemoteProcess object at 0x28bdbf0>,
 <orchestra.run.RemoteProcess object at 0x28bdb90>,
 <orchestra.run.RemoteProcess object at 0x28bdad0>]
>>> [p.stdout.read() for p in processes]
['x86_64\n', 'x86_64\n', 'x86_64\n']
>>> run.wait(processes)
>>>
Controlling stdout/stderr logging
>>> import logging
>>> log = logging.getLogger(__name__)
>>> log.info('foo')
INFO:__builtin__:foo
>>> ctx.cluster.only('osd.0').run(
...     args=['uptime'],
...     logger=log.getChild('uptime'))
INFO:__builtin__.uptime.out: 13:52:49 up 43 days, 4 min, 0 users, load average: 0.00, 0.01, 0.05
[<orchestra.run.RemoteProcess object at 0x28bdb90>]
>>>
In a real task, the logger name usually looks like teuthology.task.foo
Tasks can be context managers
tasks:
- ceph:
- kclient: ...
- autotest: ...
- interactive:
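A minimal sketch of what context-manager tasks buy you: each task sets up on entry, later tasks run inside it, and cleanups happen in reverse order on the way out. The task function and the ExitStack-based loop below are hypothetical stand-ins, not teuthology's actual runner.

```python
import contextlib

events = []


@contextlib.contextmanager
def task(name):
    """Hypothetical task: set up, let later tasks run, then clean up."""
    events.append('setup ' + name)
    try:
        yield
    finally:
        events.append('cleanup ' + name)


with contextlib.ExitStack() as stack:
    # Enter tasks in the order they are listed in the config.
    for name in ['ceph', 'kclient', 'autotest']:
        stack.enter_context(task(name))
    # The last task runs while everything above it is still set up.
    events.append('interactive')

print('\n'.join(events))
```

In this sketch the interactive step runs while ceph, kclient, and autotest are all still active, which matches the idea of placing interactive last in the task list.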
/tmp/cephtest
Must not exist already, or target is dirty (see teuthology-nuke, later)
Used by tasks to store things in
Tasks are responsible for cleaning up after themselves (no toplevel rm -rf, to flush out the bugs)
Anything in /tmp/cephtest/archive gets archived
Please bzip2 -9 any big files your task leaves in archive
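For tasks that produce large files from Python, the equivalent of bzip2 -9 can be sketched with the standard bz2 module. This is a local illustration only: compress_file is a hypothetical helper, and a real task would operate on files on the remote host under its archive directory.

```python
import bz2
import os
import tempfile


def compress_file(path):
    """Write path + '.bz2' at compression level 9 and remove the original."""
    compressed_path = path + '.bz2'
    with open(path, 'rb') as src, \
            bz2.open(compressed_path, 'wb', compresslevel=9) as dst:
        # Copy in 1 MiB chunks so large logs are never held in memory whole.
        for chunk in iter(lambda: src.read(1024 * 1024), b''):
            dst.write(chunk)
    os.remove(path)
    return compressed_path


# Demo on a throwaway file in a temporary directory.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, 'osd.0.log')
with open(log_path, 'wb') as f:
    f.write(b'repetitive log line\n' * 10000)

result = compress_file(log_path)
print(result, os.path.getsize(result))
```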
Cleanups & failures
Cleanup can fail; further cleanups are still attempted, so always study the first error, not the last one.
If a task fails to clean up, the targets are left "dirty".
teuthology-nuke is a Big Hammer.
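Why the first error matters can be sketched as follows: if every cleanup is attempted even after one fails, later failures are often just consequences of the first. The function names and error messages are invented for illustration and do not come from teuthology.

```python
def run_cleanups(cleanups):
    """Run every cleanup, newest first; return errors in the order they occurred."""
    errors = []
    for cleanup in reversed(cleanups):
        try:
            cleanup()
        except Exception as exc:  # keep going so remaining cleanups still run
            errors.append(exc)
    return errors


def remove_mounts():
    raise RuntimeError('umount failed: device busy')


def remove_test_dir():
    # Fails only because the mount above could not be removed.
    raise RuntimeError('rmdir failed: directory not empty')


# Registered in setup order; cleaned up in reverse.
errors = run_cleanups([remove_test_dir, remove_mounts])
if errors:
    print('first error (start here):', errors[0])
    print('last error (likely a consequence):', errors[-1])
```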
Archived results
2011-06-21T10-00-44/
├── ceph-sha1
├── config.yaml
├── remote
│   ├── [email protected]
│   │   ├── log
│   │   │   ├── client.admin.log.bz2
│   │   │   ├── mds.0.log.bz2
│   │   │   ├── mon.0.log.bz2
│   │   │   └── osd.0.log.bz2
│   │   └── syslog
│   │       ├── kern.log.bz2
│   │       └── misc.log.bz2
│   ├── [email protected] ...
│   └── [email protected]
│       ├── autotest
│       │   └── ...
│       ├── log ...
│       └── syslog ...
├── summary.yaml
└── teuthology.log
gitbuilder
A low-key low-hype continuous integration tool
Builds tags and heads of branches
On a bad build, it tries older commits until it finds a green one
We have it building ceph and our kernel fork
http://ceph.newdream.net/gitbuilder/
http://ceph.newdream.net/gitbuilder-i386/
http://ceph.newdream.net/gitbuilder-gcov-amd64/
http://ceph.newdream.net/gitbuilder-deb-amd64/
http://ceph.newdream.net/gitbuilder-kernel-amd64/
We made gitbuilder create tarballs
http://ceph.newdream.net/gitbuilder/output/ref/origin_master/
Index of /output/ref/origin_master/
mode  links  bytes      last-changed  name
dr-x  2      4096       Jun 29 13:58  ./
dr-x  28     12288      Jun 29 15:16  ../
-r--  1      149323650  Jun 29 13:58  ceph.x86_64.tgz
-r--  1      41         Jun 29 13:57  sha1
Don't trust the links, ProxyPass confuses the web server
Fetch .../output/ref/origin_master/sha1, then fetch .../output/sha1/SHA1_HERE/ceph.x86_64.tgz
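The two-step lookup can be sketched as a pair of URL builders. The base URL and the ref/ path segment are taken from the index listing URL shown earlier. The function names are hypothetical, and the sha1 value is a dummy. The actual download (for example with urllib) is left out so the sketch runs offline.

```python
BASE = 'http://ceph.newdream.net/gitbuilder/output'


def sha1_url(ref, base=BASE):
    """URL of the small file naming the commit last built for ref."""
    return '{base}/ref/{ref}/sha1'.format(base=base, ref=ref)


def tarball_url(sha1, base=BASE, arch='x86_64'):
    """URL of the tarball built from the given commit."""
    return '{base}/sha1/{sha1}/ceph.{arch}.tgz'.format(
        base=base, sha1=sha1.strip(), arch=arch)


print(sha1_url('origin_master'))
# The sha1 file is 41 bytes in the listing, presumably 40 hex digits plus a
# newline, so the value is stripped before building the second URL.
print(tarball_url('0123456789abcdef0123456789abcdef01234567\n'))
```

Resolving the sha1 first pins the download to one specific build, and it avoids following the directory links that the slide warns about.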
Future and topics not covered
teuthology-suite
nightly runs
machine allocation
gcov
flavors
custom ceph builds
installing custom kernels
failure testing
monitor health
Thank You
Questions?