Ceph Day Beijing: CeTune: A Framework to Profile and Tune Ceph Performance

Transcript

1. CeTune: Ceph Profiling and Tuning Framework
   Chendi XUE, chendi.xue@intel.com, Software Engineer

2. Agenda
   - Background
   - How to use CeTune
   - CeTune modules
   - How CeTune helps to tune
   - Summary

3. Agenda (repeated; same items as slide 2)

4. Background
   What is the problem?
   - End users face numerous challenges in driving the best performance.
   - Increasing requests from end users on:
     - How to troubleshoot the Ceph cluster
     - How to identify the best tuning knobs among many (500+) parameters
     - How to handle unexpected performance regressions between frequent releases
   Solution?
   - A toolkit/framework to:
     - Easily deploy and benchmark a Ceph cluster
     - Easily analyze performance results, system metrics, perf counters and the latency layout
     - Shorten users' landing time for Ceph-based storage solutions

5. CeTune: what it is
   A toolkit to deploy, benchmark, profile and tune Ceph cluster performance.

6. Agenda (repeated; same items as slide 2)

7. CeTune internals
   CeTune Controller:
   - Provides a console interface for the user to monitor working progress and view the performance result data.
   - Controls all other CeTune nodes to deploy, benchmark, and monitor system and Ceph status.
   CeTune Node:
   - Checks whether its own work has completed successfully and reports back to the controller.

8. CeTune configuration: conf/all.conf
   all.conf holds most of the information CeTune needs, including what to deploy and how to run the benchmark.

   # deploy ceph
   field | value | description
   deploy_ceph_version | hammer | Ceph version
   deploy_mon_servers | aceph01 | node(s) to deploy mon on
   deploy_osd_servers | aceph01,aceph02,aceph03,aceph04 | node(s) to deploy osd on
   deploy_rbd_nodes | client01,client02 | node(s) to deploy rbd and rados on
   aceph01 | /dev/sda1:/dev/sdb1,/dev/sdd1:/dev/sdb2, | set osd and journal devices (osd:journal pairs) here
   aceph02 | /dev/sda1:/dev/sdb1,/dev/sdd1:/dev/sdb2, |
   aceph03 | /dev/sda1:/dev/sdb1,/dev/sdd1:/dev/sdb2, |
   aceph04 | /dev/sda1:/dev/sdb1,/dev/sdd1:/dev/sdb2, |
   osd_partition_count | 1 | the script deploy/prepare-scripts/list_partitions.sh will do the partitioning for you
   osd_partition_size | 2000G |
   journal_partition_count | 5 |
   journal_partition_size | 60G |
   public_network | 10.10.5.0/24 | tell the Ceph cluster to use the 10Gb NIC
   cluster_network | 10.10.5.0/24 |

   # benchmark
   field | value | description
   head | client01 | configuration for the benchmark
   tmp_dir | /opt/ |
   user | root |
   list_vclient | vclient01,vclient02,vclient03,vclient04, |
   list_client | client01,client02 |
   list_ceph | aceph01,aceph02,aceph03,aceph04 |
   list_mon | aceph01 |
   volume_size | 40960 | rbd volumes are created with this size
   rbd_volume_count | 80 | total number of rbd volumes
   run_vm_num | 80, 70, 60, 50, | set this if you want to run a vm/rbd loadline
   run_file | /dev/vdb | when using vms, the rbd is attached as /dev/vdb inside the vm
   run_size | 40g | fio run size, must be smaller than volume_size
   run_io_pattern | seqwrite,seqread,randwrite,randread |
   run_record_size | 64k,4k | fio block size
   run_queue_depth | 64,8 |
   run_warmup_time | 100 |
   run_time | 300 |
   dest_dir | /mnt/data/ | destination directory
   dest_dir_remote_bak | 192.168.3.101:/data4/Chendi/ArborValley/v0.91/raw/ | remote backup destination directory
   rbd_num_per_client | 40,40 | test 40 rbds on client01 and 40 rbds on client02 respectively

9. CeTune configuration: conf/tuner.yaml
   tuner.yaml is a job worksheet; each testjob can have different tunings applied.

   testjob1:
     workstages: ["deploy", "benchmark"]
     benchmark_engine: "qemurbd"
     version: 'hammer'
     pool:
       rbd:
         size: 2
         pg_num: 8192
     disk:
       read_ahead_kb: 2048
     global:
       debug_lockdep: 0/0
       debug_context: 0/0
       mon_pg_warn_max_per_osd: 1000
       ms_nocrc: true
       throttler_perf_counter: false
     osd:
       osd_enable_op_tracker: false
       osd_op_num_shards: 10
       filestore_wbthrottle_enable: false
       filestore_max_sync_interval: 10
       filestore_max_inline_xattr_size: 254
       filestore_max_inline_xattrs: 6
       filestore_queue_committing_max_bytes: 1048576000
       filestore_queue_committing_max_ops: 5000
       filestore_queue_max_bytes: 1048576000
       filestore_queue_max_ops: 500
       journal_max_write_bytes: 1048576000
       journal_max_write_entries: 1000
       journal_queue_max_bytes: 1048576000
       journal_queue_max_ops: 3000
   testjob2:
   testjob3:
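As a rough illustration of how a worksheet like the one above can be read, here is a minimal Python sketch using PyYAML. The function names and the way stages and tuning sections are handled are assumptions made for illustration only, not CeTune's actual code.

    # Minimal sketch only: loads a tuner.yaml-style worksheet and walks it.
    # The dispatch below is hypothetical and is NOT CeTune's real API.
    import yaml

    def load_worksheet(path):
        # Parse the YAML job worksheet (testjob1, testjob2, ...).
        with open(path) as f:
            return yaml.safe_load(f) or {}

    def run_worksheet(path):
        for job_name, job in load_worksheet(path).items():
            if not job:
                continue  # empty placeholder jobs such as testjob2/testjob3
            stages = job.get("workstages", [])
            tunings = {k: v for k, v in job.items()
                       if k in ("global", "osd", "pool", "disk")}
            print("job %s: stages=%s, tuned sections=%s"
                  % (job_name, stages, sorted(tunings)))

    if __name__ == "__main__":
        run_worksheet("conf/tuner.yaml")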
10. Kickoff CeTune
    root@client01:/root# cd /root/cetune/tuner
    root@client01:/root/cetune/tuner# python tuner.py
    [LOG]Check ceph version, reinstall ceph if necessary
    [LOG]start to redeploy ceph
    [LOG]ceph.conf file generated
    [LOG]Shutting down mon daemon
    [LOG]Shutting down osd daemon
    [LOG]Clean mon dir
    [LOG]Started to mkfs.xfs on osd devices
    [LOG]mkfs.xfs for /dev/sda1 on aceph01
    [LOG]mkfs.xfs for /dev/sdf1 on aceph04
    [LOG]Build osd.0 daemon on aceph01
    [LOG]Build osd.39 daemon on aceph01
    [LOG]delete ceph pool rbd
    [LOG]delete ceph pool data
    [LOG]delete ceph pool metadata
    [LOG]create ceph pool rbd, pg_num is 8192
    [LOG]set ceph pool rbd size to 2
    [WARNING]Applied tuning, waiting ceph to be healthy
    [WARNING]Applied tuning, waiting ceph to be healthy
    [LOG]Tuning has been applied to ceph cluster, ceph is healthy now
    RUNID: 36, Result dir: //mnt/data/36-80-seqwrite-4k-100-300-vdb
    [LOG]Prerun_check: check if rbd volumes are initialized
    [WARNING]Ceph cluster used data: 0.00KB, planed data: 3276800MB
    [WARNING]rbd volume initialization not done
    [LOG]80 RBD Images created
    [LOG]create rbd volume vm attaching xml
    [LOG]Distribute vdbs xml
    [LOG]Attach rbd image to vclient1
    [LOG]Start to initialize rbd volumes
    [LOG]FIO Jobs started on [vclient01,vclient02, ... vclient80]
    [WARN]160 fio job still running
    [LOG]RBD initialization complete
    [LOG]Prerun_check: check if fio installed in vclient
    [LOG]Prerun_check: check if rbd volume attached
    [LOG]Prerun_check: check if sysstat installed
    [LOG]Prepare_run: distribute fio.conf to vclient
    [LOG]Benchmark start
    [LOG]FIO Jobs started on [vclient01,vclient02, ... vclient80]
    [WARN]160 fio job still running
    [LOG]stop monitoring, and workload
    [LOG]collecting data
    [LOG]processing data
    [LOG]creating html report
    [LOG]scp to result backup server

11. Agenda (repeated; same items as slide 2)

12. Deploy
    Configure:
    1. all.conf
    2. tuner.yaml
    Preparation:
    1. Connect to an apt/yum source
    2. Set up automatic (passwordless) ssh to each node
    3. Partition the disks
    One click to start CeTune, which will:
    - Compare the current Ceph version with the desired version and reinstall if necessary
    - Deploy to all nodes: rbd clients; osd, mon and mds daemons; the object workload generator
    - Apply the tuner.yaml tuning knobs to the Ceph cluster
    - Wait for the Ceph cluster to become healthy
    During the CeTune deployment phase, the console prints the deploy portion of the log shown on the Kickoff slide above (check version, redeploy, mkfs.xfs, build osd daemons, recreate pools, apply tuning, wait for the cluster to become healthy).
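The deploy flow ends by waiting for the cluster to report a healthy state. As an illustration of what such a wait can look like, the sketch below polls `ceph health` on the monitor node over ssh until it returns HEALTH_OK; the node name, timeout and retry interval are assumptions, and this is not CeTune's actual implementation.

    # Illustrative sketch only, not CeTune's code: poll the monitor node
    # over ssh until `ceph health` reports HEALTH_OK or a timeout expires.
    import subprocess
    import time

    def wait_for_healthy(mon_node="aceph01", timeout=600, interval=10):
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                out = subprocess.check_output(["ssh", mon_node, "ceph", "health"])
            except subprocess.CalledProcessError:
                out = b""  # cluster may not be reachable yet; retry
            if out.decode().startswith("HEALTH_OK"):
                return True
            print("[WARNING]Applied tuning, waiting ceph to be healthy")
            time.sleep(interval)
        return False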
13. Benchmark
    Configure in tuner.yaml:
    1. Workload engine
    2. Tuning knobs
    3. IO pattern configuration
    Preparation:
    1. Prepare the virtual machines
    2. Install a workload generator (fio, cosbench, or the cephfs engine)
    One click to start CeTune, which will:
    - Compare the current Ceph tuning with the desired tuning and re-apply it if necessary
    - Prepare the benchmark: check the workload and rbd volumes, create rbd images if necessary, and initialize them if needed
    - During the benchmark phase: monitor system metrics, fetch perfcounter data, fetch lttng data, and block until the workload processes complete
    - Wait for the Ceph cluster to become healthy
    During the CeTune benchmark phase the console output resembles the benchmark portion of the Kickoff slide above.
    * Cosbench is an open-source benchmarking tool developed by Intel to measure cloud object storage service performance; it can act as an object workload generator under the CBT framework.

14. Analyzer
    Processes sar data, iostat data, perfcounter data, lttng data (wip), blktrace data (wip) and valgrind data (wip); all data is archived into one folder.
    Processing of system metrics and perfcounters:
    1. Node by node
    2. The result is one big JSON, nested as node -> data source (iostat, perfcounter) -> metric key (w/s, r_op_latency) -> second -> data
    Processing of lttng data (wip):
    1. lttng data is traced following Google Dapper semantics, with one unified trace_id identifying all tracepoints belonging to the same IO
    2. The lttng data is sent to a zipkin collector and can be viewed in the zipkin web UI
    Processing of blktrace and valgrind data (wip).
    Results are sent to the visualizer module.
    During the CeTune analyze phase the console shows the data collection and processing steps from the end of the Kickoff log (collecting data, processing data, creating html report).
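To make the analyzer's "one big JSON" nesting concrete, the fragment below mirrors the node -> data source -> metric -> second hierarchy described on the Analyzer slide. The metric names follow the slide, but the sample values and the exact key formats are invented for illustration.

    # Illustrative shape of the analyzer's per-node result JSON
    # (node -> data source -> metric -> per-second samples).
    # The numbers below are made up; only the nesting mirrors the slide.
    result = {
        "aceph01": {
            "iostat": {
                "w/s": {0: 812.0, 1: 790.5, 2: 805.3},
            },
            "perfcounter": {
                "r_op_latency": {0: 14.2, 1: 16.1, 2: 15.4},
            },
        },
    }

    # Example: average read op latency on aceph01 over the collected seconds.
    samples = result["aceph01"]["perfcounter"]["r_op_latency"]
    print(sum(samples.values()) / len(samples))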
15. Tuner
    The tuner extracts the Ceph cluster configuration from all.conf and automatically generates a tuner.conf file with some tuning references. Its main use is to test a Ceph cluster across multiple versions with various tuning knobs: users define a set of testjobs, and each testjob can have multiple workstages, for example reinstall, then rebuild, then run the benchmark. Using the tuner, CeTune can run Ceph performance tests fully automatically.
    [Chart: untuned vs. tuned results for SEQWRITE, SEQREAD, RANDWRITE and RANDREAD on Firefly, Giant and Hammer]

16. Visualizer
    CeTune provides an HTML page to show the result data: a system metrics view and a latency layout view.

17. Agenda (repeated; same items as slide 2)

18. Firefly Randread Case
    runid | op_size | op_type | QD | engine | serverNum | clientNum | rbdNum | runtime | fio_iops | fio_bw | fio_latency | osd_iops | osd_bw | osd_latency
    Before tune | 4k | randread | qd8 | vdb | 4 | 2 | 40 | 401 sec | 3389.000 | 13.313 MB/s | 93.991 msec | 3729.249 | 16.798 MB/s | 15.996 msec
    Before tune | 4k | randread | qd8 | vdb | 4 | 2 | 80 | 301 sec | 3693.000 | 14.577 MB/s | 172.485 msec | 3761.441 | 14.986 MB/s | 16.452 msec
    Long frontend latency, but short backend latency.
    Randread 40 vm, each