Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)

  • Published on
    12-Apr-2017

  • Testing tools

    Wojciech.Biela@teradata.com

    Lukasz.Osipiuk@teradata.com

    Karol.Sobczak@teradata.com

  • Why we need them

    Certified distro
    Enterprise support
    Quarterly releases

    Product testing - Tempto
    Performance testing - Benchto

  • Tempto: product test framework

    github.com/prestodb/tempto

    Łukasz Osipiuk, lukasz.osipiuk@teradata.com

    http://www.github.com/prestodb/tempto

  • What is Tempto?

    End-to-end product testing framework
    Targeted at software engineers
    For automation
    Tests easy to define
    Focus on test code
    Focus on database systems

    So far used for testing Presto and internal projects

  • How is a test defined?

    Java
    SQL convention based

  • Example Java based test

    public class SimpleQueryTest extends ProductTest {

        private static class SimpleTestRequirements implements RequirementsProvider {
            public Requirement getRequirements(Configuration config) {
                return new ImmutableHiveTableRequirement(NATION);
            }
        }

        @Inject
        Configuration configuration;

        @Test(groups = {"smoke", "query"})
        @Requires(SimpleTestRequirements.class)
        public void selectCountFromNation() {
            assertThat(query("select count(*) from nation"))
                .hasRowsCount(1)
                .hasRows(row(25));
        }
    }

  • Example convention based test

    allRows.sql:

    -- database: hive; tables: blah
    SELECT * FROM sample_table

    allRows.result:

    -- delimiter: |; ignoreOrder: false; types: BIGINT,VARCHAR
    1|A|
    2|B|
    3|C|
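
The `.result` file above pairs a `-- key: value; ...` header with delimited rows. As a rough illustration of how such a file could be read, here is a minimal sketch in plain Java. `ResultFileSketch`, `parseHeader` and `parseRow` are hypothetical names invented for this example and are not Tempto's actual parser; the sketch also assumes the delimiter is never `;` and that values contain no escaped delimiters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tempto: reads the "-- key: value; ..." header
// and the delimited rows of a convention-based .result file.
class ResultFileSketch {

    // Turns "-- delimiter: |; ignoreOrder: false" into {delimiter=|, ignoreOrder=false}.
    static Map<String, String> parseHeader(String headerLine) {
        Map<String, String> properties = new LinkedHashMap<>();
        String body = headerLine.replaceFirst("^--\\s*", "");
        for (String entry : body.split(";")) {
            int colon = entry.indexOf(':');
            if (colon > 0) {
                properties.put(entry.substring(0, colon).trim(),
                        entry.substring(colon + 1).trim());
            }
        }
        return properties;
    }

    // Splits one data line such as "1|A|" into its column values.
    static List<String> parseRow(String line, String delimiter) {
        // String.split drops trailing empty strings, so the closing delimiter is ignored.
        return Arrays.asList(line.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "-- delimiter: |; ignoreOrder: false; types: BIGINT,VARCHAR",
                "1|A|", "2|B|", "3|C|");
        Map<String, String> header = parseHeader(lines.get(0));
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            rows.add(parseRow(line, header.get("delimiter")));
        }
        System.out.println(header);
        System.out.println(rows);
    }
}
```

A real comparison step would additionally convert each column according to the `types` property and honour `ignoreOrder` when matching rows against the query result; both are left out here.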

  • Tempto architecture

    [Diagram: TestNG, TestNG listeners, utils, tests, requirements, requirement fulfillers; legend: user provided / library provided]

  • Tempto architecture: TestNG

    Works well
    Extensible
    Well known

  • Tempto architecture: TestNG listeners

    Tempto-specific extension of the TestNG execution framework
    Requirements management
    Tests filtering
    Injecting dependencies
    Extended logging

  • Tempto architecture: tests

    Test code :)
    Java
    SQL-convention based

  • Tempto architecture: requirements

    Declarative requirements
    Fulfilled by test framework via pluggable fulfillers
    e.g. mutableTable(Tpch.NATION, LOADED, hive)
    Test level and suite level
    Cleanup
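
To make the idea of declarative requirements and pluggable fulfillers more concrete, here is a minimal sketch of the pattern in plain Java. Every type and method name in it (`RequirementsSketch`, `Requirement`, `TableRequirement`, `Fulfiller`, `TableFulfiller`, `runWithRequirements`) is hypothetical and chosen for illustration only; Tempto's real interfaces may look quite different. The sketch only records actions in a log instead of touching a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of declarative requirements and pluggable fulfillers.
// None of these types are Tempto's real API.
class RequirementsSketch {

    // A requirement only states what a test needs, not how to provide it.
    interface Requirement {}

    static final class TableRequirement implements Requirement {
        final String database;
        final String table;

        TableRequirement(String database, String table) {
            this.database = database;
            this.table = table;
        }
    }

    // A fulfiller knows how to satisfy and later clean up one kind of requirement.
    interface Fulfiller {
        boolean supports(Requirement requirement);
        void fulfill(Requirement requirement, List<String> log);
        void cleanup(Requirement requirement, List<String> log);
    }

    // Pretends to create and drop tables by recording the actions in a log.
    static final class TableFulfiller implements Fulfiller {
        public boolean supports(Requirement requirement) {
            return requirement instanceof TableRequirement;
        }

        public void fulfill(Requirement requirement, List<String> log) {
            TableRequirement table = (TableRequirement) requirement;
            log.add("create " + table.database + "." + table.table);
        }

        public void cleanup(Requirement requirement, List<String> log) {
            TableRequirement table = (TableRequirement) requirement;
            log.add("drop " + table.database + "." + table.table);
        }
    }

    // Framework side: fulfill every requirement, run the test, then always clean up.
    static List<String> runWithRequirements(List<? extends Requirement> requirements,
                                            List<? extends Fulfiller> fulfillers,
                                            Runnable test) {
        List<String> log = new ArrayList<>();
        List<Runnable> cleanups = new ArrayList<>();
        try {
            for (Requirement requirement : requirements) {
                Fulfiller fulfiller = fulfillers.stream()
                        .filter(candidate -> candidate.supports(requirement))
                        .findFirst()
                        .orElseThrow(() -> new IllegalStateException("No fulfiller for requirement"));
                fulfiller.fulfill(requirement, log);
                // Register cleanups in reverse order of fulfillment.
                cleanups.add(0, () -> fulfiller.cleanup(requirement, log));
            }
            log.add("run test");
            test.run();
        } finally {
            cleanups.forEach(Runnable::run);
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> log = runWithRequirements(
                Arrays.asList(new TableRequirement("hive", "nation")),
                Arrays.asList(new TableFulfiller()),
                () -> {});
        log.forEach(System.out::println);
    }
}
```

The point of the split is that the test declares only the `TableRequirement`; swapping in a different fulfiller changes how the table is provided without touching test code.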

  • Tempto architecture: utils

    Extra assertions
    Various tools: HDFS client, SSH client, JDBC query executor

  • Executable runner

    java -jar target/presto-product-tests-0.120-SNAPSHOT-executable.jar --help

    usage: Presto product tests
     --config-local      URI to Test local configuration YAML file.
     --report-dir        Test reports directory
     --groups            Test groups to be run
     --excluded-groups   Test groups to be excluded
     --tests             Test patterns to be included
     -h,--help           Shows help message

    All dependencies embedded
    User provides cluster details through YAML config.

  • Configuration

    hdfs:
      username: hdfs
      webhdfs:
        host: master
        port: 50070

    tests:
      hdfs:
        path: /product-test

    databases:
      default:
        alias: presto

      hive:
        jdbc_driver_class: org.apache.hive.jdbc.HiveDriver
        jdbc_url: jdbc:hive2://master:10000
        jdbc_user: hdfs
        jdbc_password: na
        jdbc_pooling: false
        jdbc_jar: test-framework-hive-jdbc-all.jar

      presto:
        jdbc_driver_class: com.facebook.presto.jdbc.PrestoDriver
        jdbc_url: jdbc:presto://localhost:8080/hive/default
        jdbc_user: hdfs
        jdbc_password: na
        jdbc_pooling: false

  • Benchto: macro benchmarking framework

    github.com/teradata/benchto (very soon)

    Karol Sobczak, karol.sobczak@teradata.com

    http://github.com/teradata/benchto

  • Goals

    Easy and manageable way to define benchmarks
    Run and analyze macro benchmarks in clustered environment
    Repeatable benchmarking of Hadoop SQL engines, most importantly Presto (https://prestodb.io)
    also used for Hive, Teradata components
    Transparent, trusted framework for benchmarking

  • Benchmarks - model

    [Diagram: entity model of BenchmarkRun, QueryExecution, Measurement and AggregatedMeasurement, connected by 1-to-n associations]

  • Benchmarks - execution

    before-benchmark-macros
    prewarm
    benchmark
      execution-0
      execution-1
      ...
      execution-n
    after-benchmark-macros
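
The ordering on this slide can be sketched as a small Java routine that lists the phases of a single benchmark. `BenchmarkLifecycleSketch` and `phases` are hypothetical names, and the assumption that prewarm runs happen once before the numbered executions is an interpretation of the slide, not Benchto's confirmed behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the execution order shown on the slide; not Benchto's actual code.
class BenchmarkLifecycleSketch {

    // Returns the ordered list of phases for one benchmark.
    static List<String> phases(int prewarmRuns, int runs) {
        List<String> phases = new ArrayList<>();
        phases.add("before-benchmark-macros");
        // Assumed: prewarm runs warm up the cluster before the numbered executions.
        for (int i = 0; i < prewarmRuns; i++) {
            phases.add("prewarm-" + i);
        }
        // The numbered executions that make up the benchmark itself.
        for (int i = 0; i < runs; i++) {
            phases.add("execution-" + i);
        }
        phases.add("after-benchmark-macros");
        return phases;
    }

    public static void main(String[] args) {
        phases(3, 2).forEach(System.out::println);
    }
}
```

Calling `phases(3, 2)` lists the before-benchmark macros, three prewarm runs, two executions and the after-benchmark macros, in that order.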

  • Defining benchmarks - structure

    Convention based defining of benchmark through descriptors (YAML format) and query SQL files

    $ tree .
    .
      application-presto-devenv.yaml
      application-td-hdp.yaml
      benchmarks
        presto
          concurrency-insert-multi-table.yaml
          concurrency.yaml
          linear-scan.yaml
          tpch.yaml
          types.yaml
        querygrid-presto-ansi
          concurrency.yaml
      sql
        presto
          dev-zero
            create-alltypes.sql
            create-lineitem.sql
          linear-scan
            selectivity-0.sql
            selectivity-100.sql
        ...

  • Defining benchmarks - descriptor

    Descriptor is a YAML configuration file with various properties and user-defined variables

    $ cat benchmarks/presto/concurrency.yaml
    datasource: presto
    query-names: presto/linear-scan/selectivity-${selectivity}.sql
    schema: tpch_100gb_orc
    database: hive
    concurrency: ${concurrency_level}
    runs: ${concurrency_level}
    prewarm-runs: 3
    before-benchmark: drop-caches
    variables:
      1:
        selectivity: 10, 100
        concurrency_level: 10
      2:
        selectivity: 10, 100
        concurrency_level: 20
      3:
        selectivity: 10, 100
        concurrency_level: 50
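
One way to read the `variables` section is that each numbered entry expands into one concrete benchmark per combination of its comma-separated values. The sketch below illustrates that reading in plain Java. `VariableExpansionSketch` and `expand` are hypothetical names, and the cross-product interpretation of comma-separated values is an assumption rather than something the slide states.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: expands one "variables" entry into concrete combinations.
// Assumes comma-separated values are alternatives that get cross-multiplied.
class VariableExpansionSketch {

    static List<Map<String, String>> expand(Map<String, String> variableSet) {
        List<Map<String, String>> combinations = new ArrayList<>();
        combinations.add(new LinkedHashMap<>());
        for (Map.Entry<String, String> variable : variableSet.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            // Extend every partial combination with every value of this variable.
            for (Map<String, String> partial : combinations) {
                for (String value : variable.getValue().split(",")) {
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(variable.getKey(), value.trim());
                    next.add(extended);
                }
            }
            combinations = next;
        }
        return combinations;
    }

    public static void main(String[] args) {
        // Entry "1" from the descriptor above.
        Map<String, String> first = new LinkedHashMap<>();
        first.put("selectivity", "10, 100");
        first.put("concurrency_level", "10");
        for (Map<String, String> combination : expand(first)) {
            System.out.println(combination);
        }
    }
}
```

Under that assumption, entry `1` yields two combinations (selectivity 10 and 100, both at concurrency level 10), so the three entries in the descriptor would produce six benchmark configurations in total.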

  • Defining benchmarks - SQL file templating

    SQL files can use keys defined in the YAML configuration file
    Templates are based on FreeMarker

    $ cat sql/presto/tpch/q14.sql
    SELECT 100.00 * sum(CASE
                          WHEN p.type LIKE 'PROMO%'
                          THEN l.extendedprice * (1 - l.discount)
                          ELSE 0
                        END) / sum(l.extendedprice * (1 - l.discount)) AS promo_revenue
    FROM
      "${database}"."${schema}"."lineitem" AS l,
      "${database}"."${schema}"."part" AS p
    WHERE
      l.partkey = p.partkey
      AND l.shipdate >= DATE '1995-09-01'
      AND l.shipdate < DATE '1995-09-01' + INTERVAL '1' MONTH
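
The effect of the templating step can be illustrated without FreeMarker itself. The sketch below is a plain-Java stand-in that only substitutes simple `${key}` placeholders using `java.util.regex`; it is not how Benchto renders templates, and `SqlTemplateSketch` and `render` are hypothetical names. FreeMarker supports far more (conditionals, loops, built-ins) than this substitution does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java stand-in for the FreeMarker templating step.
// Handles only simple ${key} placeholders; not Benchto's actual rendering code.
class SqlTemplateSketch {

    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer rendered = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            if (!values.containsKey(key)) {
                throw new IllegalArgumentException("No value for ${" + key + "}");
            }
            // quoteReplacement keeps '$' and '\' in values from being treated specially.
            matcher.appendReplacement(rendered, Matcher.quoteReplacement(values.get(key)));
        }
        matcher.appendTail(rendered);
        return rendered.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("database", "hive");
        values.put("schema", "tpch_100gb_orc");
        System.out.println(render("FROM \"${database}\".\"${schema}\".\"lineitem\" AS l", values));
    }
}
```

With `database: hive` and `schema: tpch_100gb_orc` from the descriptor, the `FROM` clause of q14 would reference `"hive"."tpch_100gb_orc"."lineitem"`.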

  • Future work

    (Tempto) Support for complex concurrent tests execution
    (Benchto) Automatic regression detection
    (Benchto) Customized dashboards (e.g. overall performance analysis)
    (Benchto) Hardware and configuration awareness
    (Benchto) More complex benchmarking scenarios
    (Benchto) Support for complex concurrency scenarios
    (Benchto) Scheduling mechanism

  • Questions?

  • Benchto GUI

    Visualization of benchmark results
    Linking between tools (Grafana, Presto UI)
    Comparison of multiple benchmarks

  • Grafana monitoring

    We use a Grafana dashboard with Graphite
    Benchmark/execution life-cycle events are shown on dashboards
    Provides good visibility into the state of the cluster