Presto

New Features and Enhancements

  • PRES-2372: Cost-based optimization (CBO) for JOIN reordering and JOIN distribution type selection, using statistics in the Hive metastore, is enabled by default for Presto version 0.208.

    The following values have been added to the default cluster configuration for Qubole Presto version 0.208.

    optimizer.join-reordering-strategy=AUTOMATIC
    join-distribution-type=AUTOMATIC
    join-max-broadcast-table-size=100MB
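
    As a hedged sketch of how these defaults might be inspected or overridden for one session: the two session properties below are the open-source Presto counterparts of the AUTOMATIC settings above, and the orders and customers tables are hypothetical. Confirm that both properties are available in your Qubole Presto build before relying on them.

    ```sql
    -- CBO relies on table statistics from the Hive metastore; check what is recorded.
    -- (orders and customers are hypothetical table names.)
    SHOW STATS FOR orders;

    -- Turn off cost-based JOIN reordering for this session only.
    SET SESSION join_reordering_strategy = 'NONE';

    -- Inspect the resulting plan.
    EXPLAIN
    SELECT c.name, sum(o.totalprice)
    FROM orders o
    JOIN customers c ON o.custkey = c.custkey
    GROUP BY c.name;

    -- Return to the cluster defaults.
    RESET SESSION join_reordering_strategy;
    ```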
    
  • PRES-2695: QDS allows you to override the cluster-level properties of the required-number-of-workers feature, query-manager.required-workers-max-wait and query-manager.required-workers, at the query level by using the corresponding session properties required_workers_max_wait and required_workers.
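
    A minimal sketch of such a query-level override follows. The property names come from this note; the values and the quoted duration format are assumptions, so adjust them to what your cluster accepts.

    ```sql
    -- Require 4 workers (illustrative value) for queries in this session.
    SET SESSION required_workers = 4;
    -- Maximum time to wait for those workers (illustrative value; duration format assumed).
    SET SESSION required_workers_max_wait = '5m';
    ```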

  • PRES-2918: A new experimental configuration property, experimental.reserved-pool-enabled, has been added to Presto version 0.208 to allow you to disable the Reserved Pool. The Reserved Pool prevents deadlocks when memory is exhausted in the General Pool: the largest query is promoted to the Reserved Pool. However, only one query is promoted, and the remaining queries in the General Pool stay in a blocked state whenever the pool is full. To avoid this, set experimental.reserved-pool-enabled to false, which disables the Reserved Pool. For more information, see Disabling Reserved Pool.

  • PRES-3001: Qubole proactively replaces existing preemptible VMs with new ones before the interruption time of 24 hours, thereby preventing the duration limit from having an adverse effect on running queries.

  • PRES-2657: The path for spill-to-disk functionality, experimental.spiller-spill-path=/media/ephemeral0/presto/spill_dir, is configured by default in Qubole Presto 0.208. This makes spill-to-disk easy to use: either run set session spill_enabled=true for individual queries, or add experimental.spill-enabled=true to the Presto cluster configuration override to enable spill-to-disk for all queries.
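
    For example, spill-to-disk could be turned on for one session and then reverted, as sketched below. The first statement is taken from this note; RESET SESSION is standard Presto syntax for restoring a session property to its default.

    ```sql
    -- Enable spill-to-disk for queries in this session.
    SET SESSION spill_enabled = true;

    -- ... run the memory-intensive queries here ...

    -- Restore the cluster default.
    RESET SESSION spill_enabled;
    ```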

  • PRES-111: Added the hive.default.clear_cache() procedure, invoked as call hive.default.clear_cache(), to clear stale Hive metastore caches. This is useful when the metastore might have been updated from outside the Presto cluster. The procedure is supported only on Presto version 0.208.

  • PRES-2744: New session property qubole_max_raw_input_datasize=1TB limits the total bytes scanned. Queries that exceed this limit fail with the RAW_INPUT_DATASIZE_READ_LIMIT_EXCEEDED exception. This ensures rogue queries do not run for a very long time.
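
    A hedged sketch of tightening this limit for one session follows. The 100GB value, the quoting of the size, and the events table are assumptions for illustration only.

    ```sql
    -- Lower the scan limit for this session (illustrative value; quoting assumed).
    SET SESSION qubole_max_raw_input_datasize = '100GB';

    -- A query that scans more than the limit is expected to fail with
    -- RAW_INPUT_DATASIZE_READ_LIMIT_EXCEEDED. (events is a hypothetical table.)
    SELECT count(*) FROM events;
    ```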

  • PRES-2790: Performance improvement in queries involving IN and NOT IN over a subquery. See this blog post.

  • PRES-2605: Added a new scheduler to optimally schedule tasks according to where Rubix caches the data. See https://www.qubole.com/blog/presto-rubix-scheduler-improves-cache-reads/.

  • PRES-2584: Improved smart query retry to support INSERT OVERWRITE TABLE, CREATE TABLE AS, and SELECT queries that failed without returning any data. Command logs now include Query Tracker links for retries, which makes query retries easier to track.

  • JDBC-124: QDS now supports multiple concurrent statements in Presto FastPath.

  • PRES-2510: Choosing the Presto UI from the QDS Control Panel redirects to <base-url>/presto-ui-<cluster-id>/ui/. It also redirects <coordinator>:dns:8081 to a static resource <base-url>/ui/index.html.

  • PRES-2992: Added presto-tpcds, presto-localfile, and presto-thrift connectors to Presto 0.193 and 0.208 versions.

  • PRES-2924: Engineering updates have been made to support Presto on GCP as a beta offering in R57. Presto on GCP (beta) supports most Qubole Data Service (QDS) features except Presto Notebooks and Big Query Storage Connector.

Bug Fixes

  • PRES-2568: Fixes a problem that caused a carriage return \r to be incorrectly added wherever there was a semicolon in a query.
  • PRES-2810: Fixes a problem that caused failures in query planning when dynamic filtering is enabled.