Release Highlights

With R56, Qubole includes support for Google Cloud Platform (GCP).

The following are highlights of Qubole’s R56 release:

Analytics experience

  • Added error logs API for Presto commands.

GET /api/v1.2/commands/<Command-ID>/error_logs
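As a minimal sketch, a request to this endpoint could be prepared as shown below. The base URL, the `X-AUTH-TOKEN` header name, the command ID, and the token value are assumptions that do not come from these notes, so check them against your account's API settings. The snippet only builds the request and does not send it.

```python
from urllib.request import Request

# Assumptions for illustration: base URL and auth header name are placeholders.
BASE_URL = "https://api.qubole.com"
AUTH_HEADER = "X-AUTH-TOKEN"


def build_error_logs_request(command_id, token, base_url=BASE_URL):
    """Prepare (but do not send) a GET request for a command's error logs."""
    url = f"{base_url}/api/v1.2/commands/{command_id}/error_logs"
    return Request(url, headers={AUTH_HEADER: token}, method="GET")


# Hypothetical command ID and token, used only to show the resulting URL.
req = build_error_logs_request(12345, "example-token")
print(req.get_method(), req.full_url)
```

To actually fetch the logs, the prepared request could be passed to `urllib.request.urlopen`, with error handling suited to your environment.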

Data engineering

Airflow

  • The latest Airflow 1.10 release is now fully supported, with Python 3.5 package management.

Custom Metastore

  • You can now connect to any remote metastore server using the Thrift URI.
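A Thrift URI for a metastore typically has the form `thrift://host:port`. The sketch below shows one way to validate such a URI before configuring it. The host name is hypothetical, and the fallback port 9083 is the conventional Hive metastore port, assumed here rather than stated in these notes.

```python
from urllib.parse import urlsplit

# Assumption: 9083 is used as a fallback because it is the conventional
# Hive metastore Thrift port; your server may listen elsewhere.
DEFAULT_METASTORE_PORT = 9083


def parse_thrift_uri(uri):
    """Return (host, port) for a thrift://host[:port] URI, or raise ValueError."""
    parts = urlsplit(uri)
    if parts.scheme != "thrift":
        raise ValueError(f"expected a thrift:// URI, got {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {uri!r}")
    return parts.hostname, parts.port or DEFAULT_METASTORE_PORT


# Hypothetical metastore host, for illustration only.
print(parse_thrift_uri("thrift://metastore.example.com:9083"))
```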

Support for multiple notification channels in Scheduler

  • You can now configure and send alerts to multiple endpoints, such as Slack, PagerDuty, email, and other webhooks.

Data science

  • Notebook usability improvements for paragraphs, such as active indicators and a compact size that fits the query.

Administration and TCO

  • Added user email to the commands API response to help admins check usage easily.

GET /api/v1.2/commands/
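To illustrate how the user email could support a usage check, the sketch below counts commands per user from an already parsed response. The response shape and the key names `commands` and `user_email` are assumptions made for this example, not the documented schema, and the sample data is invented.

```python
from collections import Counter

# Hypothetical sample shaped like a parsed commands response; the keys
# "commands" and "user_email" are assumptions for illustration only.
sample_response = {
    "commands": [
        {"id": 1, "user_email": "ana@example.com"},
        {"id": 2, "user_email": "raj@example.com"},
        {"id": 3, "user_email": "ana@example.com"},
    ]
}


def commands_per_user(response, email_key="user_email"):
    """Count commands per user email; entries without one are grouped as 'unknown'."""
    commands = response.get("commands", [])
    return Counter(cmd.get(email_key, "unknown") for cmd in commands)


usage = commands_per_user(sample_response)
for email, count in usage.most_common():
    print(email, count)
```

If the real field is named differently, pass it through the `email_key` parameter.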

Engines

  • Spark

    • Enhanced Broadcast Joins: Introduces executor-based broadcast, where the values to be broadcast are not collected on the driver. Driver memory is therefore no longer a bottleneck for broadcast, so users with less driver memory can still use it.
    • Cost Based Optimizer (CBO) Support: CBO, which uses table statistics to optimize queries for performance, is now enabled by default in Spark version 2.4.0.
    • Hint-based Skew Join Support: Users can now specify hints for skewed columns and values for a join. Spark automatically allocates more resources to the skewed value based on the hint.
  • Spark Structured Streaming

    • Streaming State Store: New state storage management using RocksDB in Spark Structured Streaming. It is designed for better scalability and latency in stateful processing such as stream joins and deduplication.
  • Hive

    • Added support for MySQL 8.x as the Hive Metastore.
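The Cost Based Optimizer item in the Spark list above depends on table statistics being available. As a hedged sketch, the helper below builds standard Spark SQL `ANALYZE TABLE` statements that collect table-level and column-level statistics. The table and column names are hypothetical, and these notes do not say whether Qubole gathers statistics automatically.

```python
def analyze_statements(table, columns=()):
    """Build Spark SQL statements that collect statistics for the optimizer."""
    statements = [f"ANALYZE TABLE {table} COMPUTE STATISTICS"]
    if columns:
        cols = ", ".join(columns)
        statements.append(
            f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS {cols}"
        )
    return statements


# Hypothetical table and columns, for illustration only.
for stmt in analyze_statements("sales.orders", ["customer_id", "order_date"]):
    print(stmt)
```

Each generated statement could then be run through a Spark SQL command or `spark.sql(...)` in a notebook before running the joins you want optimized.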