Spark

New Features

  • SPAR-3510: QDS now supports Apache Spark 2.4.3. It is displayed as 2.4 latest (2.4.3) in the Spark Version field of the Create New Cluster page in the QDS UI. All existing 2.4.0 clusters are automatically upgraded to 2.4.3 in accordance with the Qubole Spark versioning policy.
  • SPAR-2937: You can configure Ranger policies for Hive tables, and Spark SQL honors these policies for authorization. This is supported on Spark 2.4.0 and later versions. Beta, Via Support.

Enhancements

  • SPAR-3650: Spark now computes the size of input tables during query planning. This lets the planner choose BroadcastHashJoin where applicable, which speeds up queries that contain joins. This is supported on Spark 2.4.0 and later versions. Via Support.
  • SPAR-3616: Spark applications can now run reliably even when Out-of-Memory conditions occur. This capability can be enabled in Spark 2.4.3 and later versions. Via Support.
  • SPAR-3555: The appendToTable API now supports Hive tables as well as Spark data sources.
  • SPAR-3418: ORC metadata caching in Spark improves query performance by reducing the time spent on reading ORC metadata from an object store. This is supported on Spark 2.4.3 and later versions. Via Support.
  • SPAR-3226: Spark applications handle Spot node loss and Spot blocks by using the YARN Graceful-Decommission status. This is supported on Spark 2.4.0 and later versions. Via Support.
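To illustrate why a table-size estimate matters for SPAR-3650, here is a simplified Python model of how Spark decides whether to broadcast one side of an equi-join. It is an assumption for illustration, not Spark's or Qubole's planner code: the planner compares the estimated size of each side with `spark.sql.autoBroadcastJoinThreshold` (10 MB by default, and a negative value such as -1 disables broadcasting). The function names `can_broadcast` and `choose_join_strategy` are hypothetical, and the model ignores join type, join hints, and ShuffledHashJoin.

```python
# Hypothetical, simplified model of Spark's broadcast-join decision.
# Not Spark's actual planner code; join type and hints are ignored.

# Default value of spark.sql.autoBroadcastJoinThreshold: 10 MB.
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024


def can_broadcast(size_bytes, threshold):
    """Return True if a side with this estimated size may be broadcast.

    None means the planner has no size estimate; it is treated as too
    large, roughly mirroring Spark's very large default size estimate.
    A negative threshold therefore never matches a non-negative size.
    """
    return size_bytes is not None and 0 <= size_bytes <= threshold


def choose_join_strategy(left_size_bytes, right_size_bytes,
                         threshold=DEFAULT_BROADCAST_THRESHOLD):
    """Pick a join strategy from the estimated sizes of both sides."""
    if can_broadcast(left_size_bytes, threshold) or can_broadcast(
            right_size_bytes, threshold):
        return "BroadcastHashJoin"
    # Assumed fallback: the shuffle-based join Spark 2.x prefers by default.
    return "SortMergeJoin"


if __name__ == "__main__":
    # A 500 GB fact table joined with a 2 MB dimension table.
    print(choose_join_strategy(500 * 1024**3, 2 * 1024**2))
    # Same join, but without a size estimate for the small table.
    print(choose_join_strategy(500 * 1024**3, None))
```

In this model, the second call cannot use BroadcastHashJoin only because the size of the small table is unknown, which is the situation that computing input table sizes during planning is meant to avoid.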

Bug Fixes

  • SPAR-3730: A ClassNotFoundException occurred because the Rubix caching jars were missing from the Hive Metastore classpath. With this fix, the Rubix caching jars are available in the Hive Metastore classpath. This issue is fixed in Spark 2.2.0 and later versions.
  • SPAR-3701: Run times of a few TPC-DS queries had increased because filter pushdown into subqueries disabled subquery reuse. With this fix, overall query run time is reduced where applicable.
  • SPAR-3405: Hive configurations such as hive.metastore.uris did not reach the Spark Hive Authorizer plugin when they were passed through Spark defaults or the --conf option. As a result, connection errors occurred when connecting to the Hive Metastore with Hive Authorization enabled. This issue is fixed in Spark 2.4.0 and later versions. Via Support.
  • SPAR-3766: During operations such as updating table statistics, the table owner was changed to the user running the command. With this fix, the original owner of the table is retained. This issue is fixed in Spark 2.4.0 and later versions.