Understanding Qubole Hive Authorization

Hive authorization is one way to control which users can access Hive objects and which privileges they hold on them. Qubole provides SQL standard-based authorization, with some additional controls and some differences from open source Hive. See SQL Standard Based Hive Authorization for more information.

Qubole’s Hive authorization gives Qubole Hive users granular control over access to Hive tables and columns, and over the types of privileges that a Hive user holds on a Hive table.

Warning

Hive notebooks are in the beta phase. Because using them in production may raise security concerns, you can experiment with Hive notebooks, but do not use them for production workloads.

Understanding Privileges for Users and Roles

Privileges are granted to users and to roles. A user can be assigned more than one role. These are the default roles available in Hive:

  • public - By default, all users are assigned the public role.

  • admin - Only a few users are assigned the admin role, which carries all privileges. An admin can assign the admin role to a user or remove it from a user.

    An admin can:

    • Create a role
    • Drop a role
    • Show roles
    • Show principals
    • Use the dfs, add, delete, compile, and reset commands. Unlike open source Hive, Qubole Hive authorization also allows any user to run the add and delete commands. See Differences from the Open Source Hive for more information.
    • Add or drop functions and macros

When you run a Hive query or command, Qubole Hive checks the privileges granted to you and to your current role.
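The effect of the current role on a privilege check can be sketched with a small model. This is an illustration only, not Qubole's implementation: the `Session` and `GrantStore` classes, the `analyst` role, the `sales` table, and the user `alice` are all hypothetical, and the sketch assumes that grants made directly to the user and grants made to the session's current role are both considered.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical model of a Hive session; not Qubole's implementation."""
    user: str
    current_role: str = "public"  # every user holds the public role by default


@dataclass
class GrantStore:
    """Hypothetical grant table: (principal kind, name, table) -> privileges."""
    grants: dict = field(default_factory=dict)

    def grant(self, kind, name, table, privilege):
        self.grants.setdefault((kind, name, table), set()).add(privilege)

    def is_allowed(self, session, table, privilege):
        # Assumption: privileges granted directly to the user and privileges
        # granted to the session's current role are both considered.
        user_privs = self.grants.get(("user", session.user, table), set())
        role_privs = self.grants.get(("role", session.current_role, table), set())
        return privilege in (user_privs | role_privs)


store = GrantStore()
store.grant("role", "analyst", "sales", "SELECT")
store.grant("user", "alice", "sales", "INSERT")

as_public = Session(user="alice")
as_analyst = Session(user="alice", current_role="analyst")

print(store.is_allowed(as_public, "sales", "SELECT"))   # False
print(store.is_allowed(as_analyst, "sales", "SELECT"))  # True
print(store.is_allowed(as_public, "sales", "INSERT"))   # True
```

In this model the same user is denied or allowed SELECT depending only on which role is current, while the direct INSERT grant applies under either role.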

Required Privileges for Performing Hive Operations

These are the required privileges for performing Hive operations:

  • SELECT privilege: Provides read access to an object (table).
  • INSERT privilege: Provides the ability to add data to an object (table).
  • UPDATE privilege: Provides the ability to run UPDATE queries on an object (table).
  • DELETE privilege: Provides the ability to delete data in an object (table).
  • ALL privilege: Provides all of the above privileges.
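How ALL relates to the four individual privileges can be sketched as a set expansion. The privilege names are taken from the list above; the helper names `expand_privileges` and `can_run` are hypothetical and do not correspond to any Hive or Qubole API.

```python
# Privilege names come from the list above; the helper names are hypothetical.
BASE_PRIVILEGES = frozenset({"SELECT", "INSERT", "UPDATE", "DELETE"})


def expand_privileges(granted):
    """Return the effective privilege set, expanding ALL into the base four."""
    effective = set()
    for privilege in granted:
        name = privilege.upper()
        if name == "ALL":
            effective |= BASE_PRIVILEGES
        elif name in BASE_PRIVILEGES:
            effective.add(name)
        else:
            raise ValueError(f"unknown privilege: {privilege}")
    return effective


def can_run(operation_privilege, granted):
    """Check whether a grant list covers the privilege an operation needs."""
    return operation_privilege.upper() in expand_privileges(granted)


print(sorted(expand_privileges(["ALL"])))       # ['DELETE', 'INSERT', 'SELECT', 'UPDATE']
print(can_run("UPDATE", ["select", "insert"]))  # False
print(can_run("DELETE", ["ALL"]))               # True
```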

Enabling Qubole Hive Authorization

Hive Authorization is not enabled by default. To enable Hive Authorization in a QDS account, create a Qubole support ticket.

See Using Qubole Hive Authorization for how to use Qubole Hive authorization.

Note

Once Qubole has enabled Hive Authorization in your account:

  • QDS sets hive.security.authorization.enabled to true, and adds it to Hive’s Restricted List. This prevents users from bypassing Hive authorization when they run a query.
  • If you want to change the setting of hive.security.authorization.enabled at the cluster level, you can do so in the QDS UI: set it in the Override Hive Configuration field in the Hive Settings section under the Advanced Configuration tab of a Hadoop (Hive) cluster, then restart the cluster.
  • To change the setting at the account level, create a Qubole support ticket.
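The role of the Restricted List can be sketched as a guard on query-time configuration changes. This is a simplified model, not Hive's or QDS's code: the `HiveSessionConf` class and `RestrictedSettingError` exception are hypothetical, and `hive.exec.parallel` is used only as an example of a setting that is assumed not to be restricted.

```python
class RestrictedSettingError(Exception):
    """Raised when a query tries to change a restricted setting (hypothetical)."""


class HiveSessionConf:
    """Hypothetical model of per-query configuration with a restricted list."""

    def __init__(self, cluster_settings, restricted):
        self._values = dict(cluster_settings)
        self._restricted = frozenset(restricted)

    def set(self, key, value):
        # Settings on the restricted list cannot be changed at query time.
        if key in self._restricted:
            raise RestrictedSettingError(f"{key} cannot be modified at query time")
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)


conf = HiveSessionConf(
    cluster_settings={"hive.security.authorization.enabled": "true"},
    restricted=["hive.security.authorization.enabled"],
)

conf.set("hive.exec.parallel", "true")  # assumed unrestricted, so allowed

try:
    conf.set("hive.security.authorization.enabled", "false")
except RestrictedSettingError as err:
    print(f"rejected: {err}")

print(conf.get("hive.security.authorization.enabled"))  # true
```

In this model, the value can only change where the cluster-level settings are defined, which mirrors the cluster-level override described above.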

Differences from the Open Source Hive

Qubole Hive Authorization has the following differences from the open source Hive:

  • Qubole allows users to run the add and delete commands. In open source Hive, commands such as dfs, add, delete, compile, and reset are disabled.

  • Qubole has disabled filesystem-level checks. Open source Hive performs filesystem-level checks to see whether the user has READ, WRITE, and OWNERSHIP permissions on the location hierarchy.

    Qubole has disabled the filesystem-level check for Cloud Object Storage for the following reasons:

    • Cloud Object Storage permissions do not translate into READ, WRITE, or OWNERSHIP as cleanly as HDFS permissions do.
    • The permission checks cover the entire location hierarchy. For a directory, Hive recursively checks the permissions of each file in that directory. If a directory does not exist, Hive walks up one level at a time until it reaches a directory that exists. On Cloud Object Storage, this behavior would require a large number of storage calls and cause high command latency.

    By default, hive.authz.disable.fs.check is set to true. To revert to the open source Hive behavior, set hive.authz.disable.fs.check to false.
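The cost of a hierarchy-wide check can be sketched by counting storage calls in a toy model. This is not Hive's code: `CountingStore` is a hypothetical in-memory stand-in for an object store, `check_location` is a hypothetical helper, and the sketch assumes one call per existence check, per listing, and per permission lookup.

```python
import posixpath


class CountingStore:
    """Hypothetical in-memory stand-in for an object store that counts calls."""

    def __init__(self, paths):
        self.paths = set(paths)  # every existing directory and file path
        self.calls = 0

    def exists(self, path):
        self.calls += 1
        return path in self.paths

    def children(self, path):
        self.calls += 1
        prefix = path.rstrip("/") + "/"
        # Direct children only: nothing after the prefix may contain "/".
        return sorted(p for p in self.paths
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def check_permission(self, path):
        self.calls += 1
        return True  # the sketch only counts calls, so every check passes


def check_location(store, path):
    """Walk up to the nearest existing ancestor, then check everything below it."""
    while path != "/" and not store.exists(path):
        path = posixpath.dirname(path)
    allowed = True
    stack = [path]
    while stack:
        current = stack.pop()
        allowed = store.check_permission(current) and allowed
        stack.extend(store.children(current))
    return allowed


files = [f"/warehouse/sales/part-{i:05d}" for i in range(100)]
store = CountingStore(["/warehouse", "/warehouse/sales"] + files)

check_location(store, "/warehouse/sales")
print(store.calls)  # 203
```

In this model a single table location with 100 files costs 203 calls, and the count grows linearly with the number of files; a missing partition path adds one extra call per level walked up. Each call to a remote object store carries network latency, which is the overhead that skipping the check avoids.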

Known Issues in Qubole Hive Authorization

Qubole Hive authorization has the following known issues in Qubole Hive 2.1:

  • EXPLAIN queries do not check for the SELECT privilege.
  • Granting a role with the admin option does not work with IAM Roles authorization.