Submit a Hive Command

POST /api/v1.2/commands/

This API is used to submit a Hive query.

Required Role

The following users can make this API call:

  • Users who belong to the system-admin or system-user group.
  • Users who belong to a group associated with a role that allows submitting a command. See Managing Groups and Managing Roles for more information.

Parameters

Note

Parameters marked in bold below are mandatory. Others are optional and have default values.

Parameter Description
query Specify the Hive query to run. Either query or script_location is required.
script_location Specify a Google Cloud Storage path where the Hive query to run is stored. Either query or script_location is required.
command_type Set this to HiveCommand.
label Specify the cluster label on which this command is to be run.
retry Denotes the number of retries for a job. Valid values of retry are 1, 2, and 3.
retry_delay Denotes the time interval between the retries when a job fails.
macros Expressions to evaluate macros used in the Hive command. Refer to Macros in Scheduler for more details.
sample_size Size of the sample, in bytes, on which to run the query in test mode.
maximum_progress Progress value for a constrained run. Valid values are floats between 0 and 1.
maximum_run_time Maximum runtime, in seconds, for a constrained run.
minimum_run_time Minimum runtime, in seconds, for a constrained run.
approx_aggregations Converts count distinct to count approx. Valid values are a boolean or NULL.
name Add a name to the command; it is useful for filtering commands in the command history. The name cannot contain the special characters & (ampersand), < (less than), > (greater than), " (double quote), and ' (single quote), or HTML tags. It can contain a maximum of 255 characters.
pool Use this parameter to specify the Fairscheduler pool name for the command to use.
tags Add a tag to a command so that it is easily identifiable and searchable from the commands list in the Commands History. Add a tag as a filter value while searching commands. It can contain a maximum of 255 characters. A comma-separated list of tags can be associated with a single command. While adding a tag value, enclose it in square brackets. For example, {"tags":["<tag-value>"]}.
macros Denotes the macros, which are valid assignment statements that pair each variable with its expression, as in: macros: [{"<variable>":<variable-expression>}, {..}]. You can add more than one variable. For more information, see Macros.
timeout Timeout for command execution, in seconds. The default value is 129600 seconds (36 hours). QDS checks the timeout for a command every 60 seconds, so a command is killed at the first check after its timeout elapses. For example, if the timeout is set to 80 seconds, the command is killed at the next check, 120 seconds after it starts. Set this parameter to prevent a command from running for the full 36 hours.
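
The interaction between timeout and the 60-second check interval can be sketched as simple arithmetic. This is only an illustration of the behavior described in the timeout row, assuming that checks fall on exact multiples of 60 seconds after the command starts; effective_kill_time is a hypothetical helper and not part of the QDS API.

```python
import math

# QDS checks the timeout every 60 seconds (from the timeout row above).
CHECK_INTERVAL_SECONDS = 60


def effective_kill_time(timeout_seconds: int) -> int:
    """Return the time of the first check at which the timeout has elapsed.

    Assumption: checks happen at exact multiples of the interval after start.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    checks_needed = math.ceil(timeout_seconds / CHECK_INTERVAL_SECONDS)
    return checks_needed * CHECK_INTERVAL_SECONDS


print(effective_kill_time(80))      # 120
print(effective_kill_time(129600))  # 129600 (the 36-hour default)
```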

Note

The log for a particular Hive query is available at <Default location>/cluster_inst_id/<cmd_id>.log.gz.
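
Several of the limits in the parameter table can be checked on the client before a request is sent. The following is a minimal sketch in Python, not an official client: build_hive_payload is a hypothetical helper, the tag value is made up for illustration, and treating the 0 to 1 range of maximum_progress as inclusive is an assumption.

```python
import json

# Characters that the table says the name parameter does not accept.
FORBIDDEN_NAME_CHARS = set("&<>\"'")
MAX_NAME_LENGTH = 255


def build_hive_payload(query=None, script_location=None, **options):
    """Return the request body as a dict, enforcing limits from the table."""
    if not query and not script_location:
        raise ValueError("either query or script_location is required")
    payload = {"command_type": "HiveCommand"}
    if query:
        payload["query"] = query
    if script_location:
        payload["script_location"] = script_location

    retry = options.get("retry")
    if retry is not None and retry not in (1, 2, 3):
        raise ValueError("retry must be 1, 2, or 3")

    progress = options.get("maximum_progress")
    if progress is not None and not 0 <= progress <= 1:
        raise ValueError("maximum_progress must be between 0 and 1")

    name = options.get("name")
    if name is not None:
        if len(name) > MAX_NAME_LENGTH:
            raise ValueError("name can contain at most 255 characters")
        if FORBIDDEN_NAME_CHARS & set(name):
            raise ValueError("name contains a forbidden special character")

    tags = options.get("tags")
    if tags is not None and not isinstance(tags, list):
        raise ValueError("tags must be a list of tag values")

    payload.update(options)
    return payload


body = build_hive_payload(
    query="show tables;",
    label="HadoopCluster",  # cluster label used in the last example on this page
    retry=2,
    name="list tables",
    tags=["adhoc"],         # hypothetical tag value
    timeout=600,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be saved to a file and passed to curl with -d @<file>, in the same way as the payload files in the examples below.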

Examples

Goal: Show tables

curl  -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{
      "query":"show tables;", "command_type": "HiveCommand"
    }' \
"https://gcp.qubole.com/api/v1.2/commands"

Response:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

 {
   "command": {
     "approx_mode": false,
     "approx_aggregations": false,
     "query": "show tables",
     "sample": false
   },
   "qbol_session_id": 0000,
   "created_at": "2012-10-11T16:01:09Z",
   "user_id": 00,
   "status": "waiting",
   "command_type": "HiveCommand",
   "id": 3850,
   "progress": 0,
   "meta_data": {
     "results_resource": "commands\/3850\/results",
     "logs_resource": "commands\/3850\/logs"
   }
 }
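
A client usually needs only a few fields from this response: the command ID, the status, and the two resource paths. The sketch below parses a copy of the response above with Python's json module. The placeholder values 0000 and 00 are not valid JSON numbers, so the sample replaces them with 0; summarize is a hypothetical helper name.

```python
import json

# Copy of the response above, with the redacted IDs replaced by 0.
sample_response = r"""
{
  "command": {
    "approx_mode": false,
    "approx_aggregations": false,
    "query": "show tables",
    "sample": false
  },
  "qbol_session_id": 0,
  "created_at": "2012-10-11T16:01:09Z",
  "user_id": 0,
  "status": "waiting",
  "command_type": "HiveCommand",
  "id": 3850,
  "progress": 0,
  "meta_data": {
    "results_resource": "commands\/3850\/results",
    "logs_resource": "commands\/3850\/logs"
  }
}
"""


def summarize(response_text: str) -> dict:
    """Pick out the fields needed to follow up on a submitted command."""
    body = json.loads(response_text)
    return {
        "id": body["id"],
        "status": body["status"],
        # JSON decoding turns the escaped "\/" back into a plain "/".
        "results": body["meta_data"]["results_resource"],
        "logs": body["meta_data"]["logs_resource"],
    }


summary = summarize(sample_response)
print(summary)
# {'id': 3850, 'status': 'waiting', 'results': 'commands/3850/results', 'logs': 'commands/3850/logs'}
```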

Goal: Create an External Table from data on S3

export QUERY="create external table miniwikistats (projcode string, pagename string, pageviews int, bytes int) partitioned by(dt string) row format delimited fields terminated by '\t' lines terminated by '\n' location 's3n://paid-qubole/default-datasets/miniwikistats/'"

# The -d body is double-quoted so that the shell expands $QUERY.
curl -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d "{
      \"query\":\"$QUERY\", \"command_type\":\"HiveCommand\"
   }" \
"https://gcp.qubole.com/api/v1.2/commands/"

Response:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

 {
   "command": {
     "approx_mode": false,
     "approx_aggregations": false,
     "query": "create external table miniwikistats (projcode string, pagename string, pageviews int, bytes int) partitioned by(dt string) row format delimited fields terminated by '\t' lines terminated by '\n' location 's3n:\/\/paid-qubole\/default-datasets\/miniwikistats\/'",
     "sample": false
   },
   "qbol_session_id": 0000,
   "created_at": "2012-10-11T16:44:53Z",
   "user_id": 00,
   "status": "error",
   "command_type": "HiveCommand",
   "id": 3851,
   "progress": 100,
   "meta_data": {
     "results_resource": "commands\/3851\/results",
     "logs_resource": "commands\/3851\/logs"
   }
 }
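
Escaping a query by hand inside a JSON string is easy to get wrong when the query itself contains quotes and backslashes, as this one does. One alternative, sketched here under the assumption that Python is available on the client, is to let json.dumps produce the request body. In this sketch the backslash sequences \t and \n reach Hive as literal two-character escapes, on the assumption that Hive interprets them itself.

```python
import json

# The query from the example above; "\\t" and "\\n" are literal
# backslash sequences in the Python string.
query = (
    "create external table miniwikistats "
    "(projcode string, pagename string, pageviews int, bytes int) "
    "partitioned by(dt string) "
    "row format delimited fields terminated by '\\t' "
    "lines terminated by '\\n' "
    "location 's3n://paid-qubole/default-datasets/miniwikistats/'"
)

# json.dumps escapes backslashes and double quotes where JSON requires it.
body = json.dumps({"query": query, "command_type": "HiveCommand"})
print(body)

# Decoding the body gives back exactly the query that was put in.
round_trip = json.loads(body)["query"]
print(round_trip == query)  # True
```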

Goal: Count the number of rows in the table

export QUERY="select count(*) as num_rows from miniwikistats;"

# The -d body is double-quoted so that the shell expands $QUERY.
curl -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d "{
     \"query\":\"$QUERY\", \"command_type\": \"HiveCommand\"
    }" \
"https://gcp.qubole.com/api/v1.2/commands/"

Response:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

 {
   "command": {
     "approx_mode": false,
     "approx_aggregations": false,
     "query": "select count(*) as num_rows from miniwikistats;",
     "sample": false
   },
   "qbol_session_id": 0000,
   "created_at": "2012-10-11T16:54:57Z",
   "user_id": 00,
   "status": "waiting",
   "command_type": "HiveCommand",
   "id": 3852,
   "progress": 0,
   "meta_data": {
     "results_resource": "commands\/3852\/results",
     "logs_resource": "commands\/3852\/logs"
   }
 }

Goal: Run a query stored in an S3 file location

Contents of file in S3

select count(*) from miniwikistats

Payload (saved in a file named payload)

{
  "script_location":"<S3 Path>", "command_type": "HiveCommand"
}

Request

curl -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN"  -H "Content-Type: application/json" -H "Accept: application/json" \
-d @payload "https://gcp.qubole.com/api/v1.2/commands/"
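
The request above reads its body from a local file named payload. As a minimal sketch, that file could be generated with Python instead of being written by hand; write_payload is a hypothetical helper, the temporary directory is used only so the example cleans up after itself, and the <S3 Path> placeholder is kept as-is.

```python
import json
import os
import tempfile


def write_payload(script_location: str, directory: str) -> str:
    """Write the request body to a file named payload and return its path."""
    path = os.path.join(directory, "payload")
    body = {"script_location": script_location, "command_type": "HiveCommand"}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(body, handle)
    return path


with tempfile.TemporaryDirectory() as workdir:
    payload_path = write_payload("<S3 Path>", workdir)
    # Read the file back to confirm that it holds valid JSON.
    with open(payload_path, encoding="utf-8") as handle:
        written = json.load(handle)

print(written)
```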

Goal: Run a parameterized query stored in an S3 file location

Contents of file in S3

select count(*) from miniwikistats where dt = '$formatted_date$'

Payload (saved in a file named payload)

{
    "script_location":"<S3 Path>",
    "macros":[{"date":"moment('2011-01-11T00:00:00+00:00')"},{"formatted_date":"date.clone().format('YYYY-MM-DD')"}],
    "command_type": "HiveCommand"
}

Request

curl -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN"  -H "Content-Type: application/json" -H "Accept: application/json" \
-d @payload "https://gcp.qubole.com/api/v1.2/commands/"
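
To make the effect of the macros concrete, the sketch below mimics the substitution locally. It assumes, as the example suggests, that QDS replaces each $name$ placeholder in the script with the evaluated value of the macro called name. QDS evaluates the JavaScript expressions itself; the Python date handling here is only a stand-in, and render_script is a hypothetical helper.

```python
from datetime import datetime


def render_script(script: str, values: dict) -> str:
    """Replace each $name$ placeholder with its evaluated macro value."""
    for name, value in values.items():
        script = script.replace(f"${name}$", value)
    return script


# Stand-in for the two macros in the payload: date, then formatted_date.
date = datetime.fromisoformat("2011-01-11T00:00:00+00:00")
values = {"formatted_date": date.strftime("%Y-%m-%d")}

script = "select count(*) from miniwikistats where dt = '$formatted_date$'"
rendered = render_script(script, values)
print(rendered)
# select count(*) from miniwikistats where dt = '2011-01-11'
```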

Note the query ID returned by the row-count query earlier on this page (in this case, 3852). It is used in later examples.

export QUERYID=3852

Goal: Submit a Hive query to a specific cluster

curl -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" \
-d '{"query":"show tables;", "label":"HadoopCluster", "command_type": "HiveCommand"}' \
"https://gcp.qubole.com/api/v1.2/commands"
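
The same request can be assembled with Python's standard library. This is a sketch rather than an official client: build_request is a hypothetical helper, the token is read from the AUTH_TOKEN environment variable as in the curl examples, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

API_URL = "https://gcp.qubole.com/api/v1.2/commands"


def build_request(token: str, query: str, label: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a labeled Hive command."""
    body = json.dumps(
        {"query": query, "label": label, "command_type": "HiveCommand"}
    )
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "X-AUTH-TOKEN": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_request(
    os.environ.get("AUTH_TOKEN", ""), "show tables;", "HadoopCluster"
)
print(request.get_method(), request.full_url)
# Passing the request to urllib.request.urlopen would submit the command.
```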