Canonical Hive Commands Report

GET /api/v1.2/reports/canonical/_hive/_commands/

This API provides you the canonical Hive commands report in JSON format. Currently, this report is not generated and Qubole intends to provide this report very soon.

Note

The following points are related to a report API:

  • If the difference between start date and end date is more than 60 days, then the system defaults to 1 month window from the current day’s date.
  • If either start date or end date is not provided, then the system defaults to 1 month window from the current day’s date.
  • If you want to get data for a window more than 2 months, then write an email to help@quoble.com.

Required Role

The following users can make this API call:

  • Users who belong to the system-admin or system-user group.
  • Users who belong to a group associated with a role that allows viewing canonical Hive commands reports. See Managing Groups and Managing Roles for more information.

Parameters

Parameter Description
start_date The date from which you want the report (inclusive). This parameter supports the timestamp in the UTC timezone (YYYY-MM-DDTHH:MM:SSZ) format. The date cannot be earlier than 90 days.
end_date The date until which you want the report. The report contains data from this date also. The API default is today or now. This parameter also supports timestamp in the UTC timezone (YYYY-MM-DDTHH:MM:SSZ) format.
offset The starting point of the results. The API default is 0.
limit The number of results to fetch. The API default is 10.
sort_column The column used to sort the report. Since this report returns the top canonical_hive_commands, the sort order is always descending. Valid choices are ‘frequency’, ‘cpu’, ‘fs_bytes_read’, and ‘fs_bytes_written’. The API default is, frequency.
show_ast Also return the serialized AST corresponding to the canonical query. (Not returned by default.)

Response Parameters

Parameter Description
start_date The actual starting date of the report.
end_date The actual ending date of the report.
sort_column The sort column used.

An array of:

canonical_query_id The ID of the canonical query.
canonical_query The AST dump of the canonical query. (This is returned only when the show_ast parameter is passed.)
recent_example The most recent example of this type of queries.
frequency The number of queries of this type.
cpu The total cumulative CPU, (in ms), consumed by these queries.
fs_bytes_read The total bytes read by these queries.
fs_bytes_written The total bytes written by these queries.

Examples

Without any parameters

curl -i -X GET -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Accept: application/json" \
"https://gcp.qubole.com/api/v1.2/reports/canonical_hive_commands"

Sample Response

{
  "sort_column": "frequency",
  "canonical_queries": [
    {
      "canonical_query_id": "af09cd5799e52f450a87e236f453b864833afac97603409a17f3df4d010b1814",
      "fs_bytes_written": 0,
      "fs_bytes_read": 0,
      "frequency": 8800,
      "cpu": 0,
      "recent_example": "alter table demo_memetracker recover partitions"
    },
    {
      "canonical_query_id": "9548ac7ec7defe3c3251da2544ec545c9bd578cb308c6c3c1936e48df0bdfdb4",
      "fs_bytes_written": 0,
      "fs_bytes_read": 0,
      "frequency": 4547,
      "cpu": 0,
      "recent_example": "alter table daily_tick_data recover partitions"
    },
    {
      "canonical_query_id": "69dae07fc876927b9daba6279c962bc343131c08d8f9f98adfa0c05ef90b40a4",
      "fs_bytes_written": 0,
      "fs_bytes_read": 0,
      "frequency": 768,
      "cpu": 0,
      "recent_example": "show tables"
    },
    {
      "canonical_query_id": "89720fe23d2a85ac217a3b230e992c45dd523b65e6d45863cc410f4b5e4795ea",
      "fs_bytes_written": 0,
      "fs_bytes_read": 0,
      "frequency": 53,
      "cpu": 0,
      "recent_example": "select * from `default`.`memetracker` limit 400"
    },
    {
      "canonical_query_id": "04bccd848172842c8fadd687aef72ac2161f72895dfd3c1d3c31a96411d34095",
      "fs_bytes_written": 0,
      "fs_bytes_read": 0,
      "frequency": 49,
      "cpu": 0,
      "recent_example": "select * from `default`.`30days_test` limit 1 "
    },
    {
      "canonical_query_id": "9996d665cf077f60b1ee87d3b5b80cd65ce078f77935096441433628909b9ddb",
      "fs_bytes_written": 9,
      "fs_bytes_read": 28482500000,
      "frequency": 48,
      "cpu": 0,
      "recent_example": "select count(*) from memetracker"
    },
    {
      "canonical_query_id": "492ea35ff3d58d0d07e70bcc68ea33eadb7bb572f6fdf14f7220931cc94b1abc",
      "fs_bytes_written": 0,
      "fs_bytes_read": 0,
      "frequency": 37,
      "cpu": 0,
      "recent_example": "select * from `default`.`default_qubole_airline_origin_destination` limit 1000"
    },
    {
      "canonical_query_id": "5e2bb3326c4cf2c5b669d60729c5697fb6fdef4f1e09be41e37f424eb96b0c74",
      "fs_bytes_written": 0,
      "fs_bytes_read": 0,
      "frequency": 30,
      "cpu": 0,
      "recent_example": "select * from default_qubole_memetracker limit 10;"
    },
    {
      "canonical_query_id": "aa1c7a294f6e18feec68175a643814f06e180f4ac5e62eb6b556d9bf72830bc2",
      "fs_bytes_written": 25326,
      "fs_bytes_read": 85944,
      "frequency": 22,
      "cpu": 0,
      "recent_example": "select * from test_csv limit 5;"
    },
    {
      "canonical_query_id": "2145af0ee70e1cd93c9901cd41dee8285faacaefe86e4a4f22880316cc4e63c3",
      "fs_bytes_written": 21050200,
      "fs_bytes_read": 321138000,
      "frequency": 21,
      "cpu": 0,
      "recent_example": "select * from demo_memetracker limit 100"
    }
  ]
}

With a different sort column and limit and show_ast=true

curl -i -X GET -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Accept: application/json" \ "https://gcp.qubole.com/api/v1.2/reports/canonical_hive_commands?sort_column=cpu&limit=2&show_ast=true"

Sample Response

{
 "sort_column": "cpu",
 "canonical_queries": [
    {
      "canonical_query_id": "d9635e3ad5501c9ad47bb728c35b63e1b41f8c2ba0fb4f7533b9ab701ce503c4",
      "canonical_query": "(null(TOK_QUERY(TOK_FROM(TOK_TABREF(TOK_TABNAME(lineitem))))(TOK_INSERT(TOK_DESTINATION(TOK_DIR(TOK_TMP_FILE)))(TOK_SELECT(TOK_SELEXPR(TOK_FUNCTIONSTAR(count))))))())",
      "fs_bytes_written": 18,
      "fs_bytes_read": 7726360000,
      "frequency": 2,
      "cpu": 423130,
      "recent_example": "set fs.s3.inputpathprocessor=false;\nselect count(*) from lineitem;"
    },
    {
      "canonical_query_id": "ea1a37cf6b4694293d15cadfaf4bbae2459f12475cd86ea90e6c4f8e31945bda",
      "canonical_query": "(null(TOK_QUERY(TOK_FROM(TOK_TABREF(TOK_TABNAME(default_qubole_memetracker))))(TOK_INSERT(TOK_DESTINATION(TOK_DIR(TOK_TMP_FILE)))(TOK_SELECT(TOK_SELEXPR(TOK_FUNCTIONSTAR(count))))(TOK_WHERE(=(TOK_TABLE_OR_COL(month))(LITERAL)))))())",
      "fs_bytes_written": 108,
      "fs_bytes_read": 54673600000,
      "frequency": 20,
      "cpu": 416200,
      "recent_example": "SELECT count(*) FROM default_qubole_memetracker where month = '2008-08';"
    }
 ]
}

For a specific time period

curl -i -X GET -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Accept: application/json" \ "https://gcp.qubole.com/api/v1.2/reports/canonical_hive_commands?start_date=2014-04-01&end_date=2014-04-21&sort_column=fs_bytes_read&limit=2"

Sample Response

{
  "canonical_queries": [
     {
       "canonical_query_id": "55ebb0cc47e0dc74c70245b026126bba191969dee1dc380a6f98698e6b194085",
       "cpu": 75720,
       "frequency": 1,
       "recent_example": "select dt, count(*) from  junk_temp \ngroup by dt order by dt\n\n\n",
       "fs_bytes_read": 308582016,
       "fs_bytes_written": 1514
     },
     {
       "canonical_query_id": "1c421d2e65407650650cbc2ee80f9a59863875f52d1a9ddd5a051118678a3a6c",
       "cpu": 34980,
       "frequency": 1,
       "recent_example": "select created_at from  junk_temp \nwhere dt=2014-01-03 limit 10\n\n",
       "fs_bytes_read": 308582016,
       "fs_bytes_written": 0
     }
  ],
   "start_date": "2014-03-31T10:00:00Z",
   "end_date": "2014-04-21T20:00:00Z",
   "sort_column": "fs_bytes_read"
}

To learn more about canonicalization of hive queries, see the Blog post.

Caution

Qubole started collecting the CPU metrics only from the last week of December, 2013. So, if you have some queries before that, the CPU metrics is considered as 0.