Hive-JDBC Connector¶
QDS provides a Hive connector for JDBC, so you can run SQL queries to analyze data that resides in JDBC tables.
Optimizations such as predicate pushdown (see Support for PredicatePushDown below) are also available. You can find sample queries and a POM file in Hive JDBC Storage Handler.
Note
Qubole has deprecated its JDBC Storage Handler. Use the open-source JDBC Storage Handler instead. The ADD JAR
statements are mandatory only for Hive versions 1.2 and 2.1.1, as Hive versions 2.3 and 3.1.1 (beta) already contain
the required jars.
Adding Required Jars¶
add jar gs://paid-qubole/jars/jdbchandler/mysql-connector-java-5.1.34-bin.jar;
add jar gs://paid-qubole/jars/jdbchandler/qubole-hive-JDBC-0.0.7.jar;
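Before creating a table, you can confirm that the jars were registered in the current session; a quick check that works in the Hive CLI and Beeline:
list jars;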
Creating a Table¶
You can create an external Hive table that connects to a JDBC table as follows; it allows both reading from and writing to the underlying JDBC table.
Example
The table can be created in two ways:
You can explicitly specify the column mappings in the table creation statement.
DROP TABLE HiveTable;
CREATE EXTERNAL TABLE HiveTable (
    id INT,
    id_double DOUBLE,
    names STRING,
    test INT
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
    "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
    "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
    "mapred.jdbc.username"="-----",
    "mapred.jdbc.input.table.name"="JDBCTable",
    "mapred.jdbc.output.table.name"="JDBCTable",
    "mapred.jdbc.password"="------"
);
You can omit the column mappings; the SerDe class generates them automatically.
CREATE EXTERNAL TABLE HiveTable
ROW FORMAT SERDE 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcSerDe'
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
    "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
    "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
    "mapred.jdbc.username"="root",
    "mapred.jdbc.input.table.name"="JDBCTable",
    "mapred.jdbc.output.table.name"="JDBCTable",
    "mapred.jdbc.password"=""
);
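If you are on Hive 2.3 or later and use the open-source JdbcStorageHandler recommended in the note above, the DDL uses a different storage-handler class and table properties. The following is a minimal sketch based on the Apache Hive JdbcStorageHandler documentation; verify the property names against your Hive version:
CREATE EXTERNAL TABLE HiveTable (
    id INT,
    id_double DOUBLE,
    names STRING,
    test INT
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
    "hive.sql.database.type"="MYSQL",
    "hive.sql.jdbc.driver"="com.mysql.jdbc.Driver",
    "hive.sql.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
    "hive.sql.dbcp.username"="root",
    "hive.sql.dbcp.password"="",
    "hive.sql.table"="JDBCTable"
);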
Usage¶
The Hive-JDBC connector supports almost all types of SQL queries. Some examples of supported queries are:
Reading Data
> select * from HiveTable;
> select count(*) from HiveTable;
> select id from HiveTable where id > 50000;
> select names from HiveTable;
> select * from HiveTable where names like 'D%';
> select * from HiveTable order by id desc;
Joining Tables
> select a.*, b.* from HiveTable_1 a join HiveTable_2 b
on (a.id = b.id) where a.id > 90000 and b.id > 97000;
Writing Data
Note
Writing data is supported on Qubole-on-Azure/OCI/GCP only until the Qubole Hive JDBC Storage Handler is deprecated.
> insert into table HiveTable_1 select * from HiveTable_2;
> insert into table HiveTable_1 select * from HiveTable_2 where id > 50;
Group By Queries
> select id, sum(id_double) as sum_double from HiveTable group by id;
Support for PredicatePushDown¶
To enable PredicatePushDown, add the following configuration (set it to false to disable):
set hive.optimize.ppd=true;
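To check whether a predicate is actually pushed down for a given query, you can inspect the query plan; a quick sketch (the plan output format varies across Hive versions):
set hive.optimize.ppd=true;
explain select * from HiveTable where id > 50000;
With pushdown enabled, the filter is evaluated by the underlying database rather than in Hive, which reduces the data transferred over JDBC.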
Handling Unsuccessful Tez Queries While Querying JDBC Tables¶
Note
This applies to Qubole-on-Azure/OCI/GCP only until the Qubole Hive JDBC Storage Handler is deprecated.
In the Hive-JDBC connector, the JDBC Storage Handler does not work when input splits grouping is enabled in Hive-on-Tez.
As a result, queries fail with the following exception:
java.io.IOException: InputFormatWrapper can not support RecordReaders that don't return same key & value objects.
HiveInputFormat is enabled by default in Tez to support splits grouping. You can avoid the issue by setting the input
format to CombineHiveInputFormat, which disables splits grouping:
set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
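Because this setting applies to the whole session, you may want to restore the Tez default after the JDBC-backed query completes so that other queries keep the benefit of splits grouping. A session-level sketch:
-- disable splits grouping for the JDBC-backed query
set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
select * from HiveTable where id > 50000;
-- restore the Tez default (HiveInputFormat) for subsequent queries
set hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;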