Using the Catalog Configuration

A Presto catalog contains schemas and refers to a data source through a connector. Qubole lets you add a catalog by defining its properties in the Presto overrides on the Presto cluster. Use the following syntax in the Presto override:

catalog/<catalog-name>.properties:
<catalog property 1>
<catalog property 2>
.
.
.
<catalog property n>

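As a minimal sketch of this syntax, the following override sets two of the Hive catalog properties described in the table later in this section. The values are illustrative picks from that table's Examples column, not recommendations, and the `key=value` form is assumed from the `hive.metastore-cache-ttl-bulk=24h` example in the same table.

```properties
catalog/hive.properties:
# Wait up to 3 minutes for a Hive metastore call before timing out (illustrative value).
hive.metastore-timeout=3m
# Skip corrupt records in input formats other than orc, parquet and rcfile (illustrative value).
hive.skip-corrupt-records=true
```
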
catalog/hive.properties

Qubole provides table-level security for Hive tables accessed through Presto. See Understanding Qubole Hive Authorization for more information.

The following table describes the common Hive catalog properties.

Parameter	Examples	Default	Description	Supported Presto Version
hive.metastore-timeout	3m, 1h	3m	Timeout for Hive metastore calls, that is, how long a request waits to fetch data from the metastore before it times out.	Presto versions 0.208 and 317
hive.metastore-cache-ttl	5m, 20m	20m	The lifetime of an entry in the metastore cache before it is evicted. The metastore cache holds tables, partitions, databases, and so on that are fetched from the Hive metastore. Configuring Thrift Metastore Server Interface for the Custom Metastore describes how to configure Hive Thrift Metastore Interface.	Presto versions 0.208 and 317
hive.metastore-cache-ttl-bulk	20m, 1d	NA	When you need to run a query on `hive.information_schema.columns`, set this option as a Presto override. For example, `hive.metastore-cache-ttl-bulk=24h`. Enabling this option caches table entries for the configured duration when the table information is fetched (in bulk) from the metastore. This makes fetching tables and columns through JDBC drivers faster. It is not supported in Presto version 317 and later.	Presto version 0.208 and older
hive.metastore-refresh-interval	10m, 20m	100m	The interval at which the metastore cache is refreshed. After each interval expires, the metastore cache is refreshed. So, if you see stale results for a query, running the same query again fetches results without the stale data (assuming the interval has expired). Suppose that you disable this parameter or set it to a value higher than `hive.metastore-cache-ttl`, and then run the query after the entry is evicted from the metastore cache. The executed query brings the evicted entry back into the cache and the stale data is returned in the query. Retrieving information from the metastore takes more time than reading from the cache. You can avoid seeing stale results in the executed query by setting this parameter to a value lower than `hive.metastore-cache-ttl`. If you run a query after the refresh interval expires, the query quickly returns the cached entry and starts a background cache refresh. So, to get cached entries with a higher TTL and faster cache refreshes, set `hive.metastore-cache-ttl` higher than `hive.metastore-refresh-interval`.	Presto versions 0.208 and 317
hive.security	`allow-all`, `sql-standard`	`allow-all`	`sql-standard` enables Hive authorization. See Understanding Qubole Hive Authorization for more information.	Presto versions 0.208 and 317
hive.skip-corrupt-records	`true`, `false`	`false`	Skips corrupt records in input formats other than `orc`, `parquet` and `rcfile`. You can also set it as a session property, `hive.skip_corrupt_records=true`, when the active cluster does not have this configuration enabled globally. This configuration is supported only in Presto 0.180 and later versions. Note: The behavior for a corrupted file is non-deterministic, that is, Presto might read some part of the file before hitting corrupt data. In that case, the QDS record reader returns whatever it has read up to that point and skips the rest of the file.	Presto versions 0.208 and 317
hive.information-schema-presto-view-only	`true`, `false`	`true`	It is enabled by default, so the information schema includes only the Presto views and not the Hive views. When it is set to `false`, the information schema includes both the Presto and Hive views.	Presto versions 0.208 and 317
hive.metastore.thrift.impersonation.enabled	`true`, `false`	`false`	When enabled, it adds impersonation support for calls to the Hive metastore. It allows Presto to impersonate the user who runs the query when accessing the Hive metastore.	Presto version 317
hive.max-partitions-per-scan	100000, 150000	100000	It is the maximum number of partitions for a single table scan.	Presto versions 0.208 and 317
hive.max-execution-partitions-per-scan	180000, 150000	The configured value of `hive.max-partitions-per-scan`.	You can use this property along with a relaxed limit on `hive.max-partitions-per-scan` when dynamic partition pruning is expected to reduce the number of partitions scanned at runtime. Note: Using this runtime limit can cause Presto to scan data from `hive.max-execution-partitions-per-scan` partitions per table scan before it detects that the limit is breached and fails the query.	Presto versions 0.208 and 317
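
The relationship between `hive.metastore-cache-ttl` and `hive.metastore-refresh-interval` described in the table can be sketched as a Presto override. This is an illustrative fragment only: the two durations are taken from the Examples column and are assumptions, not tuned recommendations. The point it shows is that the refresh interval is kept lower than the cache TTL.

```properties
catalog/hive.properties:
# Keep cached metastore entries for 20 minutes before eviction (illustrative value).
hive.metastore-cache-ttl=20m
# Refresh cached entries every 10 minutes, which is lower than the TTL above,
# so a query after the interval returns the cached entry and triggers a background refresh.
hive.metastore-refresh-interval=10m
```

With values like these, an entry is refreshed in the background before it reaches its TTL, which is the configuration the table recommends for avoiding stale results.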