Configuring a Presto Cluster

A single Qubole account can run multiple clusters. By default, Qubole provides a Presto cluster, along with Hadoop and Spark clusters, for each account.

The following topics explain Presto custom configuration and the Presto catalog properties:

Note

QDS provides the Presto Ruby client, which improves overall performance: it processes DDL queries much faster and reports errors generated by a Presto cluster more quickly. For more information, see this blog.

To view or edit a Presto cluster’s configuration, navigate to the Clusters page and select the cluster with the label presto.

To edit the configuration, click the edit icon in the Action column of the Presto cluster's row.

Note

Presto queries are memory-intensive. Choose instance types with ample memory for both the coordinator and worker nodes.

Presto versions 0.208 and 317 are the two supported stable versions.

See QDS Components: Supported Versions and Cloud Platforms for the latest version information.

Note

Qubole can automatically terminate a Presto cluster that has an invalid configuration. This capability is available through Beta access; create a ticket with Qubole Support to enable it for your account.

If there is a cluster failure or configuration error, check the log file /usr/lib/presto/logs/server.log. See Presto FAQs for more information about Presto logs.
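When server.log is long, a small filter helps surface the relevant records. The following Python script is a minimal sketch, not a Qubole or Presto tool: it assumes each log record carries its severity as a standalone token (DEBUG, INFO, WARN, or ERROR), and the names problem_lines and scan_log are hypothetical. Each node writes its own server.log, so run it on the node you are investigating.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, List, Sequence

# Log location given in the Qubole documentation.
DEFAULT_LOG = Path("/usr/lib/presto/logs/server.log")

# Assumption: the first standalone level keyword on a line is its severity.
# Adjust this pattern if your server.log uses a different layout.
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARN|ERROR)\b")


def problem_lines(lines: Iterable[str], levels: Sequence[str] = ("ERROR",)) -> Iterator[str]:
    """Yield lines (without trailing newline) whose severity is in `levels`."""
    wanted = set(levels)
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match and match.group(1) in wanted:
            yield line.rstrip("\n")


def scan_log(path: Path = DEFAULT_LOG, levels: Sequence[str] = ("ERROR",)) -> List[str]:
    """Return the matching lines from a log file, tolerating bad bytes."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return list(problem_lines(handle, levels))


if __name__ == "__main__":
    if DEFAULT_LOG.exists():
        for record in scan_log(levels=("ERROR", "WARN")):
            print(record)
    else:
        print(f"{DEFAULT_LOG} not found on this node.")
```

Passing levels=("ERROR", "WARN") widens the filter when an error is preceded by warnings that explain it.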

On AWS, Azure, or GCP, select Enable Rubix to enable RubiX. See Configuring RubiX in Presto and Spark Clusters for more information.

See Managing Clusters for more information on cluster configuration options that are common to all cluster types.

Avoiding Stale Caches

Tune the cache parameters if you expect data to change rapidly.

For example, if a new partition is added to a Hive table, it may take Presto 20 minutes to discover it. If you plan to change existing files in cloud storage, consider making fileinfo cache expiration more aggressive. If you expect new files to land in a partition rapidly, consider reducing or disabling the dirinfo cache.
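Why a shorter expiration helps can be seen with a generic time-to-live (TTL) cache model. The Python sketch below is an illustration under stated assumptions, not Presto's cache implementation: TtlCache, FakeClock, demo, and the bucket path are hypothetical, and a TTL of 0 stands in for disabling the cache. A directory listing is cached, a new file lands, and a lookup one minute later either returns the stale cached listing or reloads it, depending on the TTL.

```python
import time
from typing import Callable, Dict, Generic, Hashable, List, Tuple, TypeVar

V = TypeVar("V")


class TtlCache(Generic[V]):
    """Serve a cached value until it is older than ttl_seconds (0 disables caching)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get(self, key: Hashable, load: Callable[[], V]) -> V:
        now = self.clock()
        if self.ttl > 0:
            cached = self._entries.get(key)
            if cached is not None and now - cached[0] < self.ttl:
                return cached[1]  # may be stale: storage is not re-checked
        value = load()  # cache miss, expired entry, or caching disabled
        if self.ttl > 0:
            self._entries[key] = (now, value)
        return value


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def demo(ttl_seconds: float) -> List[str]:
    """Return the listing seen one minute after a new file lands."""
    path = "s3://example-bucket/table/"  # hypothetical location
    storage = {path: ["part-0"]}
    clock = FakeClock()
    cache: TtlCache[List[str]] = TtlCache(ttl_seconds, clock)

    def list_dir() -> List[str]:
        return list(storage[path])

    cache.get(path, list_dir)       # t = 0 s: listing is cached
    storage[path].append("part-1")  # a new file lands in the partition
    clock.now = 60.0                # t = 60 s
    return cache.get(path, list_dir)


if __name__ == "__main__":
    for ttl in (1200, 30, 0):
        print(ttl, demo(ttl))
```

With a 1200-second (20-minute) TTL the new file stays invisible at the 60-second lookup; with a 30-second TTL, or with caching disabled, the lookup reloads the listing and sees it. The trade-off is more listing calls against storage.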