Introduction to Qubole Clusters

Qubole Data Service (QDS) provides a unified platform for managing different types of compute clusters.

QDS can run queries and programs written with tools such as SQL, MapReduce, Scala, and Python. These run on distributed execution frameworks such as Hadoop and Spark, on multi-node clusters comprising one coordinator node and one or more worker nodes.
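For example, commands can be submitted programmatically through the Qubole Python SDK (qds-sdk). The snippet below is a minimal sketch, assuming a valid API token and the HiveCommand.run call exposed by the SDK; exact parameter names may differ across SDK versions.

    from qds_sdk.qubole import Qubole
    from qds_sdk.commands import HiveCommand

    # Authenticate against the QDS API; the token is available in the Control Panel.
    Qubole.configure(api_token="<your-api-token>")

    # Submit a SQL (Hive) query; run() waits for the command to complete.
    cmd = HiveCommand.run(query="SHOW TABLES;")
    print(cmd.id, cmd.status)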

Cluster Basics

Each QDS account has pre-configured clusters of different types (Hadoop, Spark, etc.). You can configure additional clusters. Each cluster has one or more unique cluster labels.

A new account is pre-configured with one cluster of each of the following types:

  1. Spark (labelled as spark)
  2. Hadoop 2 (labelled as hadoop2)
  3. Presto (labelled as presto; currently AWS and Azure only)

Navigate to Control Panel > Clusters in the QDS UI to see the list of clusters.

Note

The clusters are configured but are not active. A red status icon indicates that a cluster is down.

You can configure multiple clusters of the same type as needed. (Trial accounts are limited to four clusters.)
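You can also retrieve the list of clusters programmatically. The sketch below assumes the Qubole Python SDK (qds-sdk) and its Cluster.list() call; the shape of the returned entries (the cluster, label, and state keys) is an assumption and may vary by SDK and API version.

    from qds_sdk.qubole import Qubole
    from qds_sdk.cluster import Cluster

    Qubole.configure(api_token="<your-api-token>")

    # Each entry describes one cluster in the account.
    for entry in Cluster.list():
        # The key names below are assumptions; inspect the raw response
        # for your SDK/API version.
        info = entry.get("cluster", entry)
        print(info.get("label"), info.get("state"))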

Cluster Life Cycle Management

See Understanding the QDS Cluster Lifecycle.

Cluster Labels and Command Routing

You must assign at least one unique label to each QDS cluster; you can assign more than one label. Each new QDS account has a default Hadoop cluster with the label default.

Qubole commands are routed to clusters using these rules:

  • If a command includes a cluster label, the command is routed to the cluster with the corresponding label.
  • If no cluster label is included, the command is routed to the first matching cluster; for example:
    • Hive and Hadoop commands are routed to the first matching Hadoop cluster.
    • Spark commands are routed to the first matching Spark cluster.
    • Presto commands are routed to the first matching Presto cluster.
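As an illustration of label-based routing, the hedged sketch below uses the Qubole Python SDK (qds-sdk) to submit a Hive command and a Spark command with explicit cluster labels matching the pre-configured labels above; the label, program, and language parameters are assumptions and may differ by SDK version.

    from qds_sdk.qubole import Qubole
    from qds_sdk.commands import HiveCommand, SparkCommand

    Qubole.configure(api_token="<your-api-token>")

    # Route a Hive query to the cluster labelled "hadoop2".
    hive_cmd = HiveCommand.run(query="SHOW TABLES;", label="hadoop2")

    # Route a PySpark program to the cluster labelled "spark".
    spark_cmd = SparkCommand.run(
        program="print(spark.range(10).count())",  # trivial PySpark snippet
        language="python",
        label="spark",
    )

    print(hive_cmd.status, spark_cmd.status)

If the label argument is omitted, the command falls back to the routing rules listed above.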