Create a Cluster on Google Cloud Platform

POST /api/v2/clusters/

Use this API to create a new cluster when you are using Qubole on GCP. You create a cluster for a workload that has to run in parallel with your pre-existing workloads.

You might want to run workloads across different geographical locations or there could be other reasons for creating a new cluster.

Required Role

The following users can make this API call:

  • Users who belong to the system-admin or system-user group.
  • Users who belong to a group associated with a role that allows creating a cluster. See Managing Groups and Managing Roles for more information.

Parameters

Note

Parameters marked in bold below are mandatory. Others are optional and have default values.

Parameter Description
cloud_config Contains the cloud configuration for the cluster, such as compute and storage credentials, location, and network settings.
cluster_info Contains the configurations of a cluster.
engine_config Contains the configuration for the cluster's engine type.
monitoring Contains the cluster monitoring configuration.
internal Contains internal settings for the cluster, such as the Zeppelin interpreter mode and image overrides.
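Taken together, the five objects above form the top-level request body. A minimal Python sketch of that skeleton (key names come from the table; the comments summarize the subsections documented below):

```python
import json

# Skeleton of the v2 cluster-creation request body. Each top-level key
# corresponds to one of the objects described in this document.
payload = {
    "cloud_config": {},    # compute/storage credentials, location, network, composition
    "cluster_info": {},    # node types, labels, min/max nodes, timeouts
    "engine_config": {},   # cluster flavour (hadoop2 or spark) and engine overrides
    "monitoring": {},      # ganglia / datadog settings
    "internal": {},        # zeppelin interpreter mode, image overrides
}

print(json.dumps(payload, indent=2))
```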

cloud_config

Parameter Description
compute_config Defines the GCP account compute credentials for the cluster.
storage_config Defines the GCP account storage credentials for the cluster.
location Sets the GCP geographical location.
network_config Defines the network configuration for the cluster.
cluster_composition Defines the mixture of on-demand instances and preemptible instances for the cluster.

compute_config

Parameter Description
use_account_compute_creds Determines whether to use account compute credentials. By default, it is set to false. Set it to true to use account compute credentials.
customer_project_id The project ID, unique across GCP.

storage_config

Parameter Description
customer_project_id The project ID, unique across GCP.
disk_type The type of data disk to attach to the cluster nodes.
disk_size_in_gb The size of each data disk, in GB.
disk_count The number of data disks to attach to each node. The default value is 0.
disk_upscaling_config Defines the configuration for upscaling the data disks on the cluster.

location

Parameter Description
region A Google-defined geographical location where you can run your GCP resources.
zone A subdivision of a GCP region, identified by letter a, b, c, etc.
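As a quick illustration (values taken from the syntax example below), a zone name is just the region name with a letter suffix:

```python
# Illustrative GCP location block: the zone is a lettered subdivision
# of the region, e.g. region "us-east1" contains zone "us-east1-b".
location = {"region": "us-east1", "zone": "us-east1-b"}

# A zone name always begins with its region name.
assert location["zone"].startswith(location["region"])
```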

network_config

Parameter Description
network The Google VPC network.
subnet The name of the subnet.
master_static_ip The static IP address to be attached to the cluster’s coordinator node.
bastion_node_public_dns The public DNS name of the bastion host if a private subnet is provided for the cluster in a VPC. Do not specify this value for a public subnet.
bastion_node_port The port of the bastion node. The default value is 22. You can specify a non-default port if you want to access a cluster that is in a VPC with a private subnet.
bastion_node_user The bastion node user, which is ec2-user by default. You can specify a non-default user using this option.
master_elastic_ip The elastic IP address for attaching to the cluster coordinator. For more information, see this documentation.

cluster_composition

Parameter Description
master Specifies whether the coordinator node is preemptible.
min_nodes Specifies what percentage of minimum required nodes can be preemptible instances.
autoscaling_nodes Specifies what percentage of autoscaling nodes can be preemptible instances.
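The three parameters map onto nested objects in the request body. A hedged sketch in Python (values are illustrative, mirroring the syntax example further below):

```python
# Illustrative cluster_composition block: on-demand coordinator and
# minimum nodes, with up to 50% of autoscaled nodes on preemptible VMs.
cluster_composition = {
    "master": {"preemptible": False},
    "min_nodes": {"preemptible": False, "percentage": 0},
    "autoscaling_nodes": {"preemptible": True, "percentage": 50},
}

# Percentages are interpreted on a 0-100 scale.
for part in ("min_nodes", "autoscaling_nodes"):
    assert 0 <= cluster_composition[part]["percentage"] <= 100
```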

cluster_info

Parameter Description
master_instance_type Defines the coordinator node type.
slave_instance_type Defines the worker node type.
node_base_cooldown_period

With the aggressive downscaling feature enabled on the QDS account, this is the cool down period set in minutes for nodes on a Hadoop 2 or Spark cluster. The default value is 15 minutes.

Note

The aggressive downscaling feature is only available on request.

node_volatile_cooldown_period

With the aggressive downscaling feature enabled on the QDS account, this is the cool down period set in minutes for preemptible nodes on a Hadoop 2 or Spark cluster. The default value is 15 minutes.

Note

The aggressive downscaling feature is only available on request.

label Label for the cluster.
min_nodes The minimum number of worker nodes. The default value is 1.
max_nodes The maximum number of nodes up to which the cluster can be autoscaled. The default value is 2.
idle_cluster_timeout_in_secs

After the aggressive downscaling feature is enabled on the QDS account, the cluster idle timeout can be configured in seconds. The minimum configurable value is 300 seconds; the default remains 2 hours (7200 seconds).

Note

This feature is only available on request. Create a ticket with Qubole Support to enable it on the QDS account.

cluster_name The name of the cluster.
node_bootstrap A file that is executed on every node of the cluster at boot time. Use this to customize the cluster nodes by setting up environment variables, installing the required packages, and so on. The default value is node_bootstrap.sh.
disallow_cluster_termination Prevents auto-termination of the cluster after a prolonged period of disuse. The default value is false.
force_tunnel  
customer_ssh_key SSH key to use to login to the instances. The default value is none. (Note: This parameter is not visible to non-admin users.) The SSH key must be in the OpenSSH format and not in the PEM/PKCS format.
env_settings  
datadisk  
root_volume_size Defines the size of the root volume of cluster instances. The supported range for the root volume size is 90 - 2047. An example usage would be "rootdisk" => {"size" => 500}.
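Because root_volume_size is range-restricted, a client can validate it before sending the request. A hypothetical helper sketch (the 90-2047 bounds come from the table above; the function name is illustrative):

```python
def check_root_volume_size(size_gb: int) -> int:
    """Validate root_volume_size against the documented 90-2047 GB range."""
    if not 90 <= size_gb <= 2047:
        raise ValueError(f"root_volume_size must be 90-2047 GB, got {size_gb}")
    return size_gb

# Mirrors the table's example: "rootdisk" => {"size" => 500}.
rootdisk = {"size": check_root_volume_size(500)}
```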

engine_config

Parameter Description
flavour Denotes the type of cluster. The supported values are: hadoop2 and spark.
hadoop_settings Contains the Hadoop configuration settings for the cluster.
hive_settings Contains the Hive configuration settings for the cluster.

hadoop_settings

Parameter Description
custom_hadoop_config The custom Hadoop configuration overrides. The default value is blank.
use_qubole_placement_policy Use Qubole Block Placement policy for clusters with preemptible nodes.
is_ha  
fairscheduler_settings The fair scheduler configuration options.

hive_settings

Parameter Description
hive_version The Hive version. Set it to 2.1.1.
pig_version The default version of Pig is 0.11. Pig 0.15 and Pig 0.17 (beta) are the other supported versions. Pig 0.17 (beta) is only supported with Hive 2.1.1.
pig_execution_engine  
overrides The custom configuration overrides. The default value is blank.
is_metadata_cache_enabled  
execution_engine  

monitoring

Parameter Description
ganglia Whether to enable Ganglia monitoring for the cluster. The default value is false.
datadog  

airflow_settings

The following table describes the engine_config parameters for an Airflow cluster.

Note

Parameters marked in bold below are mandatory. Others are optional and have default values.

Parameter Description
dbtap_id ID of the data store inside QDS. Set it to -1 if you are using the local MySQL instance as the data store.
fernet_key Encryption key for sensitive information inside the Airflow database, such as user passwords and connections. It must be 32 URL-safe base64-encoded bytes.
type Engine type. It is airflow for an Airflow cluster.
version The default version is 1.10.0 (stable). The other supported stable versions are 1.8.2 and 1.10.2. All Airflow versions are compatible with MySQL 5.6 or later.
airflow_python_version Supported versions are 3.5 (supported using package management) and 2.7. To know more, see Configuring an Airflow Cluster.
overrides

Airflow configuration to override the default settings. Use the following syntax for overrides:

<section>.<property>=<value>\n<section>.<property>=<value>...
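For illustration, a 32-byte URL-safe base64 fernet_key can be generated with the Python standard library, and the overrides string is just newline-joined section.property=value pairs. A sketch (the specific Airflow property names below are illustrative, not prescribed by this API):

```python
import base64
import os

# Generate a fernet key: 32 random bytes, URL-safe base64-encoded.
fernet_key = base64.urlsafe_b64encode(os.urandom(32)).decode()

# Build the overrides string in the <section>.<property>=<value> form,
# with entries joined by newlines.
overrides = "\n".join([
    "core.parallelism=64",
    "webserver.workers=2",
])
```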

internal

Parameter Description
zeppelin_interpreter_mode The default mode is legacy. Set it to user if you want user-level cluster-resource management on notebooks. See Configuring a Spark Notebook for more information.
image_uri_overrides  
spark_s3_package_name  
zeppelin_s3_package_name  

Request API Syntax

curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
     "cloud_config": {
             "compute_config": {
                     "use_account_compute_creds": true,
                     "customer_project_id": "dev-acm-cust-project-1"
             },
             "storage_config": {
                     "customer_project_id": "dev-acm-cust-project-1",
                     "disk_type": null,
                     "disk_size_in_gb": 100,
                     "disk_count": 0,
                     "disk_upscaling_config": null
             },
             "location": {
                     "region": "us-east1",
                     "zone": "us-east1-b"
             },
             "network_config": {
                     "network": "projects/dev-acm-cust-project-1/global/networks/default",
                     "subnet": "projects/dev-acm-cust-project-1/regions/us-east1/subnetworks/default",
                     "master_static_ip": null,
                     "bastion_node_public_dns": null,
                     "bastion_node_port": null,
                     "bastion_node_user": null,
                     "master_elastic_ip": null
             },
             "cluster_composition": {
                     "master": {
                             "preemptible": false
                     },
                     "min_nodes": {
                             "preemptible": false,
                             "percentage": 0
                     },
                     "autoscaling_nodes": {
                             "preemptible": true,
                             "percentage": 50
                     }
             }
     },
     "cluster_info": {
             "master_instance_type": "n1-standard-4",
             "slave_instance_type": "n1-standard-4",
             "node_base_cooldown_period": null,
             "label": ["gcp-cluster-2"],
             "min_nodes": 1,
             "max_nodes": 1,
             "idle_cluster_timeout_in_secs": null,
             "cluster_name": "gcpqbol_acc44_cl176",
             "node_bootstrap": "node_bootstrap.sh",
             "disallow_cluster_termination": false,
             "force_tunnel": false,
             "customer_ssh_key": null,
             "child_hs2_cluster_id": null,
             "parent_cluster_id": null,
             "env_settings": {},
             "datadisk": {
                     "encryption": false
             },
             "slave_request_type": "ondemand",
             "spot_settings": {}
     },
     "engine_config": {
             "flavour": "hadoop2",
             "hadoop_settings": {
                     "custom_hadoop_config": null,
                     "use_qubole_placement_policy": true,
                     "is_ha": null,
                     "fairscheduler_settings": {
                             "default_pool": null
                     }
             },
             "hive_settings": {
                     "is_hs2": false,
                     "hive_version": "2.1.1",
                     "pig_version": "0.11",
                     "pig_execution_engine": "mr",
                     "overrides": null,
                     "is_metadata_cache_enabled": true,
                     "execution_engine": "tez",
                     "hs2_thrift_port": null
             }
     },
     "monitoring": {
             "ganglia": false,
             "datadog": {
                     "datadog_api_token": null,
                     "datadog_app_token": null
             }
     },
     "internal": {
             "zeppelin_interpreter_mode": null,
             "image_uri_overrides": null,
             "spark_s3_package_name": null,
             "zeppelin_s3_package_name": null
     }
 }' "https://gcp.qubole.com/api/v2/clusters"
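The same call can be issued from Python's standard library. A minimal sketch that only constructs the request (it is not sent here; the token is assumed to be in the X_AUTH_TOKEN environment variable, as in the curl command, and the payload is abbreviated):

```python
import json
import os
import urllib.request

# Abbreviated request body; in practice, fill in the objects as shown
# in the Request API Syntax above.
payload = {"cloud_config": {}, "cluster_info": {}, "engine_config": {}}

req = urllib.request.Request(
    "https://gcp.qubole.com/api/v2/clusters",
    data=json.dumps(payload).encode(),
    headers={
        "X-AUTH-TOKEN": os.environ.get("X_AUTH_TOKEN", ""),
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request; omitted here.
```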

Sample API Request

curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
       "cloud_config": {
         "compute_config": {
               "use_account_compute_creds": false,
               "customer_project_id": "dev-acm-cust-project-1"
         },
         "storage_config": {
               "customer_project_id": "dev-acm-cust-project-1",
               "disk_type": null,
               "disk_size_in_gb": 300,
               "disk_count": 4
         },
         "location": {
               "region": "us-east1",
               "zone": "us-east1-b"
         },
         "network_config": {
               "network": "projects/dev-acm-cust-project-1/global/networks/default",
               "subnet": "projects/dev-acm-cust-project-1/regions/us-east1/subnetworks/default"
         }
       },
       "cluster_info": {
            "master_instance_type": "n1-standard-4",
            "slave_instance_type": "n1-standard-4",
            "label": ["gcp"],
            "min_nodes": 1,
            "max_nodes": 2,
            "cluster_name": "GCP1",
            "node_bootstrap": "node_bootstrap.sh"
       },
       "engine_config": {
            "flavour": "hadoop2",
            "hadoop_settings": {
                "custom_hadoop_config": "mapred.tasktracker.map.tasks.maximum=3"
            }
       },
       "monitoring": {
            "ganglia": true
       }
   }' "https://gcp.qubole.com/api/v2/clusters"