Create a Cluster on Google Cloud Platform¶
POST /api/v2/clusters/¶
Use this API to create a new cluster when you are using Qubole on GCP. You might create a new cluster when you have a workload that must run in parallel with your pre-existing workloads, when you want to run workloads in a different geographical location, or for other reasons.
Required Role¶
The following users can make this API call:
- Users who belong to the system-admin or system-user group.
- Users who belong to a group associated with a role that allows creating a cluster. See Managing Groups and Managing Roles for more information.
Parameters¶
Note
Parameters marked in bold below are mandatory. Others are optional and have default values.
Parameter | Description |
---|---|
cloud_config | Contains the cloud-specific settings for the cluster: compute_config, storage_config, location, network_config, and cluster_composition. |
cluster_info | Contains the configurations of a cluster. |
engine_config | Contains the engine configuration for the cluster, that is, the settings for the cluster type (Hadoop 2, Spark, or Airflow). |
monitoring | Contains the cluster monitoring configuration. |
internal | Contains the security settings for the cluster. |
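Taken together, the top-level objects give the request body the following overall shape. This is a minimal sketch; each nested object is described in the sections below:

{
    "cloud_config": { ... },
    "cluster_info": { ... },
    "engine_config": { ... },
    "monitoring": { ... },
    "internal": { ... }
}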
cloud_config¶
Parameter | Description |
---|---|
compute_config | Defines the GCP account compute credentials for the cluster. |
storage_config | Defines the GCP account storage credentials for the cluster. |
location | Sets the GCP geographical location. |
network_config | Defines the network configuration for the cluster. |
cluster_composition | Defines the mixture of on-demand instances and preemptible instances for the cluster. |
compute_config¶
Parameter | Description |
---|---|
use_account_compute_creds | Determines whether to use account compute credentials. By default, it is set to false. Set it to true to use account compute credentials. |
customer_project_id | The project ID, unique across GCP. |
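For example, a compute_config that reuses the account-level compute credentials might look like the following sketch; the project ID is an illustrative placeholder:

"compute_config": {
    "use_account_compute_creds": true,
    "customer_project_id": "my-gcp-project"
}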
storage_config¶
Parameter | Description |
---|---|
customer_project_id | The project ID, unique across GCP. |
disk_type | The type of local disk attached to the cluster nodes. |
disk_size_in_gb | The size of each disk, in GB. |
disk_count | The number of disks attached to each node. |
disk_upscaling_config | The disk upscaling configuration for the cluster nodes. |
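As an illustration, a storage_config that attaches one 100 GB disk per node could be written as follows; the values are placeholders rather than recommendations:

"storage_config": {
    "customer_project_id": "my-gcp-project",
    "disk_type": null,
    "disk_size_in_gb": 100,
    "disk_count": 1,
    "disk_upscaling_config": null
}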
location¶
Parameter | Description |
---|---|
region | A Google-defined geographical location where you can run your GCP resources. |
zone | A subdivision of a GCP region, identified by letter a, b, c, etc. |
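For example, to place the cluster in zone b of the us-east1 region:

"location": {
    "region": "us-east1",
    "zone": "us-east1-b"
}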
network_config¶
Parameter | Description |
---|---|
network | The Google VPC network. |
subnet | The name of the subnet. |
master_static_ip | The static IP address to be attached to the cluster’s coordinator node. |
bastion_node_public_dns | The public DNS name of the bastion host if a private subnet is provided for the cluster in a VPC. Do not specify this value for a public subnet. |
bastion_node_port | The port of the bastion node. The default value is 22. You can specify a non-default port if you want to access a cluster that is in a VPC with a private subnet. |
bastion_node_user | The bastion node user, which is ec2-user by default. You can specify a non-default user using this option. |
master_elastic_ip | The elastic IP address for attaching to the cluster coordinator. For more information, see this documentation. |
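The sketch below shows a network_config for a cluster launched in a private subnet and reached through a bastion host; the network, subnet, and DNS values are illustrative:

"network_config": {
    "network": "projects/my-gcp-project/global/networks/default",
    "subnet": "projects/my-gcp-project/regions/us-east1/subnetworks/private-subnet",
    "bastion_node_public_dns": "bastion.example.com",
    "bastion_node_port": 22,
    "bastion_node_user": "ec2-user"
}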
cluster_composition¶
Parameter | Description |
---|---|
master | Whether the coordinator node is preemptible or not. |
min_nodes | Specifies what percentage of minimum required nodes can be preemptible instances. |
autoscaling_nodes | Specifies what percentage of autoscaling nodes can be preemptible instances. |
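For example, the following composition keeps the coordinator and the minimum required nodes on on-demand instances while allowing half of the autoscaling nodes to be preemptible; the percentages are illustrative:

"cluster_composition": {
    "master": {"preemptible": false},
    "min_nodes": {"preemptible": false, "percentage": 0},
    "autoscaling_nodes": {"preemptible": true, "percentage": 50}
}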
cluster_info¶
Parameter | Description |
---|---|
master_instance_type | Defines the coordinator node type. |
slave_instance_type | Defines the worker node type. |
node_base_cooldown_period | With the aggressive downscaling feature enabled on the QDS account, this is the cool down period set in minutes for nodes on a Hadoop 2 or Spark cluster. The default value is 15 minutes. Note The aggressive downscaling feature is only available on request. |
node_volatile_cooldown_period | With the aggressive downscaling feature enabled on the QDS account, this is the cool down period set in minutes for preemptible nodes on a Hadoop 2 or Spark cluster. The default value is 15 minutes. Note The aggressive downscaling feature is only available on request. |
label | Label for the cluster. |
min_nodes | The minimum number of worker nodes. The default value is 1. |
max_nodes | The maximum number of nodes up to which the cluster can be autoscaled. The default value is 2. |
idle_cluster_timeout_in_secs | After enabling the aggressive downscaling feature on the QDS account, the Cluster Idle Timeout can be configured in seconds. Its minimum configurable value is |
cluster_name | The name of the cluster. |
node_bootstrap | A file that is executed on every node of the cluster at boot time. Use this to customize the cluster nodes by setting up environment variables, installing the required packages, and so on. The default value is node_bootstrap.sh. |
disallow_cluster_termination | Prevents auto-termination of the cluster after a prolonged period of disuse. The default value is false. |
force_tunnel | |
customer_ssh_key | SSH key to use to login to the instances. The default value is none. (Note: This parameter is not visible to non-admin users.) The SSH key must be in the OpenSSH format and not in the PEM/PKCS format. |
env_settings | |
datadisk | |
root_volume_size | Defines the size of the root volume of cluster instances. The supported range for the root volume size is 90 - 2047. An example usage would be "rootdisk" => {"size" => 500}. |
engine_config¶
Parameter | Description |
---|---|
flavour | Denotes the type of cluster. The supported values are: hadoop2 and spark. |
hadoop_settings | Contains the Hadoop settings for the cluster. |
hive_settings | Contains the Hive settings for the cluster. |
hadoop_settings¶
Parameter | Description |
---|---|
custom_hadoop_config | The custom Hadoop configuration overrides. The default value is blank. |
use_qubole_placement_policy | Use Qubole Block Placement policy for clusters with preemptible nodes. |
is_ha | |
fairscheduler_settings | The fair scheduler configuration options. |
hive_settings¶
Parameter | Description |
---|---|
hive_version | Set to 2.1.1. |
pig_version | The default version of Pig is 0.11. Pig 0.15 and Pig 0.17 (beta) are the other supported versions. Pig 0.17 (beta) is only supported with Hive 2.1.1. |
pig_execution_engine | |
overrides | The custom configuration overrides. The default value is blank. |
is_metadata_cache_enabled | |
execution_engine |
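Putting these together, an engine_config for a Hadoop 2 cluster with one custom Hadoop override and the default Hive and Pig versions could be sketched as follows; the override value is illustrative:

"engine_config": {
    "flavour": "hadoop2",
    "hadoop_settings": {
        "custom_hadoop_config": "mapred.tasktracker.map.tasks.maximum=3",
        "use_qubole_placement_policy": true
    },
    "hive_settings": {
        "hive_version": "2.1.1",
        "pig_version": "0.11",
        "execution_engine": "tez"
    }
}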
monitoring¶
Parameter | Description |
---|---|
ganglia | Whether to enable Ganglia monitoring for the cluster. The default value is false. |
datadog | Contains the Datadog monitoring configuration (datadog_api_token and datadog_app_token). |
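For example, to enable Ganglia monitoring and report metrics to Datadog (the tokens are placeholders):

"monitoring": {
    "ganglia": true,
    "datadog": {
        "datadog_api_token": "<your Datadog API token>",
        "datadog_app_token": "<your Datadog app token>"
    }
}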
airflow_settings¶
The following table contains the engine_config parameters for an Airflow cluster.
Note
Parameters marked in bold below are mandatory. Others are optional and have default values.
Parameter | Description |
---|---|
dbtap_id | ID of the data store inside QDS. Set it to -1 if you are using the local MySQL instance as the data store. |
fernet_key | Encryption key for sensitive information inside the Airflow database, for example, user passwords and connections. It must be 32 URL-safe base64-encoded bytes. |
type | Engine type. It is airflow for an Airflow cluster. |
version | The default version is 1.10.0 (stable version). The other supported stable versions are 1.8.2 and 1.10.2. All the Airflow versions are compatible with MySQL 5.6 or higher. |
airflow_python_version | Supported versions are 3.5 (supported using package management) and 2.7. To know more, see Configuring an Airflow Cluster. |
overrides | Airflow configuration to override the default settings. Use the following syntax for overrides: |
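As a sketch, and assuming (as the table above suggests) that these parameters are passed inside engine_config, an Airflow cluster that uses the local MySQL instance as its data store might be configured as:

"engine_config": {
    "type": "airflow",
    "dbtap_id": -1,
    "fernet_key": "<your fernet key>",
    "version": "1.10.0",
    "airflow_python_version": "3.5"
}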
internal¶
Parameter | Description |
---|---|
zeppelin_interpreter_mode | The default mode is legacy. Set it to user mode if you want the user-level cluster-resource management on notebooks. See Configuring a Spark Notebook for more information. |
image_uri_overrides | |
spark_s3_package_name | |
zeppelin_s3_package_name |
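For example, to enable user-level cluster-resource management on notebooks:

"internal": {
    "zeppelin_interpreter_mode": "user"
}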
Request API Syntax¶
curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
"cloud_config": {
"compute_config": {
"use_account_compute_creds": true,
"customer_project_id": "dev-acm-cust-project-1"
},
"storage_config": {
"customer_project_id": "dev-acm-cust-project-1",
"disk_type": null,
"disk_size_in_gb": 100,
"disk_count": 0,
"disk_upscaling_config": null
},
"location": {
"region": "us-east1",
"zone": "us-east1-b"
},
"network_config": {
"network": "projects/dev-acm-cust-project-1/global/networks/default",
"subnet": "projects/dev-acm-cust-project-1/regions/us-east1/subnetworks/default",
"master_static_ip": null,
"bastion_node_public_dns": null,
"bastion_node_port": null,
"bastion_node_user": null,
"master_elastic_ip": null
},
"cluster_composition": {
"master": {
"preemptible": false
},
"min_nodes": {
"preemptible": false,
"percentage": 0
},
"autoscaling_nodes": {
"preemptible": true,
"percentage": 50
}
}
},
"cluster_info": {
"master_instance_type": "n1-standard-4",
"slave_instance_type": "n1-standard-4",
"node_base_cooldown_period": null,
"label": ["gcp-cluster-2"],
"min_nodes": 1,
"max_nodes": 1,
"idle_cluster_timeout_in_secs": null,
"cluster_name": "gcpqbol_acc44_cl176",
"node_bootstrap": "node_bootstrap.sh",
"disallow_cluster_termination": false,
"force_tunnel": false,
"customer_ssh_key": null,
"child_hs2_cluster_id": null,
"parent_cluster_id": null,
"env_settings": {},
"datadisk": {
"encryption": false
},
"slave_request_type": "ondemand",
"spot_settings": {}
},
"engine_config": {
"flavour": "hadoop2",
"hadoop_settings": {
"custom_hadoop_config": null,
"use_qubole_placement_policy": true,
"is_ha": null,
"fairscheduler_settings": {
"default_pool": null
}
},
"hive_settings": {
"is_hs2": false,
"hive_version": "2.1.1",
"pig_version": "0.11",
"pig_execution_engine": "mr",
"overrides": null,
"is_metadata_cache_enabled": true,
"execution_engine": "tez",
"hs2_thrift_port": null
}
},
"monitoring": {
"ganglia": false,
"datadog": {
"datadog_api_token": null,
"datadog_app_token": null
}
},
"internal": {
"zeppelin_interpreter_mode": null,
"image_uri_overrides": null,
"spark_s3_package_name": null,
"zeppelin_s3_package_name": null
}
}' \ "https://gcp.qubole.com/api/v2/clusters"
Sample API Request¶
curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
    "cloud_config": {
        "compute_config": {
            "use_account_compute_creds": true,
            "customer_project_id": "<your project ID>"
        },
        "storage_config": {
            "customer_project_id": "<your project ID>",
            "disk_size_in_gb": 100,
            "disk_count": 0
        },
        "location": {
            "region": "us-east1",
            "zone": "us-east1-b"
        },
        "network_config": {
            "network": "projects/<your project ID>/global/networks/default",
            "subnet": "projects/<your project ID>/regions/us-east1/subnetworks/default"
        }
    },
    "cluster_info": {
        "master_instance_type": "n1-standard-4",
        "slave_instance_type": "n1-standard-4",
        "label": ["gcp"],
        "min_nodes": 1,
        "max_nodes": 2,
        "cluster_name": "GCP1",
        "node_bootstrap": "node_bootstrap.sh"
    },
    "engine_config": {
        "flavour": "hadoop2",
        "hadoop_settings": {
            "custom_hadoop_config": "mapred.tasktracker.map.tasks.maximum=3"
        }
    },
    "monitoring": {
        "ganglia": true
    }
}' "https://gcp.qubole.com/api/v2/clusters"