Create a Cluster on Google Cloud Platform¶
POST /api/v2/clusters/¶
Use this API to create a new cluster when you are using Qubole on GCP. You might create a new cluster when you have a workload that must run in parallel with your pre-existing workloads, when you want to run workloads in a different geographical location, or for other reasons.
Required Role¶
The following users can make this API call:
- Users who belong to the system-admin or system-user group.
- Users who belong to a group associated with a role that allows creating a cluster. See Managing Groups and Managing Roles for more information.
Parameters¶
Note
Parameters marked in bold below are mandatory. Others are optional and have default values.
Parameter | Description |
---|---|
cloud_config | Contains the cloud-specific settings for the cluster: compute_config, storage_config, location, network_config, and cluster_composition. |
cluster_info | Contains the configurations of a cluster. |
engine_config | Contains the engine configuration for the cluster, that is, the settings for the cluster type (Hadoop 2, Spark, or Airflow). |
monitoring | Contains the cluster monitoring configuration. |
internal | Contains the security settings for the cluster. |
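Taken together, the top-level objects give the request body the following overall shape. This is a minimal sketch; each nested object is described in the sections below:

{
    "cloud_config": { ... },
    "cluster_info": { ... },
    "engine_config": { ... },
    "monitoring": { ... },
    "internal": { ... }
}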
cloud_config¶
Parameter | Description |
---|---|
compute_config | Defines the GCP account compute credentials for the cluster. |
storage_config | Defines the GCP account storage credentials for the cluster. |
location | Sets the GCP geographical location. |
network_config | Defines the network configuration for the cluster. |
cluster_composition | Defines the mixture of on-demand instances and preemptible instances for the cluster. |
compute_config¶
Parameter | Description |
---|---|
use_account_compute_creds | Determines whether to use account compute credentials. By default, it is set to false. Set it to true to use account compute credentials. |
customer_project_id | The project ID, unique across GCP. |
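For example, a compute_config that reuses the account-level compute credentials might look like the following sketch; the project ID is an illustrative placeholder:

"compute_config": {
    "use_account_compute_creds": true,
    "customer_project_id": "my-gcp-project"
}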
storage_config¶
Parameter | Description |
---|---|
customer_project_id | The project ID, unique across GCP. |
disk_type | The type of local disk attached to the cluster nodes. |
disk_size_in_gb | The size of each disk, in GB. |
disk_count | The number of disks attached to each node. |
disk_upscaling_config | The disk upscaling configuration for the cluster nodes. |
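As an illustration, a storage_config that attaches one 100 GB disk per node could be written as follows; the values are placeholders rather than recommendations:

"storage_config": {
    "customer_project_id": "my-gcp-project",
    "disk_type": null,
    "disk_size_in_gb": 100,
    "disk_count": 1,
    "disk_upscaling_config": null
}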
location¶
Parameter | Description |
---|---|
region | A Google-defined geographical location where you can run your GCP resources. |
zone | A subdivision of a GCP region, identified by letter a, b, c, etc. |
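For example, to place the cluster in zone b of the us-east1 region:

"location": {
    "region": "us-east1",
    "zone": "us-east1-b"
}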
network_config¶
Parameter | Description |
---|---|
network | The Google VPC network. |
subnet | The name of the subnet. |
master_static_ip | The static IP address to be attached to the cluster’s coordinator node. |
bastion_node_public_dns | The public DNS name of the bastion host if a private subnet is provided for the cluster in a VPC. Do not specify this value for a public subnet. |
bastion_node_port | The port of the bastion node. The default value is 22. You can specify a non-default port if you want to access a cluster that is in a VPC with a private subnet. |
bastion_node_user | The bastion node user, which is ec2-user by default. You can specify a non-default user using this option. |
master_elastic_ip | The elastic IP address for attaching to the cluster coordinator. For more information, see this documentation. |
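The sketch below shows a network_config for a cluster launched in a private subnet and reached through a bastion host; the network, subnet, and DNS values are illustrative:

"network_config": {
    "network": "projects/my-gcp-project/global/networks/default",
    "subnet": "projects/my-gcp-project/regions/us-east1/subnetworks/private-subnet",
    "bastion_node_public_dns": "bastion.example.com",
    "bastion_node_port": 22,
    "bastion_node_user": "ec2-user"
}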
cluster_composition¶
Parameter | Description |
---|---|
master | Whether the coordinator node is preemptible or not. |
min_nodes | Specifies what percentage of minimum required nodes can be preemptible instances. |
autoscaling_nodes | Specifies what percentage of autoscaling nodes can be preemptible instances. |
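For example, the following composition keeps the coordinator and the minimum required nodes on on-demand instances while allowing half of the autoscaling nodes to be preemptible; the percentages are illustrative:

"cluster_composition": {
    "master": {"preemptible": false},
    "min_nodes": {"preemptible": false, "percentage": 0},
    "autoscaling_nodes": {"preemptible": true, "percentage": 50}
}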
cluster_info¶
Parameter | Description |
---|---|
master_instance_type | Defines the coordinator node type. |
slave_instance_type | Defines the worker node type. |
node_base_cooldown_period | With the aggressive downscaling feature enabled on the QDS account, this is the cool down period set in minutes for nodes on a Hadoop 2 or Spark cluster. The default value is 15 minutes. Note The aggressive downscaling feature is only available on request. |
node_volatile_cooldown_period | With the aggressive downscaling feature enabled on the QDS account, this is the cool down period set in minutes for preemptible nodes on a Hadoop 2 or Spark cluster. The default value is 15 minutes. Note The aggressive downscaling feature is only available on request. |
label | Label for the cluster. |
min_nodes | The minimum number of worker nodes. The default value is 1. |
max_nodes | The maximum number of nodes up to which the cluster can be autoscaled. The default value is 2. |
idle_cluster_timeout_in_secs | After enabling the aggressive downscaling feature on the QDS account, the Cluster Idle Timeout can be configured in seconds. Its minimum configurable value is |
cluster_name | The name of the cluster. |
node_bootstrap | A file that is executed on every node of the cluster at boot time. Use this to customize the cluster nodes by setting up environment variables, installing the required packages, and so on. The default value is node_bootstrap.sh. |
disallow_cluster_termination | Prevents auto-termination of the cluster after a prolonged period of disuse. The default value is false. |
force_tunnel | |
customer_ssh_key | SSH key to use to login to the instances. The default value is none. (Note: This parameter is not visible to non-admin users.) The SSH key must be in the OpenSSH format and not in the PEM/PKCS format. |
env_settings | |
datadisk | |
root_volume_size | Defines the size of the root volume of cluster instances. The supported range for the root volume size is 90 - 2047. An example usage would be "rootdisk" => {"size" => 500}. |
engine_config¶
Parameter | Description |
---|---|
flavour | Denotes the type of cluster. The supported values are: hadoop2 and spark. |
hadoop_settings | Contains the Hadoop settings for the cluster. |
hive_settings | Contains the Hive settings for the cluster. |
hadoop_settings¶
Parameter | Description |
---|---|
custom_hadoop_config | The custom Hadoop configuration overrides. The default value is blank. |
use_qubole_placement_policy | Use Qubole Block Placement policy for clusters with preemptible nodes. |
is_ha | |
fairscheduler_settings | The fair scheduler configuration options. |
hive_settings¶
Parameter | Description |
---|---|
hive_version | Set to 2.1.1. |
pig_version | The default version of Pig is 0.11. Pig 0.15 and Pig 0.17 (beta) are the other supported versions. Pig 0.17 (beta) is only supported with Hive 2.1.1. |
pig_execution_engine | |
overrides | The custom configuration overrides. The default value is blank. |
is_metadata_cache_enabled | |
execution_engine |
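Putting these together, an engine_config for a Hadoop 2 cluster with one custom Hadoop override and the default Hive and Pig versions could be sketched as follows; the override value is illustrative:

"engine_config": {
    "flavour": "hadoop2",
    "hadoop_settings": {
        "custom_hadoop_config": "mapred.tasktracker.map.tasks.maximum=3",
        "use_qubole_placement_policy": true
    },
    "hive_settings": {
        "hive_version": "2.1.1",
        "pig_version": "0.11",
        "execution_engine": "tez"
    }
}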
monitoring¶
Parameter | Description |
---|---|
ganglia | Whether to enable Ganglia monitoring for the cluster. The default value is false. |
datadog | Contains the Datadog monitoring configuration (datadog_api_token and datadog_app_token). |
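For example, to enable Ganglia monitoring and report metrics to Datadog (the tokens are placeholders):

"monitoring": {
    "ganglia": true,
    "datadog": {
        "datadog_api_token": "<your Datadog API token>",
        "datadog_app_token": "<your Datadog app token>"
    }
}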
airflow_settings¶
The following table contains the engine_config parameters for an Airflow cluster.
Note
Parameters marked in bold below are mandatory. Others are optional and have default values.
Parameter | Description |
---|---|
dbtap_id | ID of the data store inside QDS. Set it to -1 if you are using the local MySQL instance as the data store. |
fernet_key | Encryption key for sensitive information inside the Airflow database, for example, user passwords and connections. It must be 32 URL-safe base64-encoded bytes. |
type | Engine type. It is airflow for an Airflow cluster. |
version | The default version is 1.10.0 (stable version). The other supported stable versions are 1.8.2 and 1.10.2. All the Airflow versions are compatible with MySQL 5.6 or higher. |
airflow_python_version | Supported versions are 3.5 (supported using package management) and 2.7. To know more, see Configuring an Airflow Cluster. |
overrides | Airflow configuration to override the default settings. Use the following syntax for overrides: |
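As a sketch, and assuming (as the table above suggests) that these parameters are passed inside engine_config, an Airflow cluster that uses the local MySQL instance as its data store might be configured as:

"engine_config": {
    "type": "airflow",
    "dbtap_id": -1,
    "fernet_key": "<your fernet key>",
    "version": "1.10.0",
    "airflow_python_version": "3.5"
}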
internal¶
Parameter | Description |
---|---|
zeppelin_interpreter_mode | The default mode is legacy. Set it to user mode if you want the user-level cluster-resource management on notebooks. See Configuring a Spark Notebook for more information. |
image_uri_overrides | |
spark_s3_package_name | |
zeppelin_s3_package_name |
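For example, to enable user-level cluster-resource management on notebooks:

"internal": {
    "zeppelin_interpreter_mode": "user"
}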
Request API Syntax¶
curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
"cloud_config": {
"compute_config": {
"use_account_compute_creds": true,
"customer_project_id": "dev-acm-cust-project-1"
},
"storage_config": {
"customer_project_id": "dev-acm-cust-project-1",
"disk_type": null,
"disk_size_in_gb": 100,
"disk_count": 0,
"disk_upscaling_config": null
},
"location": {
"region": "us-east1",
"zone": "us-east1-b"
},
"network_config": {
"network": "projects/dev-acm-cust-project-1/global/networks/default",
"subnet": "projects/dev-acm-cust-project-1/regions/us-east1/subnetworks/default",
"master_static_ip": null,
"bastion_node_public_dns": null,
"bastion_node_port": null,
"bastion_node_user": null,
"master_elastic_ip": null
},
"cluster_composition": {
"master": {
"preemptible": false
},
"min_nodes": {
"preemptible": false,
"percentage": 0
},
"autoscaling_nodes": {
"preemptible": true,
"percentage": 50
}
}
},
"cluster_info": {
"master_instance_type": "n1-standard-4",
"slave_instance_type": "n1-standard-4",
"node_base_cooldown_period": null,
"label": ["gcp-cluster-2"],
"min_nodes": 1,
"max_nodes": 1,
"idle_cluster_timeout_in_secs": null,
"cluster_name": "gcpqbol_acc44_cl176",
"node_bootstrap": "node_bootstrap.sh",
"disallow_cluster_termination": false,
"force_tunnel": false,
"customer_ssh_key": null,
"child_hs2_cluster_id": null,
"parent_cluster_id": null,
"env_settings": {},
"datadisk": {
"encryption": false
},
"slave_request_type": "ondemand",
"spot_settings": {}
},
"engine_config": {
"flavour": "hadoop2",
"hadoop_settings": {
"custom_hadoop_config": null,
"use_qubole_placement_policy": true,
"is_ha": null,
"fairscheduler_settings": {
"default_pool": null
}
},
"hive_settings": {
"is_hs2": false,
"hive_version": "2.1.1",
"pig_version": "0.11",
"pig_execution_engine": "mr",
"overrides": null,
"is_metadata_cache_enabled": true,
"execution_engine": "tez",
"hs2_thrift_port": null
}
},
"monitoring": {
"ganglia": false,
"datadog": {
"datadog_api_token": null,
"datadog_app_token": null
}
},
"internal": {
"zeppelin_interpreter_mode": null,
"image_uri_overrides": null,
"spark_s3_package_name": null,
"zeppelin_s3_package_name": null
}
}' \ "https://gcp.qubole.com/api/v2/clusters"
Sample API Request¶
curl -X POST -H "X-AUTH-TOKEN:$X_AUTH_TOKEN" -H "Content-Type:application/json" -H "Accept: application/json" \
-d '{
    "cloud_config": {
        "compute_config": {
            "use_account_compute_creds": true,
            "customer_project_id": "<your project ID>"
        },
        "storage_config": {
            "customer_project_id": "<your project ID>",
            "disk_size_in_gb": 100,
            "disk_count": 0
        },
        "location": {
            "region": "us-east1",
            "zone": "us-east1-b"
        },
        "network_config": {
            "network": "projects/<your project ID>/global/networks/default",
            "subnet": "projects/<your project ID>/regions/us-east1/subnetworks/default"
        }
    },
    "cluster_info": {
        "master_instance_type": "n1-standard-4",
        "slave_instance_type": "n1-standard-4",
        "label": ["gcp"],
        "min_nodes": 1,
        "max_nodes": 2,
        "cluster_name": "GCP1",
        "node_bootstrap": "node_bootstrap.sh"
    },
    "engine_config": {
        "flavour": "hadoop2",
        "hadoop_settings": {
            "custom_hadoop_config": "mapred.tasktracker.map.tasks.maximum=3"
        }
    },
    "monitoring": {
        "ganglia": true
    }
}' "https://gcp.qubole.com/api/v2/clusters"