Automated Setup
This section describes the automated process for setting up a Qubole account on GCP. Before beginning the setup process, be sure that you have the prerequisites described in Prerequisites and Signup.
Required Permissions for Setup
- Default Storage Location (defloc): QDS must have read/write access to a default storage location in Google Cloud Storage where you want QDS to save log files and write the results of the queries you run. You will enter the defloc location during the setup process.
- QDS must have read/write access to the buckets in Cloud Storage where you will store the data you want to process with QDS.
- To perform the automated account setup, certain permissions must be assigned to the Qubole service account (QSA), as described in step f of the Setup Process below.
Note
For information on creating a service account, see Creating and managing service accounts in the GCP documentation. For information on assigning roles to a service account, see Granting roles to service accounts in the GCP documentation.
Setup Process
In the QDS UI, go to Control Panel > Account Settings > Access Settings. For Access Mode Type, select Automated.
In this step, you will assign the required GCP permissions for your QDS account.
Log in to the GCP console and navigate to the IAM & admin page (https://console.cloud.google.com/iam-admin/iam).
At the top of the IAM & admin page, select the project that you want to associate with your QDS account.
Click the Add button to assign an IAM policy to the project.
In the QDS UI, go to Control Panel > Account Settings > Access Settings, and copy the Qubole service account’s (QSA) email address:
Paste the copied email address into the New members text box on the IAM & admin page in the GCP console:
Grant the Qubole service account (QSA) either the predefined GCP roles or the custom granular permissions listed below. Predefined roles are quicker to assign but may grant broader permissions than QDS requires. A custom role can be limited to exactly the permissions QDS needs, but if any required permission is missing, account setup may fail.
- Predefined roles:
- Service Account Admin: This role includes permissions for working with service accounts.
- Project IAM Admin: This role contains permissions to access and administer a project’s IAM policies.
- Storage Legacy Bucket Owner/Storage Admin: At least one of these two roles should appear in the IAM section of your GCP console. Add either of these roles to provide read/write access to existing buckets with object listing, creation, and deletion.
- Role Administrator: This role includes permissions to create and manage custom roles.
- Custom granular permissions:
To apply granular permissions, you must first create a custom role and then assign it to the Qubole service account (QSA). For information about creating custom roles, see Creating and managing custom roles in the GCP documentation. Include the following permissions in your custom role:
- iam.roles.create
- iam.roles.delete
- iam.roles.get
- iam.roles.list
- iam.roles.undelete
- iam.roles.update
- iam.serviceAccounts.create
- iam.serviceAccounts.delete
- iam.serviceAccounts.get
- iam.serviceAccounts.getIamPolicy
- iam.serviceAccounts.list
- iam.serviceAccounts.setIamPolicy
- iam.serviceAccounts.update
- resourcemanager.projects.get
- resourcemanager.projects.getIamPolicy
- resourcemanager.projects.list
- resourcemanager.projects.setIamPolicy
- storage.buckets.getIamPolicy
- storage.buckets.list
- storage.buckets.setIamPolicy
Click Save to finish assigning the IAM permissions. It might take a few seconds for the permission changes to take effect.
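The granular permissions listed above can be collected in one place and checked for completeness before you create the custom role. The following is a minimal sketch rather than a Qubole-provided tool: the role title, the description text, and the function names are hypothetical, while the `title`, `description`, `stage`, and `includedPermissions` keys follow the field names of a GCP IAM custom role definition.

```python
import json

# Permissions copied from the granular-permissions list above.
REQUIRED_QSA_PERMISSIONS = [
    "iam.roles.create",
    "iam.roles.delete",
    "iam.roles.get",
    "iam.roles.list",
    "iam.roles.undelete",
    "iam.roles.update",
    "iam.serviceAccounts.create",
    "iam.serviceAccounts.delete",
    "iam.serviceAccounts.get",
    "iam.serviceAccounts.getIamPolicy",
    "iam.serviceAccounts.list",
    "iam.serviceAccounts.setIamPolicy",
    "iam.serviceAccounts.update",
    "resourcemanager.projects.get",
    "resourcemanager.projects.getIamPolicy",
    "resourcemanager.projects.list",
    "resourcemanager.projects.setIamPolicy",
    "storage.buckets.getIamPolicy",
    "storage.buckets.list",
    "storage.buckets.setIamPolicy",
]


def build_role_definition(title="Qubole Automated Setup"):
    """Return a custom role definition as a plain dict.

    The title and description are hypothetical placeholders.
    """
    return {
        "title": title,
        "description": "Permissions the QSA needs for automated setup",
        "stage": "GA",
        "includedPermissions": sorted(REQUIRED_QSA_PERMISSIONS),
    }


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted."""
    return sorted(set(REQUIRED_QSA_PERMISSIONS) - set(granted))


if __name__ == "__main__":
    # Print the definition so it can be reviewed or saved to a file.
    print(json.dumps(build_role_definition(), indent=2))
```

Running `missing_permissions` against the permissions of an existing role gives a quick list of what still has to be added before you retry account setup.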
In the QDS UI, go to Control Panel > Account Settings > Access Settings and reload the page.
From the Projects dropdown, select the ID of the project on which you granted IAM permissions to the Qubole service account (QSA).
In the Default Location field, enter the name of the bucket (without the gs:// prefix) that will serve as your default location (defloc) in Cloud Storage.
Optionally, in the Data Bucket(s) field, provide a comma-separated list of data buckets (without the gs:// prefix) where you want QDS to read and write data. You can provide a maximum of five data buckets.
Click Save. If there are errors, the corresponding error messages are displayed.
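Because both fields expect bare bucket names, it can help to normalize the values before pasting them into the UI. The sketch below is illustrative only: the function names are hypothetical, and the only rules it encodes are the ones stated above (no gs:// prefix, a comma-separated list, at most five data buckets).

```python
MAX_DATA_BUCKETS = 5  # limit stated in the setup instructions
GCS_PREFIX = "gs://"


def strip_gcs_prefix(name):
    """Return a bucket name without a leading gs:// or trailing slash."""
    name = name.strip()
    if name.startswith(GCS_PREFIX):
        name = name[len(GCS_PREFIX):]
    return name.rstrip("/")


def parse_data_buckets(field_value):
    """Split a comma-separated string into clean bucket names.

    Empty entries are dropped. Raises ValueError if more than
    MAX_DATA_BUCKETS names remain.
    """
    names = [strip_gcs_prefix(part) for part in field_value.split(",")]
    names = [name for name in names if name]
    if len(names) > MAX_DATA_BUCKETS:
        raise ValueError(
            f"{len(names)} data buckets given; the maximum is {MAX_DATA_BUCKETS}"
        )
    return names
```

For example, `parse_data_buckets("gs://raw-data, curated-data")` returns the two bare names, ready to be joined with commas for the Data Bucket(s) field. The bucket names here are made up.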
Validation of credentials after Save:
- If your settings were saved successfully, you will see a message at the top of the page saying, “Please wait while we validate your settings. It may take up to a few minutes.” Upon completion of the validation, your account will be fully operational.
- Qubole validates your settings in the background, allowing you to use the application while the settings are being validated, but you will not be allowed to update the access settings or perform operations that interact with GCP, such as starting a cluster.
- Validation may take up to 5 minutes.
- If validation is successful, you will see green check marks in the Access Settings section next to the Default Location and Data Bucket(s) fields. If validation fails, you will see a red X next to the respective field(s).
Troubleshooting:
- If validation fails, try re-saving the access settings.
- If the problem persists, contact Qubole Support.
Changing project ID for the QDS account:
- If you update your project ID in the QDS Access Settings UI, you must assign the QSA the required permissions (as described above) again on the new project.
- You can change the project only when no clusters are running.
Custom Roles Created During Automated Setup
During automated setup, Qubole creates two custom roles in your project, qbol_compute_role and qbol_storage_role, and assigns both of them to both your Compute Service Account (CSA) and Instance Service Account (ISA). The GCP permissions included in these roles are listed below. Do not modify or delete these roles from the project as doing so might lead to unexpected behavior.
The custom qbol_compute_role includes the following GCP permissions:
- compute.addresses.use
- compute.addresses.useInternal
- compute.disks.create
- compute.disks.delete
- compute.disks.get
- compute.disks.list
- compute.disks.setLabels
- compute.disks.use
- compute.diskTypes.list
- compute.firewalls.create
- compute.firewalls.delete
- compute.firewalls.get
- compute.firewalls.list
- compute.firewalls.update
- compute.globalOperations.get
- compute.instances.attachDisk
- compute.instances.create
- compute.instances.delete
- compute.instances.detachDisk
- compute.instances.get
- compute.instances.list
- compute.instances.reset
- compute.instances.resume
- compute.instances.setLabels
- compute.instances.setMetadata
- compute.instances.setServiceAccount
- compute.instances.setTags
- compute.instances.start
- compute.instances.stop
- compute.instances.suspend
- compute.instances.use
- compute.networks.list
- compute.networks.updatePolicy
- compute.networks.use
- compute.networks.useExternalIp
- compute.regions.get
- compute.subnetworks.list
- compute.subnetworks.use
- compute.subnetworks.useExternalIp
- compute.zoneOperations.get
The custom qbol_storage_role includes the following GCP permissions:
- storage.buckets.get
- storage.buckets.getIamPolicy
- storage.buckets.list
- storage.objects.create
- storage.objects.delete
- storage.objects.get
- storage.objects.list
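Since these roles should not be modified, one way to spot accidental changes is to compare a role's current permissions against the lists above. The sketch below assumes you have the role description as JSON, for example from `gcloud iam roles describe qbol_storage_role --project=<your-project> --format=json`. The function name, the sample JSON, and the project ID in it are hypothetical. The same approach applies to qbol_compute_role with its own permission list.

```python
import json

# Permissions copied from the qbol_storage_role list above.
EXPECTED_STORAGE_PERMISSIONS = {
    "storage.buckets.get",
    "storage.buckets.getIamPolicy",
    "storage.buckets.list",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
}


def role_drift(expected, described_role_json):
    """Compare expected permissions with a role description in JSON form.

    Returns a (missing, unexpected) pair of sorted lists.
    """
    role = json.loads(described_role_json)
    actual = set(role.get("includedPermissions", []))
    return sorted(expected - actual), sorted(actual - expected)


# Hypothetical role description: one permission was removed, one was added.
sample_json = json.dumps({
    "name": "projects/my-project/roles/qbol_storage_role",
    "includedPermissions": [
        "storage.buckets.delete",
        "storage.buckets.get",
        "storage.buckets.getIamPolicy",
        "storage.buckets.list",
        "storage.objects.create",
        "storage.objects.get",
        "storage.objects.list",
    ],
})

missing, unexpected = role_drift(EXPECTED_STORAGE_PERMISSIONS, sample_json)
print("missing:", missing)
print("unexpected:", unexpected)
```

Two empty lists indicate that the role still matches the documented permission set.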
Roles for Google BigQuery
In addition, two GCP roles are assigned to enable use of Google BigQuery as follows:
- bigquery.dataViewer is assigned on your Compute Service Account (CSA)
- bigquery.readSessionUser is assigned on your Instance Service Account (ISA)
For more information about these roles, see Predefined roles and permissions in the GCP BigQuery documentation.
bigquery.dataViewer contains the following GCP permissions:
- bigquery.datasets.get
- bigquery.datasets.getIamPolicy
- bigquery.models.getData
- bigquery.models.getMetadata
- bigquery.models.list
- bigquery.routines.get
- bigquery.routines.list
- bigquery.tables.export
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.list
- resourcemanager.projects.get
- resourcemanager.projects.list
bigquery.readSessionUser contains the following GCP permissions:
- bigquery.readsessions.*
- resourcemanager.projects.get
- resourcemanager.projects.list
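The bigquery.readsessions.* entry above is shorthand for every permission under that prefix. As a rough illustration of how such a list can be checked against a single permission, the sketch below treats each entry as a shell-style pattern using Python's standard `fnmatch` module. The function name is hypothetical, and the matching rule is an assumption made for illustration, not a description of how GCP evaluates roles.

```python
from fnmatch import fnmatchcase

# Entries copied from the bigquery.readSessionUser list above.
BIGQUERY_READ_SESSION_USER = [
    "bigquery.readsessions.*",
    "resourcemanager.projects.get",
    "resourcemanager.projects.list",
]


def role_covers(role_permissions, permission):
    """Return True if `permission` matches any entry, treating * as a wildcard."""
    return any(fnmatchcase(permission, pattern) for pattern in role_permissions)


print(role_covers(BIGQUERY_READ_SESSION_USER, "bigquery.readsessions.create"))
print(role_covers(BIGQUERY_READ_SESSION_USER, "bigquery.tables.get"))
```

`fnmatchcase` is used instead of `fnmatch` so that the comparison stays case-sensitive on every platform.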