Using the Node Bootstrap on Airflow Clusters

In QDS, all clusters share the same node bootstrap script by default. For an Airflow cluster running on AWS, however, Qubole recommends that you configure a separate node bootstrap script.

Note

A separate, Airflow-specific node bootstrap script is currently supported only on AWS.

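Through the node bootstrap script, you can install packages, automatically synchronize DAGs from a GitHub repository, and create a RabbitMQ user for dashboard access, as described in the following sections.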

Install Packages on Airflow Cluster

Add this code snippet in the node bootstrap to install packages on the Airflow cluster.

# Activate the virtual environment in which Airflow runs, so that packages can be installed into it
source ${AIRFLOW_HOME}/airflow/qubole_assembly/scripts/virtualenv.sh activate

pip install <package name>
source ${AIRFLOW_HOME}/airflow/qubole_assembly/scripts/virtualenv.sh deactivate
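
When several packages are needed, the activate/install/deactivate sequence above can be wrapped in a small helper. The following is a minimal sketch, assuming the bootstrap runs under bash and that the virtualenv.sh script behaves as in the snippet above; the function name install_airflow_packages and the package names in the usage line are hypothetical.

```shell
#!/bin/bash
# Sketch only: install_airflow_packages is a hypothetical helper, not part of QDS.

install_airflow_packages() {
  local venv_script="${AIRFLOW_HOME}/airflow/qubole_assembly/scripts/virtualenv.sh"
  local failed=0 pkg

  # Activate the virtual environment in which Airflow runs.
  source "${venv_script}" activate

  for pkg in "$@"; do
    # Continue after a failure so that one bad package does not block the rest.
    if ! pip install "${pkg}"; then
      echo "WARN: could not install ${pkg}" >&2
      failed=1
    fi
  done

  source "${venv_script}" deactivate

  # Return non-zero if any package failed to install.
  return "${failed}"
}

# Usage in the bootstrap (hypothetical package names):
# install_airflow_packages "requests" "pyyaml"
```

The helper returns a non-zero status if any installation failed, so the bootstrap can decide whether to stop or continue.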

Automatically Synchronize DAGs from a GitHub Repository

Add this code snippet in the node bootstrap editor to automatically synchronize DAGs from a GitHub repository.

# clone the repo using github access token
git clone https://{access_token}@github.com/username/airflow-dags.git $AIRFLOW_HOME/dags

# prepare command
command="*/5 * * * * cd $AIRFLOW_HOME/dags && git pull"

# register it on cron
crontab -l | { cat; echo "$command"; } | crontab -
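
The registration step above appends the cron entry unconditionally, so if the bootstrap runs more than once on the same node, the entry would be duplicated. The following is a minimal sketch of one way to avoid that; the helper name add_cron_line is hypothetical, and the usage line is commented out because it modifies the crontab of the current user.

```shell
#!/bin/bash
# Sketch only: add_cron_line is a hypothetical helper, not part of QDS.

# Read an existing crontab on stdin and print it with the line "$1" appended,
# unless an identical line is already present.
add_cron_line() {
  local existing
  existing="$(cat)"

  if printf '%s\n' "${existing}" | grep -Fx -- "$1" >/dev/null; then
    # The entry is already registered; print the crontab unchanged.
    printf '%s\n' "${existing}"
  elif [ -n "${existing}" ]; then
    printf '%s\n%s\n' "${existing}" "$1"
  else
    printf '%s\n' "$1"
  fi
}

# Usage in the bootstrap:
# crontab -l 2>/dev/null | add_cron_line "*/5 * * * * cd $AIRFLOW_HOME/dags && git pull" | crontab -
```

Because the helper only transforms text, it can be checked on any machine without touching a real crontab.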

Create a User in RabbitMQ to Access It Through the Dashboard

If you use the RabbitMQ instance that is installed on the cluster and want to access its dashboard through QDS, create a user in RabbitMQ, because the default user (guest) cannot access the RabbitMQ dashboard remotely.

Add the following code snippet to the node bootstrap to create a new user (new_user) in RabbitMQ.

/usr/sbin/rabbitmqctl add_user new_user new_password
/usr/sbin/rabbitmqctl set_user_tags new_user administrator
/usr/sbin/rabbitmqctl set_permissions -p / new_user ".*" ".*" ".*"
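
The add_user command fails if the user already exists, which matters if the bootstrap can run more than once on a node. The following is a minimal sketch of a guarded variant, assuming that rabbitmqctl list_users prints one user per line with the user name in the first column; the function names and the RABBITMQCTL override variable are hypothetical.

```shell
#!/bin/bash
# Sketch only: the helper names and the RABBITMQCTL variable are hypothetical.

# Path to rabbitmqctl; can be overridden, for example in tests.
RABBITMQCTL="${RABBITMQCTL:-/usr/sbin/rabbitmqctl}"

# Succeed if user "$1" appears in the first column of `rabbitmqctl list_users`.
rabbitmq_user_exists() {
  "${RABBITMQCTL}" -q list_users |
    awk -v u="$1" '$1 == u { found = 1 } END { exit !found }'
}

# Create the user only if it is missing, then (re)apply tags and permissions.
ensure_rabbitmq_admin() {
  local user="$1" password="$2"

  if ! rabbitmq_user_exists "${user}"; then
    "${RABBITMQCTL}" add_user "${user}" "${password}"
  fi

  "${RABBITMQCTL}" set_user_tags "${user}" administrator
  "${RABBITMQCTL}" set_permissions -p / "${user}" ".*" ".*" ".*"
}

# Usage in the bootstrap:
# ensure_rabbitmq_admin new_user new_password
```

Reapplying the tags and permissions on every run is harmless and keeps the user in the expected state even if it was created earlier with different settings.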