Using the Qubole Presto Server Bootstrap
The Qubole Presto Server Bootstrap is an alternative to the Node Bootstrap Script for installing external jars, such as presto-udfs, before the Presto server starts. The Presto server comes up before the node bootstrap process completes, so installing external jars (for example, Presto UDFs) through the node bootstrap requires an explicit restart of the Presto daemons. This is problematic because the server may already be running a task, and restarting the Presto daemons can cause query failures. The Qubole Presto Server Bootstrap is therefore better suited to such changes.
The Qubole Presto Server Bootstrap is only supported in Presto 0.180 and later versions.
Use the Qubole Presto Server Bootstrap only if you need to run a script before the Presto server starts. Any script that is part of this bootstrap increases the time taken to bring up the Presto server, and therefore the time before the server can accept a query. If nothing in the current cluster node bootstrap script requires a restart of the Presto daemons to pick up changes, use only the cluster's node bootstrap.
There are two ways to define the Qubole Presto Server Bootstrap:
- bootstrap.properties: Add the bootstrap script directly in it.
- bootstrap-file-path: The location, in cloud object storage, of the Presto Server Bootstrap file that contains the bootstrap script. Specifying bootstrap-file-path is recommended when the script is too long.
To configure the Qubole Presto Server Bootstrap for a given cluster, use either of the following methods:
- Through the cluster UI, add it in Advanced Configuration > PRESTO SETTINGS > Override Presto Configuration.
- Through the REST API, add it using presto_settings. For more information, see presto_settings.
The Qubole Presto Server Bootstrap eliminates the need to restart the Presto daemons, so do not include any explicit commands to restart or exit the Presto server in the bootstrap script. The Presto server is brought up only after the Server Bootstrap has run successfully, so verify that the bootstrap script contains no errors. In addition, if any script, or part of one, is migrated or copied from the existing cluster node bootstrap, remove or modify that node bootstrap script so that the same commands do not run twice.
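Because the Presto server starts only after the bootstrap succeeds, it can help to check a local copy of the script for syntax errors before adding or uploading it. The following is a minimal sketch, not a Qubole-provided tool. It assumes the script is interpreted by a Bash-compatible shell (this page does not state which interpreter is used), and `check_bootstrap_syntax` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse a local copy of a bootstrap script without running it.
# "bash -n" only reads the commands and reports syntax errors. It does not catch
# runtime failures such as a missing jar or an unreachable bucket.
check_bootstrap_syntax() {
    local script_path="$1"
    if bash -n "$script_path" 2>/dev/null; then
        echo "syntax ok: $script_path"
    else
        echo "syntax error: $script_path" >&2
        return 1
    fi
}
```

For example, `check_bootstrap_syntax ./existing-node-bootstrap-file.sh` returns a non-zero status if the file has an unterminated `if` block or an unmatched quote.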
Example of a Bootstrap Script Specified in the bootstrap.properties
bootstrap.properties:
    mkdir /usr/lib/presto/plugin/udfs
    hadoop dfs -get <scheme>bucket/udfs_custom.jar /usr/lib/presto/plugin/udfs/
Example of Specifying a Qubole Presto Server Bootstrap Location
existing-node-bootstrap-file.sh can contain the script that is shown in Example of a Bootstrap Script Specified in the bootstrap.properties. You can view the content of existing-node-bootstrap-file.sh as follows:
$ hadoop fs -cat <scheme>my-bucket/boostraps/existing-node-bootstrap-file.sh
mkdir /usr/lib/presto/plugin/udfs
hadoop dfs -get gs://bucket/udfs_custom.jar /usr/lib/presto/plugin/udfs/
$
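Since an error in the bootstrap script prevents the Presto server from coming up, the two commands in the script can be written more defensively. The sketch below is one possible variant, assuming a Bash-compatible interpreter. `install_udf_jar` is a hypothetical function name, and how Qubole reacts to a non-zero exit status is not described on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the two bootstrap commands shown above.
install_udf_jar() {
    local jar_uri="$1"
    local plugin_dir="${2:-/usr/lib/presto/plugin/udfs}"
    local target="$plugin_dir/$(basename "$jar_uri")"

    # -p avoids a failure if the directory already exists.
    mkdir -p "$plugin_dir" || return 1

    # Skip the download if the jar is already in place. This is a design choice:
    # it keeps reruns harmless but does not refresh a stale jar.
    if [ -e "$target" ]; then
        echo "already present: $target"
        return 0
    fi

    # Report a failed download and return a non-zero status so the error is visible.
    hadoop dfs -get "$jar_uri" "$plugin_dir/" || {
        echo "failed to fetch $jar_uri" >&2
        return 1
    }
}
```

A script body could then consist of the function definition followed by a single call such as `install_udf_jar gs://bucket/udfs_custom.jar`.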
Using Presto UDFs as a Bootstrap Script
Presto on Qubole provides UDFs as external jars (presto-udfs). You can add them through a Presto Server Bootstrap under Advanced Configuration > PRESTO SETTINGS in the Presto cluster UI. The jars below are stored in AWS S3. Pick the one that matches your Presto version and pass it as an override in the Override Presto Configuration text box:
UDFs for Presto version 0.208
bootstrap.properties:
    mkdir /usr/lib/presto/plugin/udfs
    hadoop dfs -get s3://paid-qubole/presto-udfs/udfs-2.0.3.jar /usr/lib/presto/plugin/udfs/
UDFs for Presto version 317
bootstrap.properties:
    mkdir /usr/lib/presto/plugin/udfs
    hadoop dfs -get s3://paid-qubole/presto-udfs/udfs-3.0.0.jar /usr/lib/presto/plugin/udfs/
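If you manage several clusters, the override text can be generated from the Presto version instead of being copied by hand. The sketch below is illustrative only: `presto_udf_bootstrap` is a hypothetical helper, the version-to-jar mapping covers only the two versions listed above, and the output layout (a header line followed by indented commands) is an assumption about how the override is entered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the bootstrap.properties override for a Presto version.
# Only the two versions listed on this page are mapped; anything else is rejected.
presto_udf_bootstrap() {
    local version="$1"
    local jar
    case "$version" in
        0.208) jar="udfs-2.0.3.jar" ;;
        317)   jar="udfs-3.0.0.jar" ;;
        *)
            echo "no UDF jar listed for Presto version $version" >&2
            return 1
            ;;
    esac
    printf 'bootstrap.properties:\n'
    printf '    mkdir /usr/lib/presto/plugin/udfs\n'
    printf '    hadoop dfs -get s3://paid-qubole/presto-udfs/%s /usr/lib/presto/plugin/udfs/\n' "$jar"
}
```

Running `presto_udf_bootstrap 317` prints text that can be pasted into the Override Presto Configuration text box.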
Presto Server Bootstrap Logs
The Presto server bootstrap logs are in