Configuring Spark Settings for Jupyter Notebooks

By default, Jupyter notebooks use the cluster-wide Spark configuration. You can override the Spark settings for a notebook's Spark application by using the %%configure magic.

Specify the required configuration at the beginning of the notebook, before you run your first Spark-bound code cell.
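For example, the first cell of a notebook might set executor resources before any Spark code runs. The memory and core values below are illustrative, not recommendations:

%%configure
{"executorMemory": "4G", "executorCores": 2}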

To change the configuration after you have already run a Spark-bound command, use the -f option with the %%configure magic. If you use the -f option, all progress made in the previous Spark jobs is lost.

The following examples show how to specify Spark configurations.

%%configure -f
{"executorMemory": "3072M", "executorCores": 4, "numExecutors": 10}

%%configure -f
{"driverMemory": "20G",
 "conf": {"spark.sql.files.ignoreMissingFiles": "true",
          "spark.jars.packages": "graphframes:graphframes:0.7.0-spark2.4-s_2.11"}}

Note

By default, Spark drivers are created on the cluster worker nodes to distribute load and make better use of cluster resources. If you want to run the Spark driver on the coordinator node, contact Qubole Support.

The following table lists the Spark configuration parameters that %%configure accepts, along with their value types.

Parameters       Description                                             Values
jars             Jars to be used in the session                          List of string
pyFiles          Python files to be used in the session                  List of string
files            Files to be used in the session                         List of string
driverMemory     Amount of memory to be used for the driver process      string
driverCores      Number of cores to be used for the driver process       int
executorMemory   Amount of memory to be used for the executor process    string
executorCores    Number of cores to be used for the executor process     int
numExecutors     Number of executors to be launched for the session      int
archives         Archives to be used in the session                      List of string
queue            Name of the YARN queue                                  string
name             Name of the session (must be in lower case)             string
conf             Spark configuration properties; any other Spark         Map of key=val
                 configuration can be specified here
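Putting several of these parameters together, a cell like the following is one possible shape; the queue name, session name, and resource values are placeholders to adapt to your cluster:

%%configure -f
{
  "driverMemory": "4G",
  "executorMemory": "4G",
  "executorCores": 2,
  "numExecutors": 6,
  "queue": "default",
  "name": "examplesession",
  "conf": {"spark.sql.shuffle.partitions": "200"}
}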