Configuring Spark Settings for Jupyter Notebooks
By default, Jupyter notebooks use the cluster-wide Spark configuration. You can override it for a notebook by specifying the required Spark settings with the `%%configure` magic.
Specify the required configuration at the beginning of the notebook, before you run the first Spark-bound code cell. If you want to change the configuration after a Spark-bound command has already run, use the `-f` option with the `%%configure` magic. The `-f` option restarts the Spark session, so all progress made in previously run Spark jobs is lost.
The following examples show how to specify Spark configurations.
```
%%configure -f
{"executorMemory": "3072M", "executorCores": 4, "numExecutors": 10}
```

```
%%configure -f
{
    "driverMemory": "20G",
    "conf": {
        "spark.sql.files.ignoreMissingFiles": "true",
        "spark.jars.packages": "graphframes:graphframes:0.7.0-spark2.4-s_2.11"
    }
}
```
Note
Spark drivers are created on the cluster worker nodes by default for better load distribution and better use of cluster resources. If you want to run the Spark driver on the coordinator node, contact Qubole Support.
The following table lists the Spark configuration parameters with their descriptions and value types.
Parameters | Description | Type |
---|---|---|
jars | Jars to be used in the session | List of string |
pyFiles | Python files to be used in the session | List of string |
files | Files to be used in the session | List of string |
driverMemory | Amount of memory to be used for the driver process | string |
driverCores | Number of cores to be used for the driver process | int |
executorMemory | Amount of memory to be used for the executor process | string |
executorCores | Number of cores to be used for the executor process | int |
numExecutors | Number of executors to be launched for the session | int |
archives | Archives to be used in the session | List of string |
queue | Name of the YARN queue | string |
name | Name of the session (must be in lower case) | string |
conf | Spark configuration properties. Any other Spark configuration can be specified here. | Map of key=val |
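As a sketch of how these parameters combine, the cell below sets session resources, dependencies, and a YARN queue in one `%%configure` call. The S3 paths, queue name, and session name are placeholders for illustration, not values required by Qubole.

```
%%configure -f
{
    "driverMemory": "4G",
    "executorMemory": "4G",
    "executorCores": 2,
    "numExecutors": 6,
    "jars": ["s3://example-bucket/jars/custom-udfs.jar"],
    "pyFiles": ["s3://example-bucket/deps/helpers.py"],
    "queue": "analytics",
    "name": "salespipeline",
    "conf": {"spark.sql.shuffle.partitions": "64"}
}
```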