Running Spark Applications in Notebooks

You can run Spark Applications from the Notebooks page of the QDS UI. When running Spark applications in notebooks, you should understand notebooks, how to associate interpreters with notebooks, how to run concurrent commands, and how to set the context.

Log in to QDS with your username and password. Navigate to the Notebooks page.

Understanding Notebooks

You can create any number of new notebooks. Notebooks data is synced and persisted in cloud storage (for example S3) regularly. Notebooks are associated with a cluster, so notebooks from cluster A can not be accessed by cluster B.

See Notebooks for more information on Qubole’s Notebooks. For more information, see:

Associating Interpreters with Notebooks

Spark Interpreters are associated with notebooks by default. However, if you want to use any user interpreter then you must associate the interpreter with the notebook. Interpreters are started on demand when required by notebooks.

  1. On the Notebooks page, click on the Gear icon.

    ../../../../_images/ExpActionIcon1.png

    On the Settings page, list of Interpreters are displayed as shown in the following figure.

    ../../../../_images/notebook-interpreter-association.png

    The first interpreter on the list is the default interpreter.

  2. Click on the required interpreter to associate with the notebook.

  3. Click Save.

Running Concurrent Spark Commands

You can run multiple Spark SQL commands in parallel.

  1. On the Notebooks page, click Interpreters.

  2. For the required interpreter, click on the corresponding edit button.

  3. Set the following properties:

    • zeppelin.spark.concurrentSQL = true
    • zepplin.spark.sql.maxConcurrency = number of concurrent commands to be run
  4. Create multiple paragraphs with Spark SQL commands and click the Run icon on the left side to run all the paragraphs.

    All the paragraphs run concurrently.

Using SparkContext, HiveContext and SQLContext Objects

Similar to a Spark shell, a SparkContext (SC) object is available in the Notebook. In addition, HiveContext is also available in a Notebook if you have to set the following configuration to true in interpreter settings:

zeppelin.spark.useHiveContext

If this configuration is not set to true, then SQLContext is available in a Notebook instead of HiveContext.