Composing a Hadoop Job

Use the command composer on the Workbench page to compose a Hadoop job.

You can use the command composer for these types of Hadoop job: custom JAR, streaming, and DistCp.

Note

Before running a Hadoop job, make sure that the output directory does not already exist.

Hadoop and Spark clusters support Hadoop job queries. See Mapping of Cluster and Command Types for more information.
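
One way to satisfy the output-directory requirement in the note above is to generate a fresh path for every run. The following is a minimal sketch, not part of the product: the helper name fresh_output_dir and the bucket s3://example-bucket are hypothetical, and the sketch only makes a collision very unlikely rather than checking the file system.

```python
import uuid
from datetime import datetime, timezone


def fresh_output_dir(base: str) -> str:
    """Return a path under `base` that is very unlikely to exist yet.

    Hypothetical helper: it does not query the file system, it only
    combines a UTC timestamp with a short random suffix.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    suffix = uuid.uuid4().hex[:8]
    return f"{base.rstrip('/')}/{stamp}-{suffix}"


# Hypothetical bucket and prefix, for illustration only.
output_dir = fresh_output_dir("s3://example-bucket/hadoop-output/")
print(output_dir)
```

You can then paste the printed path wherever the job expects its output location.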

Compose a Hadoop Custom Jar Query

Perform the following steps to compose a Hadoop custom JAR query:

  1. Navigate to the Workbench page and click + Create New.
  2. Select Hadoop from the drop-down list near the top center of the page. Custom Jar is selected by default in the Job Type drop-down list; leave it selected.
  3. In the Path to Jar File field, specify the path of the directory that contains the Hadoop jar file.
  4. In the Arguments text field, specify the main class, generic options, and other JAR arguments.
  5. Click Run to execute the query.
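
As an illustration of what the Arguments field in step 4 might contain, the sketch below assembles a main class, generic options, and JAR arguments into one string. It assumes the usual Hadoop convention of placing -D generic options between the main class and the remaining arguments. The class name com.example.WordCount, the bucket paths, and the helper jar_arguments are hypothetical, and the field's quoting rules are not covered here.

```python
def jar_arguments(main_class, generic_options, jar_args):
    """Join main class, generic options (-D key=value), and JAR arguments.

    Hypothetical helper; values must not contain spaces because no
    quoting is applied.
    """
    parts = [main_class]
    for key, value in generic_options.items():
        parts += ["-D", f"{key}={value}"]
    parts += list(jar_args)
    return " ".join(parts)


# Hypothetical class name and paths, for illustration only.
arguments = jar_arguments(
    "com.example.WordCount",
    {"mapreduce.job.reduces": 4},
    ["s3://example-bucket/input", "s3://example-bucket/output/run-001"],
)
print(arguments)
```

The printed string is what you would paste into the Arguments text field.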

You can see the result under the Results tab, and the logs under the Logs tab. For more information on how to download command results and logs, see Downloading Results and Logs.

For information on the REST API, see Submitting a Hadoop Jar Command.
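
For orientation only, a request body for the same job might be built as sketched below. The field names command_type, sub_command, and sub_command_args are assumptions based on recollection of the commands API, so confirm them against Submitting a Hadoop Jar Command before use. The helper hadoop_jar_payload and the paths are hypothetical, and the sketch builds the JSON without sending it.

```python
import json


def hadoop_jar_payload(jar_path, arguments):
    """Build a JSON-serializable body for a Hadoop jar command.

    Field names are assumptions; verify them in the API reference.
    """
    return {
        "command_type": "HadoopCommand",
        "sub_command": "jar",
        # Assumed layout: the JAR location followed by the same string
        # you would type into the Arguments field.
        "sub_command_args": f"{jar_path} {arguments}",
    }


# Hypothetical JAR location and arguments, for illustration only.
payload = hadoop_jar_payload(
    "s3://example-bucket/jars/wordcount.jar",
    "com.example.WordCount s3://example-bucket/input s3://example-bucket/output/run-001",
)
print(json.dumps(payload, indent=2))
```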

Compose a Hadoop Streaming Query

Perform the following steps to compose a Hadoop streaming query:

  1. Navigate to the Workbench page and click + Create New.
  2. Select Hadoop from the drop-down list near the top center of the page.
  3. Select Streaming from the Job Type drop-down list.
  4. In the Arguments field, specify the streaming and generic options.
  5. Click Run to execute the query.
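
The Arguments value in step 4 could look like the string produced by the sketch below. It uses standard Hadoop streaming options (-input, -output, -mapper, -reducer) and places the generic options (-D, -files) first, as Hadoop streaming requires. The script names, bucket paths, and the helper streaming_arguments are hypothetical.

```python
def streaming_arguments(input_path, output_path, mapper, reducer,
                        files=(), properties=None):
    """Assemble generic options followed by streaming options.

    Hypothetical helper; values must not contain spaces because no
    quoting is applied.
    """
    parts = []
    # Generic options must come before the streaming options.
    for key, value in (properties or {}).items():
        parts += ["-D", f"{key}={value}"]
    if files:
        parts += ["-files", ",".join(files)]
    parts += ["-input", input_path, "-output", output_path,
              "-mapper", mapper, "-reducer", reducer]
    return " ".join(parts)


# Hypothetical scripts and paths, for illustration only.
arguments = streaming_arguments(
    "s3://example-bucket/input",
    "s3://example-bucket/output/run-001",
    "mapper.py",
    "reducer.py",
    files=["s3://example-bucket/scripts/mapper.py",
           "s3://example-bucket/scripts/reducer.py"],
    properties={"mapreduce.job.reduces": 2},
)
print(arguments)
```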

You can see the result under the Results tab, and the logs under the Logs tab. For more information on how to download command results and logs, see Downloading Results and Logs.

Compose a Hadoop DistCp Command

Perform the following steps to compose a Hadoop DistCp command:

  1. Navigate to the Workbench page and click + Create New.
  2. Select Hadoop from the drop-down list near the top center of the page.
  3. Select clouddistcp from the Job Type drop-down list.
  4. In the Arguments text field, specify the generic and DistCp options.
  5. Click Run to execute the command.
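
The sketch below shows one way to assemble the Arguments value from step 4. It assumes the field accepts the Apache DistCp layout of generic options, then DistCp options, then source and destination, and it uses the Apache DistCp options -update and -m. The clouddistcp job type may expect different option names, so treat every option here as an assumption to verify. The helper distcp_arguments and all paths are hypothetical.

```python
def distcp_arguments(source, destination, options=(), properties=None):
    """Assemble generic options, DistCp options, then source and destination.

    Hypothetical helper following the Apache DistCp argument order.
    """
    parts = []
    for key, value in (properties or {}).items():
        parts += ["-D", f"{key}={value}"]
    parts += list(options)
    parts += [source, destination]
    return " ".join(parts)


# Hypothetical paths; -update and -m are Apache DistCp options and are
# assumed, not confirmed, to apply to the clouddistcp job type.
arguments = distcp_arguments(
    "s3://example-bucket/input",
    "hdfs:///tmp/input-copy",
    options=["-update", "-m", "20"],
    properties={"mapreduce.job.queuename": "default"},
)
print(arguments)
```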

You can see the result under the Results tab, and the logs under the Logs tab. For more information on how to download command results and logs, see Downloading Results and Logs.