Clone a Schedule
POST /api/v1.2/scheduler/(SchedulerID)/duplicate
Use this API to clone an existing schedule by providing a new schedule name.
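As a rough illustration of the call, the sketch below builds the request in Python without sending it. The host and headers are taken from the curl example later on this page; `build_clone_request` is a hypothetical helper name, and the `AUTH_TOKEN` environment variable is an assumption about where the API token is kept.

```python
import json
import os
import urllib.request

# Host taken from the curl example on this page; adjust for your environment.
BASE_URL = "https://gcp.qubole.com/api/v1.2/scheduler"


def build_clone_request(schedule_id, new_name, auth_token):
    """Build (but do not send) the POST request that clones a schedule."""
    body = json.dumps({"name": new_name}).encode("utf-8")
    return urllib.request.Request(
        url="%s/%s/duplicate" % (BASE_URL, schedule_id),
        data=body,
        method="POST",
        headers={
            "X-AUTH-TOKEN": auth_token,
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


# "<token>" is a placeholder used when AUTH_TOKEN is not set.
request = build_clone_request(3159, "schedule1", os.environ.get("AUTH_TOKEN", "<token>"))
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keeping construction separate from sending makes the URL and body easy to inspect first.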
Required Role
The following users can make this API call:
- Users who belong to the system-admin or system-user group.
- Users who belong to a group associated with a role that allows cloning a schedule. See Managing Groups and Managing Roles for more information.
Parameters
Note
Parameters marked in bold below are mandatory. Others are optional and have default values.
| Parameter | Description | 
|---|---|
| command_type | A valid command type supported by Qubole. For example, HiveCommand, HadoopCommand, PigCommand. | 
| command | JSON object describing the command. Refer to the command-api for more details. Sub fields can use macros. Refer to the Qubole Scheduler for more details. | 
| start_time | Start datetime for the schedule | 
| end_time | End datetime for the schedule | 
| frequency | Set this option or cron_expression, but do not set both options. Specify how often the schedule should run. Input is an integer. For example, a frequency of one hour/day/month is represented as {"frequency":"1"}. | 
| time_unit | Denotes the time unit for the frequency. Its default value is days. Accepted values are minutes, hours, days, weeks, or months. | 
| cron_expression | Set this option or frequency, but do not set both options. The standard cron format is “s, m, h, d, M, D, Y” where s is second, m is minute, h is hour, d is date, M is month, D is day of the week, and Y is year. Only year (Y) is optional. Example: "cron_expression":"0 0 12 * * ?". For more information, see Cron Trigger Tutorial. | 
| name | A user-defined name for a schedule. If name is not specified, then a system-generated Schedule ID is set as the name. While cloning an existing schedule, you must change the name. | 
| label | Specify a cluster label that identifies the cluster on which the schedule API call must be run. | 
| macros | Expressions to evaluate macros. Macros can be used in parameterized commands. Refer to the Macros in Scheduler page for more details. | 
| no_catch_up | Set this parameter to true if you want to skip schedule actions that were supposed to have run in the past and run only the latest schedule actions. By default, this parameter is set to false. When a new schedule is created, the scheduler runs schedule actions from the start time to the current time. For example, if a daily schedule with a start time of Jun 1, 2015 is created on Dec 1, 2015, schedule actions are run for Jun 1, 2015, Jun 2, 2015, and so on. If you do not want the scheduler to run the missed schedule actions for the months before December, set no_catch_up to true. Skipping schedule actions is mainly useful when you suspend a schedule and resume it later: more than one schedule action is then pending, and you might want to skip the earlier ones. For more information, see Understanding the Qubole Scheduler Concepts. | 
| time_zone | Timezone of the start and end time of the schedule. Scheduler will understand ZoneInfo identifiers. For example, Asia/Kolkata. For a list of identifiers, check column 3 in List of TZ in databases. Default value is UTC. | 
| command_timeout | The command timeout in seconds. Its default value is 129600 seconds (36 hours), and any other value that you set must be less than 36 hours. QDS checks the timeout for a command every 60 seconds, so a command can run slightly longer than its timeout: if the timeout is set to 80 seconds, the command is killed at the next check, that is, after 120 seconds. Setting this parameter prevents a command from running for the full 36 hours. | 
| time_out | The maximum amount of time, in minutes, that the schedule should wait for dependencies to be satisfied. | 
| concurrency | Specify how many schedule actions can run at a time. Default value is 1. | 
| dependency_info | Describe dependencies for this schedule. Check the Hive Datasets as Schedule Dependency for more information. | 
| notification | An optional parameter that is set to false by default. Set it to true if you want to be notified through email about instance failure. See notification for more information. | 
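Some of the constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Qubole API: `check_schedule_params` and `kill_delay_seconds` are hypothetical helper names, and the rounding rule is an assumed model of the 60-second timeout check described for command_timeout.

```python
import math

# Values listed for time_unit in the parameter table.
TIME_UNITS = {"minutes", "hours", "days", "weeks", "months"}
MAX_COMMAND_TIMEOUT = 129600  # 36 hours in seconds, the documented default


def check_schedule_params(params):
    """Return a list of problems found in a schedule payload (hypothetical helper)."""
    problems = []
    # frequency and cron_expression are mutually exclusive.
    if "frequency" in params and "cron_expression" in params:
        problems.append("set either frequency or cron_expression, not both")
    unit = params.get("time_unit", "days")  # documented default is days
    if unit not in TIME_UNITS:
        problems.append("unsupported time_unit: %s" % unit)
    # The default itself (129600) is accepted here; larger values are rejected.
    timeout = params.get("command_timeout", MAX_COMMAND_TIMEOUT)
    if timeout > MAX_COMMAND_TIMEOUT:
        problems.append("command_timeout exceeds 36 hours")
    return problems


def kill_delay_seconds(command_timeout, check_interval=60):
    """Assumed model: the timeout is checked every check_interval seconds,
    so the command is killed at the first check at or after the timeout."""
    return math.ceil(command_timeout / check_interval) * check_interval


print(check_schedule_params({"frequency": 1, "cron_expression": "0 0 12 * * ?"}))
print(kill_delay_seconds(80))  # the 80-second example from the table
```

Under this model, the 80-second timeout from the table is rounded up to the second check at 120 seconds.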
notification
| Parameter | Description | 
|---|---|
| is_digest | A notification email type that is set to true if the schedule periodicity is in minutes or hours. If it is set to false, the email type is immediate by default. | 
| notify_failure | If this option is set to true, you receive schedule failure notifications. | 
| notify_success | If this option is set to true, you receive schedule success notifications. | 
| notification_email_list | By default, the current user’s email ID is added. You can add additional email IDs as required. | 
dependency_info
| Parameter | Description |
|---|---|
| files | Use this parameter if there is a dependency on S3 files. It has the following suboptions. For more information, see Configuring GS Files Data Dependency. |
| path | The S3 path of the dependent file (with data) based on which the schedule runs. |
| window_start | Denotes the start day or time. |
| window_end | Denotes the end day or time. |
| hive_tables | Use this parameter if there is a dependency on Hive table data that has partitions. For more information, see Configuring Hive Tables Data Dependency. |
| schema | The database that contains the partitioned Hive table. |
| name | The name of the partitioned Hive table. |
| window_start | Denotes the start day or time. |
| window_end | Denotes the end day or time. |
| interval | Denotes the dataset interval, which defines how often the data is generated. Hive Datasets as Schedule Dependency provides more information. You must also specify the incremental time, which can be in minutes, hours, days, weeks, or months. The usage is "interval":{"days":"1"}. The default interval is 1 day. |
| column | Denotes the partitioned column name. You must specify the date-time mask through the the_date parameter, which denotes how to convert from date to string for the partition. The usage is "columns":{"the_date":"<value>"}. The <value> can be a macro or a string. |
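To show how these suboptions might fit together, the sketch below builds a dependency_info object in Python and serializes it to JSON. The nesting (a list of entries under files and hive_tables), the bucket path, the schema name, the window values, and the date mask are all assumptions made for illustration. Only the interval and columns usage strings come from the table above, so check the linked dependency pages for the exact shape.

```python
import json


def build_dependency_info():
    """Assemble an illustrative dependency_info payload (hypothetical values)."""
    return {
        "files": [
            {
                # Hypothetical bucket and prefix.
                "path": "s3://example-bucket/daily_tick_data/",
                # Assumed to be offsets; the table only says "start/end day or time".
                "window_start": -1,
                "window_end": -1,
            }
        ],
        "hive_tables": [
            {
                "schema": "default",        # assumed database name
                "name": "daily_tick_data",  # table used in the example query
                "window_start": -1,
                "window_end": -1,
                "interval": {"days": "1"},  # usage string from the table
                # Key layout from the table; the mask value is an assumption.
                "columns": {"the_date": "%Y-%m-%d"},
            }
        ],
    }


print(json.dumps({"dependency_info": build_dependency_info()}, indent=2))
```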
Response
The response contains a JSON object representing the cloned schedule.
Note
There is a limit on the number of schedule reruns that can be processed concurrently at a given point in time. Understanding the Qubole Scheduler Concepts provides more information.
Example
Goal: Clone an existing schedule (for example, schedule ID 3159) to create a new schedule. For more information on how to create a schedule, see Create a Schedule.
The Create a Schedule example defines a schedule that aggregates data every day, for every stock symbol and for each stock exchange. For example, if you want to edit the query to also calculate the total transaction amount for the stock in a day, provide the following query.
{
  "command_type":"HiveCommand",
  "command": {
    "query": "select stock_symbol, stock_exchange, max(high), min(low), sum(volume) from daily_tick_data where date1='$formatted_date$' group by stock_symbol, stock_exchange"
  },
  "start_time": "2012-11-01T02:00Z",
  "end_time": "2022-10-01T02:00Z"
}
Command
curl -i -X POST -H "X-AUTH-TOKEN: $AUTH_TOKEN" -H "Accept: application/json" -H "Content-type: application/json" \
-d '{ "name": "schedule1" }' \
"https://gcp.qubole.com/api/v1.2/scheduler/3159/duplicate"
Sample Response
{
 "time_out":10,
 "status":"RUNNING",
 "start_time":"2012-07-01 02:00",
 "label":"default",
 "concurrency":1,
 "frequency":1,
 "no_catch_up":false,
 "template":"generic",
 "command":{
            "sample":false,"loader_table_name":null,"md_cmd":null,"script_location":null,"approx_mode":false,"query":"select stock_symbol, max(high), min(low), sum(volume) from daily_tick_data where date1='$formatted_date$' group by stock_symbol","loader_stable":null,"approx_aggregations":false
           },
 "time_zone":"UTC",
 "time_unit":"days",
 "end_time":"2022-07-01 02:00",
 "user_id":108,
 "macros":[{"formatted_date":"Qubole_nominal_time.format('YYYY-MM-DD')"}],
 "incremental":{},
 "command_type":"HiveCommand",
 "name":"schedule1",
 "dependency_info":{},
 "id":3160,
 "next_materialized_time":null
}
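A client would typically parse the response and confirm that it describes a new schedule and not the source. The sketch below does this for an abridged copy of the sample response. `summarize_clone` is a hypothetical helper name, and the check that the returned id differs from the source id is an assumption about what a caller might want to verify.

```python
import json

# Abridged copy of the sample response above.
sample_response = """
{"status": "RUNNING", "name": "schedule1", "id": 3160,
 "frequency": 1, "time_unit": "days", "time_zone": "UTC"}
"""


def summarize_clone(response_text, source_id):
    """Return a short summary and verify the clone is a distinct schedule."""
    schedule = json.loads(response_text)
    if schedule["id"] == source_id:
        raise ValueError("response describes the source schedule, not a clone")
    return "%s (id %d): every %d %s, status %s" % (
        schedule["name"],
        schedule["id"],
        schedule["frequency"],
        schedule["time_unit"],
        schedule["status"],
    )


print(summarize_clone(sample_response, 3159))
```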