3. indico.modules.scheduler – task scheduling framework and daemon

The scheduler module provides Indico with a scheduling API for jobs: tasks that run at given times and, if needed, on a recurring basis.

3.1. Overview

3.1.1. Architecture

[Figure: Task workflow]

The scheduler module uses the database as the communication medium between “web server processes” (Indico instances running inside a web server, etc.) and the daemon. This has advantages such as:

  • No need for complex IPC mechanisms (RPC, shared memory, etc.);
  • Everything is in the Indico DB, which makes migration much easier.

It also poses some problems, such as:

  • Overhead introduced by regular DB polling on the scheduler (daemon) side;
  • Extra database traffic that can slow things down;
  • Increased possibility of database conflicts.

We tried to mitigate these problems by using conflict-free lightweight data structures.

The Scheduler accepts new tasks, prioritizes them by execution time, and launches new processes or threads when tasks are due to be executed. It keeps logs of these operations.

A Client is a proxy object that allows operations to be performed on the Scheduler and its tasks in a transparent way.

3.1.2. Workflow

[Figure: Task workflow]

Tasks can be in one of the following states:

  • TASK_STATUS_NONE - Nothing has happened yet. This is a transitory state, normally the one task objects are in when they are created;
  • TASK_STATUS_SPOOLED - The task has been added to the spool and is waiting to be put in the waiting queue;
  • TASK_STATUS_QUEUED - The algorithm has put the task in the waiting queue;
  • TASK_STATUS_RUNNING - The task is being executed;
  • TASK_STATUS_FAILED - The task has failed (execution raised an exception, or the task may have been cancelled);
  • TASK_STATUS_FINISHED - The task has finished successfully;
  • TASK_STATUS_TERMINATED - The task has been cancelled by the scheduler (i.e. it was AWOL for too long).
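The states above can be read as a small state machine. The following is a minimal sketch, not Indico's own code: only the constant names come from the list, while the integer values and the transition table are assumptions inferred from the state descriptions. Restarting a failed task (see startFailedTask in the Client section) is deliberately not modelled.

```python
# Illustrative only: the numeric values and the transition table below are
# assumptions inferred from the state descriptions, not Indico's definitions.
TASK_STATUS_NONE = 0
TASK_STATUS_SPOOLED = 1
TASK_STATUS_QUEUED = 2
TASK_STATUS_RUNNING = 3
TASK_STATUS_FAILED = 4
TASK_STATUS_FINISHED = 5
TASK_STATUS_TERMINATED = 6

# Assumed forward path: created -> spooled -> queued -> running -> end state.
ALLOWED_TRANSITIONS = {
    TASK_STATUS_NONE: {TASK_STATUS_SPOOLED},
    TASK_STATUS_SPOOLED: {TASK_STATUS_QUEUED},
    TASK_STATUS_QUEUED: {TASK_STATUS_RUNNING},
    TASK_STATUS_RUNNING: {
        TASK_STATUS_FAILED,
        TASK_STATUS_FINISHED,
        TASK_STATUS_TERMINATED,
    },
    # End states have no outgoing transitions in this simplified model.
    TASK_STATUS_FAILED: set(),
    TASK_STATUS_FINISHED: set(),
    TASK_STATUS_TERMINATED: set(),
}


def can_move(current, new):
    """Return True if the simplified model allows moving from current to new."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

In this model, can_move(TASK_STATUS_SPOOLED, TASK_STATUS_RUNNING) is False, because a spooled task has to pass through the waiting queue first.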

...

3.2. Scheduler

The main class in the module is the Scheduler.

class indico.modules.scheduler.Scheduler(**config)

Bases: object

A Scheduler object provides a job scheduler based on a waiting queue that communicates with its clients through the database. It is designed to minimize the probability of database conflicts, and operations are repeated if a conflict does happen.

The entry point of the process consists of a ‘spooler’ that periodically takes tasks out of a conflict-safe FIFO (spool) and adds them to an IOBTree-based waiting queue. The waiting queue is then checked periodically for the next task, and when the time comes the task is executed.

Tasks are executed in different threads.

The Client class works as a transparent remote proxy for this class.

config is a dictionary containing configuration parameters.

run()

Main loop. It should only be called from the scheduler.
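The spool-to-waiting-queue flow described for the Scheduler can be sketched in memory. This is a toy model under stated assumptions, not the Indico implementation: collections.deque stands in for the conflict-safe FIFO, a heapq-based list stands in for the IOBTree-based waiting queue, and the class and method names (MiniScheduler, submit, process_spool, pop_due) are hypothetical.

```python
import heapq
from collections import deque


class MiniScheduler:
    """Toy model: a FIFO spool feeding a time-ordered waiting queue."""

    def __init__(self):
        self.spool = deque()   # stand-in for the conflict-safe FIFO
        self.waiting = []      # heap of (start_time, seq, name); stand-in for the IOBTree
        self._seq = 0          # tie-breaker for tasks with equal start times

    def submit(self, start_time, name):
        """Client side: drop a task into the spool."""
        self.spool.append((start_time, name))

    def process_spool(self):
        """Spooler step: move everything from the spool to the waiting queue."""
        moved = 0
        while self.spool:
            start_time, name = self.spool.popleft()
            heapq.heappush(self.waiting, (start_time, self._seq, name))
            self._seq += 1
            moved += 1
        return moved

    def pop_due(self, now):
        """Return the names of all tasks whose start time has been reached."""
        due = []
        while self.waiting and self.waiting[0][0] <= now:
            due.append(heapq.heappop(self.waiting)[2])
        return due
```

A short usage example, with plain integers as timestamps:

```python
s = MiniScheduler()
s.submit(30, "b")
s.submit(10, "a")
s.submit(50, "c")
s.process_spool()      # moves 3 tasks into the waiting queue
print(s.pop_due(35))   # ['a', 'b']; "c" is not due yet
```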

3.3. Client

Client applications only need to worry about:

class indico.modules.scheduler.Client

Bases: object

Client provides a transparent scheduler client that allows Indico client processes to interact with the Scheduler without the need for a lot of code.

It acts as a remote proxy.

clearSpool()

Clears the spool, returning the number of removed elements

dequeue(task)

Schedules a task for deletion

enqueue(task)

Schedules a task for execution

getSpool()

Returns the spool

getStatus()

Returns status information (dictionary), containing the lengths (number of tasks) of:
  • spool;
  • waiting queue;
  • running queue;
  • finished task index;
  • failed task index;

It also reports whether the scheduler is running (state).

getTask(tid)

Returns a task object, given its task id

shutdown(msg='')

Shuts down the scheduler. msg is an optional parameter that provides an information message that will be written to the logs.

startFailedTask(task)

Starts a failed task
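To illustrate the remote-proxy idea, the sketch below mimics a few of the Client methods listed above on top of a plain Python list. It is a simplified model, not the real Client: the class name SpoolClient and the operation names 'add' and 'del' are hypothetical, and the list stands in for the database-backed spool. The return values follow this page (enqueue returns True in the example section, clearSpool returns the number of removed elements).

```python
class SpoolClient:
    """Toy proxy: each call only records an instruction in a shared spool."""

    def __init__(self, spool):
        # A plain list standing in for the database-backed spool.
        self._spool = spool

    def enqueue(self, task):
        """Record a request to schedule the task for execution."""
        self._spool.append(("add", task))   # 'add' is a hypothetical op name
        return True

    def dequeue(self, task):
        """Record a request to schedule the task for deletion."""
        self._spool.append(("del", task))   # 'del' is a hypothetical op name

    def getSpool(self):
        """Return a copy of the pending instructions."""
        return list(self._spool)

    def clearSpool(self):
        """Empty the spool and return the number of removed elements."""
        removed = len(self._spool)
        del self._spool[:]
        return removed
```

The point of the design is that the client never calls the scheduler directly: it only writes instructions, and the daemon picks them up the next time it polls.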

3.4. Tasks

class indico.modules.scheduler.tasks.BaseTask(expiryDate=None)

Bases: indico.modules.scheduler.tasks.TimedEvent

A base class for tasks. expiryDate is the last point in time when the task can run. A task will refuse to run if the current time is past expiryDate.

prepare()

This information will be saved regardless of the task being repeated or not

reset()

Resets a task to its state before being run

tearDown()

If a task needs to do something once it has run and been removed from runningList, override this method.

class indico.modules.scheduler.tasks.OneShotTask(startDateTime, expiryDate=None)

Bases: indico.modules.scheduler.tasks.BaseTask

Tasks that are executed only once
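The expiryDate rule described for BaseTask can be illustrated with a standalone class. This is a sketch under assumptions, not the BaseTask implementation: the class name ExpiringTask and the method start_if_allowed are hypothetical, and the current time is passed in explicitly so the behaviour is easy to check.

```python
from datetime import datetime, timedelta, timezone


class ExpiringTask:
    """Toy task that refuses to run once its expiry date has passed."""

    def __init__(self, start, expiry=None):
        self.start = start
        self.expiry = expiry   # None means the task never expires
        self.ran = False

    def start_if_allowed(self, now):
        """Run unless now is past the expiry date; return True if it ran."""
        if self.expiry is not None and now > self.expiry:
            return False
        self.ran = True
        return True


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
on_time = ExpiringTask(t0, expiry=t0 + timedelta(hours=1))
too_late = ExpiringTask(t0, expiry=t0 + timedelta(hours=1))

print(on_time.start_if_allowed(t0 + timedelta(minutes=30)))   # True
print(too_late.start_if_allowed(t0 + timedelta(hours=2)))     # False
```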

3.5. Module

The module object is of little interest for developers in general. Every Indico instance will transparently provide one through getDBInstance().

class indico.modules.scheduler.SchedulerModule

Bases: indico.modules.base.Module

classmethod getDBInstance()

Returns the module instance that is stored in the database

getStatus()

Returns some basic info

moveTask(task, moveFrom, status, occurrence=None, nocheck=False)

Move a task somewhere

removeRunningTask(task)

Remove a task from the running list

spool(op, obj)

Adds an ‘instruction’ to the spool, in the form (op, obj)

3.6. Example

A simple client use case:

>>> from indico.modules.scheduler import Client
>>> from indico.modules.scheduler.tasks import SampleOneShotTask, SamplePeriodicTask
>>> from datetime import timedelta
>>> from dateutil import rrule
>>> from indico.util.date_time import nowutc
>>> # dbi is assumed to be an open Indico database connection obtained elsewhere
>>> c = Client()
>>> st = SampleOneShotTask(nowutc() + timedelta(seconds=1))
>>> c.enqueue(st)
True
>>> dbi.commit()
>>> pt = SamplePeriodicTask(rrule.MINUTELY, bysecond=(40,))
>>> c.enqueue(pt)
True
>>> dbi.commit()
>>> c.dequeue(pt)
>>> dbi.commit()

A simple scheduler configuration:

from indico.modules.scheduler import Scheduler

s = Scheduler(sleep_interval=1,
              task_max_tries=1,
              multitask_mode='processes')
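As a rough illustration of what a retry limit can mean, the sketch below assumes that task_max_tries caps the number of attempts made for a task. That reading is an interpretation of the parameter name, not something this page states, and the helper run_with_retries is hypothetical rather than part of Indico.

```python
def run_with_retries(func, max_tries):
    """Call func until it succeeds or max_tries attempts have been made.

    Returns (attempt_number, result) on success and re-raises the last
    exception once the attempts are used up.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return attempt, func()
        except Exception as exc:   # a real scheduler would also log this
            last_error = exc
    raise last_error
```

With max_tries set to 1, as in the configuration above, a task that raises an exception is not retried under this interpretation.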

3.7. Daemon

The daemon runs the indico.modules.scheduler.Scheduler class documented in section 3.2; its run() method is the daemon's main loop.