3. indico.modules.scheduler – task scheduling framework and daemon
The scheduler module provides Indico with a scheduling API that allows specific jobs (tasks) to be run at given times, with a certain repeatability if needed.
3.1. Overview
3.1.1. Architecture
The scheduler module uses the database as the communication medium between “web server processes” (Indico instances running inside a web server, etc.) and the daemon. This has advantages such as:
- No need for complex IPC mechanisms (RPC, shared memory, etc.);
- Everything is in the Indico DB, which makes migration much easier;
But it also poses some problems, such as:
- Overhead introduced by regular DB polling on the scheduler (daemon) side;
- Extra database traffic that can slow down things a bit;
- Increased possibility of database conflicts;
We tried to mitigate these problems by using conflict-free lightweight data structures.
The Scheduler is the element that is responsible for accepting new tasks and prioritizing them by execution time, launching new processes/threads as they need to be executed. Logs of the operations are kept.
A Client is basically a proxy object that allows operations to be performed on the Scheduler and its tasks in a transparent way.
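The division of labour described above can be illustrated with a minimal, standalone sketch. It does not use Indico code: FakeDatabase, SketchClient, SketchScheduler and the 'add'/'del' operation names are all hypothetical stand-ins, and an in-memory deque plays the role of the shared database. The only point it makes is that clients write instructions while the daemon polls for them.

```python
from collections import deque


class FakeDatabase:
    """In-memory stand-in for the shared Indico DB (assumption for illustration)."""

    def __init__(self):
        self.spool = deque()  # FIFO of (op, obj) instructions


class SketchClient:
    """Proxy used by web server processes; it only writes to the database."""

    def __init__(self, db):
        self._db = db

    def enqueue(self, task):
        self._db.spool.append(('add', task))  # hypothetical operation name
        return True

    def dequeue(self, task):
        self._db.spool.append(('del', task))  # hypothetical operation name


class SketchScheduler:
    """Daemon side: polls the spool and applies the instructions it finds."""

    def __init__(self, db):
        self._db = db
        self.waiting = []

    def poll_once(self):
        processed = 0
        while self._db.spool:
            op, task = self._db.spool.popleft()
            if op == 'add':
                self.waiting.append(task)
            elif op == 'del' and task in self.waiting:
                self.waiting.remove(task)
            processed += 1
        return processed


db = FakeDatabase()
client = SketchClient(db)
daemon = SketchScheduler(db)

client.enqueue('task-a')
client.enqueue('task-b')
client.dequeue('task-a')
processed = daemon.poll_once()
```

In this sketch the client never calls the daemon directly. Both sides only touch the shared store, which is why no IPC mechanism is needed and why the daemon has to poll.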
3.1.2. Workflow
Tasks can be in one of the following states:
- TASK_STATUS_NONE - Nothing happened yet - this is a transitory state, and normally the state task objects are in when they are created;
- TASK_STATUS_SPOOLED - The task has been added to the spool, and is currently waiting to be put in the waiting queue;
- TASK_STATUS_QUEUED - The algorithm has put the task in the waiting queue;
- TASK_STATUS_RUNNING - The task is being executed;
- TASK_STATUS_FAILED - The task has failed (execution raised an exception, or the task may have been cancelled);
- TASK_STATUS_FINISHED - The task has successfully finished;
- TASK_STATUS_TERMINATED - The task has been cancelled by the scheduler (i.e. it was AWOL for too long);
...
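As a reading aid, the states listed above can be modelled as an enumeration with a transition table. This is a sketch, not Indico's implementation: the TaskStatus enum, its values and the ALLOWED table are assumptions inferred from the descriptions, and it covers only the states named in this section.

```python
from enum import Enum, auto


class TaskStatus(Enum):
    # Names mirror the TASK_STATUS_* constants; the values are NOT Indico's.
    NONE = auto()
    SPOOLED = auto()
    QUEUED = auto()
    RUNNING = auto()
    FAILED = auto()
    FINISHED = auto()
    TERMINATED = auto()


# Assumed transitions, inferred from the state descriptions only.
ALLOWED = {
    TaskStatus.NONE: {TaskStatus.SPOOLED},
    TaskStatus.SPOOLED: {TaskStatus.QUEUED},
    TaskStatus.QUEUED: {TaskStatus.RUNNING},
    TaskStatus.RUNNING: {
        TaskStatus.FINISHED,
        TaskStatus.FAILED,
        TaskStatus.TERMINATED,
    },
}


def can_move(current, target):
    """Return True if the assumed table allows current -> target."""
    return target in ALLOWED.get(current, set())


def is_valid_path(path):
    """Check every consecutive pair of states in a path."""
    return all(can_move(a, b) for a, b in zip(path, path[1:]))


happy_path = [
    TaskStatus.NONE,
    TaskStatus.SPOOLED,
    TaskStatus.QUEUED,
    TaskStatus.RUNNING,
    TaskStatus.FINISHED,
]
```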
3.2. Scheduler
The main class in the module is the Scheduler:
- class indico.modules.scheduler.Scheduler(**config)
Bases: object
A Scheduler object provides a job scheduler based on a waiting queue, which communicates with its clients through the database. It is designed to minimize the probability of database conflicts, and operations are retried when a conflict does happen.
The entry point of the process consists of a ‘spooler’ that periodically takes tasks out of a conflict-safe FIFO (spool) and adds them to an IOBTree-based waiting queue. The waiting queue is then checked periodically for the next task, and when the time comes the task is executed.
Tasks are executed in different threads.
The Client class works as a transparent remote proxy for this class.
config is a dictionary containing configuration parameters.
- run()
Main loop; should only be called from the scheduler.
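The waiting queue behaviour described above (tasks ordered by execution time, checked periodically, each run in its own thread) can be sketched in plain Python. This is an illustration under stated assumptions: a heapq-based structure stands in for the IOBTree-based queue, integer timestamps are used as keys, and WaitingQueue, run_due_tasks and make_task are hypothetical names that do not exist in Indico.

```python
import heapq
import itertools
import threading


class WaitingQueue:
    """Time-ordered queue; a heap stands in for the IOBTree used by Indico."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal timestamps

    def add(self, timestamp, task):
        heapq.heappush(self._heap, (int(timestamp), next(self._counter), task))

    def pop_due(self, now):
        """Remove and return every task whose timestamp is <= now."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, task = heapq.heappop(self._heap)
            due.append(task)
        return due

    def __len__(self):
        return len(self._heap)


def run_due_tasks(queue, now):
    """Run each due task in its own thread and wait for all of them."""
    threads = [threading.Thread(target=task) for task in queue.pop_due(now)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(threads)


results = []
results_lock = threading.Lock()


def make_task(name):
    def task():
        with results_lock:
            results.append(name)
    return task


queue = WaitingQueue()
queue.add(100, make_task('early'))
queue.add(200, make_task('late'))
queue.add(100, make_task('also-early'))

launched = run_due_tasks(queue, now=150)
```

A real main loop would call something like run_due_tasks repeatedly, sleeping between checks. The sketch performs a single check so that the outcome is deterministic.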
3.3. Client
Client applications only need to worry about the Client class:
- class indico.modules.scheduler.Client
Bases: object
Client provides a transparent scheduler client that allows Indico client processes to interact with the Scheduler without the need for a lot of code.
It acts as a remote proxy.
- clearSpool()
Clears the spool, returning the number of removed elements
- dequeue(task)
Schedules a task for deletion
- enqueue(task)
Schedules a task for execution
- getSpool()
Returns the spool
- getStatus()
Returns a dictionary of status information, containing the number of tasks in each of:
- spool;
- waiting queue;
- running queue;
- finished task index;
- failed task index;
as well as whether the scheduler is running (state)
- shutdown(msg='')
Shuts down the scheduler. msg is an optional parameter that provides an informational message that will be written to the logs
- startFailedTask(task)
Starts a failed task
3.4. Tasks
- class indico.modules.scheduler.tasks.BaseTask(expiryDate=None)
Bases: indico.modules.scheduler.tasks.TimedEvent
A base class for tasks. expiryDate is the last point in time when the task can run. A task will refuse to run if the current time is past expiryDate.
- prepare()
This information will be saved regardless of whether the task is repeated
- reset()
Resets a task to its state before being run
- tearDown()
If a task needs to do something once it has run and been removed from runningList, override this method
- class indico.modules.scheduler.tasks.OneShotTask(startDateTime, expiryDate=None)
Bases: indico.modules.scheduler.tasks.BaseTask
Tasks that are executed only once
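The expiry rule stated for BaseTask can be made concrete with a small standalone sketch. The classes below are hypothetical and only mimic the documented constructor signatures; the start() and run() methods, the executed flag and the HelloTask subclass are assumptions introduced for illustration, not part of the Indico API.

```python
from datetime import datetime, timedelta, timezone


class SketchBaseTask:
    """Mimics BaseTask(expiryDate=None); not the real Indico class."""

    def __init__(self, expiryDate=None):
        self.expiryDate = expiryDate
        self.executed = False

    def run(self):
        # Hypothetical hook that subclasses fill in with the actual work.
        raise NotImplementedError

    def start(self, now=None):
        """Run the task unless the current time is past expiryDate."""
        now = now or datetime.now(timezone.utc)
        if self.expiryDate is not None and now > self.expiryDate:
            return False  # the task refuses to run
        self.run()
        self.executed = True
        return True


class SketchOneShotTask(SketchBaseTask):
    """Mimics OneShotTask(startDateTime, expiryDate=None)."""

    def __init__(self, startDateTime, expiryDate=None):
        super().__init__(expiryDate)
        self.startDateTime = startDateTime


class HelloTask(SketchOneShotTask):
    def run(self):
        self.message = 'hello'


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = HelloTask(now, expiryDate=now + timedelta(hours=1))
stale = HelloTask(now, expiryDate=now - timedelta(hours=1))

fresh_ran = fresh.start(now)
stale_ran = stale.start(now)
```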
3.5. Module
The module object is of little interest for developers in general. Every Indico instance will transparently provide one through getDBInstance().
- class indico.modules.scheduler.SchedulerModule
Bases: indico.modules.base.Module
- classmethod getDBInstance()
Returns the module instance that is stored in the database
- getStatus()
Returns some basic info
- moveTask(task, moveFrom, status, occurrence=None, nocheck=False)
Move a task somewhere
- removeRunningTask(task)
Remove a task from the running list
- spool(op, obj)
Adds an ‘instruction’ to the spool, in the form (op, obj)
3.6. Example
A simple client use case:
>>> from indico.modules.scheduler import Client
>>> from indico.modules.scheduler.tasks import SampleOneShotTask, SamplePeriodicTask
>>> from datetime import timedelta
>>> from dateutil import rrule
>>> from indico.util.date_time import nowutc
>>> # dbi (used below) is assumed to be an already initialized database manager instance
>>> c = Client()
>>> st = SampleOneShotTask(nowutc() + timedelta(seconds=1))
>>> c.enqueue(st)
True
>>> dbi.commit()
>>> pt = SamplePeriodicTask(rrule.MINUTELY, bysecond=(40,))
>>> c.enqueue(pt)
True
>>> dbi.commit()
>>> c.dequeue(pt)
>>> dbi.commit()
A simple scheduler configuration:
from indico.modules.scheduler import Scheduler

s = Scheduler(sleep_interval=1,
              task_max_tries=1,
              multitask_mode='processes')
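Since Scheduler accepts its settings as keyword arguments, one plausible way to handle them is to overlay the supplied values on a set of defaults. The sketch below is an assumption about how such merging could work, not a description of Indico's code: build_config is a hypothetical helper, and the default values (including the 'threads' mode name) are invented for illustration. Only the three parameter names come from the example above.

```python
# Hypothetical defaults; the real values used by Indico are not given here.
DEFAULT_CONFIG = {
    'sleep_interval': 10,
    'task_max_tries': 5,
    'multitask_mode': 'threads',
}


def build_config(**config):
    """Overlay caller-supplied settings on top of the defaults."""
    merged = dict(DEFAULT_CONFIG)
    merged.update(config)
    return merged


cfg = build_config(sleep_interval=1,
                   task_max_tries=1,
                   multitask_mode='processes')
partial = build_config(sleep_interval=2)
```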
3.7. Daemon
The daemon is implemented by the indico.modules.scheduler.Scheduler class, which is documented in section 3.2.