SIO Workers

The sioworkers module exists because systems sometimes need to perform relatively long-running computations. It provides a set of convenience classes and functions that help in implementing the batch tasks themselves. It is not a batch scheduler.

It does this by providing a unified pythonic interface for representing the parameters, input and output of batch jobs, and for running the jobs once these parameters are available.

The environ

This mysterious “pythonic interface” is actually a dictionary. Its keys are strings, and its values are Python primitive types such as lists, dictionaries and strings. In practice, a value may be anything serializable to JSON. This dictionary is called environ everywhere. The environ is the only argument passed to the sio.workers.runner.run() function and the only thing returned by it.
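As a rough illustration, an environ could look like the dictionary below. The job_type and ping keys are described in this document; limits and tags are invented for the example. A round trip through json is a quick way to check that everything in it is JSON-serializable.

```python
import json

# Illustrative environ. 'limits' and 'tags' are invented keys that only
# show nested JSON-friendly values.
environ = {
    'job_type': 'ping',
    'ping': 'hello',
    'limits': {'time_ms': 1000, 'mem_kib': 65536},
    'tags': ['example', 'docs'],
}

# A JSON round trip should give back an equal dictionary.
serialized = json.dumps(environ, sort_keys=True)
restored = json.loads(serialized)
print(restored == environ)
```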

Many jobs use the filetracker module, so it helps to be somewhat familiar with it.

environ keys common to all jobs

Keys that must be present to run a job:

job_type
name of the job to run.

Keys affected by all jobs:

result
SUCCESS if the job finished without throwing an exception, FAILURE otherwise,
exception
(set only if an exception was thrown) the exception, converted to string,
traceback
(set only if an exception was thrown) the traceback, converted to string.

Refer to the documentation of a particular job to learn what other arguments are expected and what information is returned back in the environ.

In general, regular errors that may happen as a result of the job (for example, compilation errors in a compilation job) should not be signalled by throwing an exception. Exceptions should indicate potentially important system problems, such as sandbox misconfiguration or running out of disk space.
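The convention above can be sketched as follows. This is not the actual runner code: run_reporting_errors, compile_job, broken_job and the compilation_result key are hypothetical names used only to show how result, exception and traceback could be filled in.

```python
import traceback

def run_reporting_errors(job, environ):
    """Hypothetical wrapper: run `job` and record the outcome in environ."""
    try:
        environ = job(environ)
        environ['result'] = 'SUCCESS'
    except Exception as e:
        environ['result'] = 'FAILURE'
        environ['exception'] = str(e)
        environ['traceback'] = traceback.format_exc()
    return environ

def compile_job(environ):
    # A regular error (e.g. the code does not compile) is reported
    # through the environ; 'compilation_result' is an invented key.
    environ['compilation_result'] = 'CE'
    return environ

def broken_job(environ):
    # A system-level problem is signalled by raising an exception.
    raise RuntimeError('sandbox misconfigured')

ok = run_reporting_errors(compile_job, {'job_type': 'compile'})
bad = run_reporting_errors(broken_job, {'job_type': 'compile'})
```

Here ok still reports SUCCESS even though the compilation failed, while bad carries the exception text and traceback.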

Running jobs

From Python:

sio.workers.runner.run(environ)

Performs the work passed in environ.

Returns the modified environ. It might be modified in-place by work implementations.

The following keys in environ have special meaning:

job_type
Mandatory key naming the job to be run.
prefilters
Optional list of filter names to apply before performing the work.
postfilters
Optional list of filter names to apply after performing the work.

The following are added during processing:

worker
Hostname of the machine running the job (i.e. the machine executing this function).

Refer to Interacting with Filetracker for more information about filters.
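The order of processing described above can be sketched roughly like this. The sketch assumes that jobs and filters are plain name-to-function dictionaries and that worker is stamped at the end; run_sketch and record are hypothetical names, and the real runner may look things up and order the steps differently.

```python
import socket

def run_sketch(environ, jobs, filters):
    """Hypothetical pipeline: prefilters, then the job, then postfilters."""
    for name in environ.get('prefilters', []):
        environ = filters[name](environ)
    environ = jobs[environ['job_type']](environ)
    for name in environ.get('postfilters', []):
        environ = filters[name](environ)
    # Assumption: the hostname is recorded once the work is done.
    environ['worker'] = socket.gethostname()
    return environ

def record(label):
    """Build a step that appends `label` to environ['trace']."""
    def step(environ):
        environ.setdefault('trace', []).append(label)
        return environ
    return step

result = run_sketch(
    {'job_type': 'noop', 'prefilters': ['before'], 'postfilters': ['after']},
    jobs={'noop': record('job')},
    filters={'before': record('pre'), 'after': record('post')},
)
print(result['trace'])
```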

There are also bindings for Celery in sio.celery.

From the shell, you may use the sio-batch script, which expects the environment variable environ to contain the environ as JSON. After running the job, the resulting environ is printed to standard output in the following format:

--- BEGIN ENVIRON ---
<jsonified environ>
--- END ENVIRON ---
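A caller that captures this output could recover the environ along these lines. parse_batch_output is a hypothetical helper, and the sketch assumes that each marker appears on a line of its own, exactly as shown above.

```python
import json

BEGIN = '--- BEGIN ENVIRON ---'
END = '--- END ENVIRON ---'

def parse_batch_output(text):
    """Return the environ found between the BEGIN and END marker lines."""
    lines = text.splitlines()
    start = lines.index(BEGIN) + 1   # raises ValueError if missing
    end = lines.index(END, start)
    return json.loads('\n'.join(lines[start:end]))

# Made-up output, including a stray line printed before the markers.
sample = '\n'.join([
    'some log line',
    BEGIN,
    '{"job_type": "ping", "ping": "x", "pong": "x", "result": "SUCCESS"}',
    END,
])
environ = parse_batch_output(sample)
```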

For developers

Hi, developer! Nice to meet you!

Creating jobs

Creating jobs is super easy.

You just need to define a function with one argument... the environ, returning one thing... the environ. You may define it in any module, provided that it is registered with pkg_resources aka setuptools as an entry point, under the key sio.jobs.

The function may use the current directory in any way — it will be run from inside a temporary directory which will be deleted automatically.

For example, the following setup.py defines a module with a job named szescblotastop:

from setuptools import setup, find_packages
setup(
    name = "mymud",
    version = '0.1',
    packages = find_packages(),
    entry_points = {
        'sio.jobs': [
            'szescblotastop = mudmodule.mudsubmodule.mud.mud.mud:mud_fun',
        ]
    }
)
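The function behind such an entry point might look like the sketch below. The mud_text and mud_size keys are invented, and the temporary directory is only emulated here so the function can be tried out locally; when run as a real job, the working directory is prepared for you.

```python
import os
import tempfile

def mud_fun(environ):
    """Hypothetical body of the job registered as 'szescblotastop'."""
    # Scratch files can simply go into the current directory.
    with open('scratch.txt', 'w') as f:
        f.write(environ.get('mud_text', ''))   # invented key
    environ['mud_size'] = os.path.getsize('scratch.txt')   # invented key
    return environ

# Emulate the temporary working directory for a local try-out.
old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        result = mud_fun({'job_type': 'szescblotastop', 'mud_text': 'stop'})
    finally:
        os.chdir(old_cwd)
```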

Sandboxes

class sio.workers.sandbox.Sandbox(name)

Represents a sandbox... that is, some place in the filesystem where a previously prepared package with some software is extracted.

This class deals only with using sandboxes, not creating, changing or uploading them. Each sandbox is uniquely identified by name. The moment you create an instance of Sandbox, the appropriate archive is downloaded and extracted (if it does not exist yet; a check for a newer version is also performed). The path to the extracted sandbox is in the path attribute. This path is valid as long as the Sandbox instance exists (is not garbage collected).

Sandbox images are looked up in the following order:

  • from Filetracker, at path /sandboxes/<name>,
  • if not found there, the URL from SIO_SANDBOXES_URL environment variable is used,
  • if such environment variable is not defined, some default URL is used.

Sandboxes are extracted to the folder named in SIO_SANDBOXES_BASEDIR environment variable (or in ~/.sio-sandboxes if the variable is not in the environment).
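The directory rule above amounts to something like the following. sandboxes_basedir is a hypothetical helper written for this illustration, not a function of the module.

```python
import os

def sandboxes_basedir(env=None):
    """Hypothetical helper mirroring the lookup rule described above."""
    env = os.environ if env is None else env
    return env.get('SIO_SANDBOXES_BASEDIR',
                   os.path.expanduser('~/.sio-sandboxes'))

# The variable wins when it is set; otherwise the home directory is used.
custom = sandboxes_basedir({'SIO_SANDBOXES_BASEDIR': '/srv/sandboxes'})
default = sandboxes_basedir({})
```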

Note

Processes must not modify the content of the extracted sandbox in any way. The same sandbox can safely be used by multiple processes concurrently, as the folder is locked to ensure no problems if an upgrade is needed.

Note

Sandbox is a context manager, so it should be used in a with statement. Upon entering, the sandbox is downloaded, extracted and locked, to prevent other processes from performing an upgrade.

Note

Do not construct instances of this class yourself; use get_sandbox() instead. Otherwise you may encounter deadlocks when there are two Sandbox instances with the same name.

Executors (environment)

Executors are environments for executing commands. Just like sandboxes, they are context managers.

class sio.workers.executors.BaseExecutor

Base class for Executors: command environment managers.

Its behavior depends on the concrete class; see that class's docstring. Objects are callable context managers, so typical usage looks like:

with executor_instance:
    executor_instance(command, kwargs...)

Most executors support the following options for the __call__ method:

command
The command to execute; may be a list or a string. If this is a list, all the arguments will be shell-quoted unless wrapped in sio.workers.executors.noquote. If this is a string, it will be converted to a one-element list wrapped in noquote. The command is passed to subprocess.Popen with shell=True, but may be manipulated in various ways depending on the concrete class.
env
The dictionary passed as environment. Non-string values are automatically converted to strings. If not present, the current process’ environment is used. In all cases, the environment is augmented by adding LC_ALL and LANGUAGE set to en_US.UTF-8.
ignore_errors
Do not throw ExecError if the program exits with an error.
extra_ignore_errors
Do not throw ExecError if the program exits with one of the error codes in extra_ignore_errors.
stdin
File object which should be redirected to standard input of the program.
stdout, stderr
May be files opened with open(fname, 'w'), sys.* streams or None, in which case the output is suppressed (the default). See also: capture_output.
capture_output
If set, the program output is returned under the stdout key of renv.
split_lines
If True, the output from the called program is returned as a list of lines, otherwise just one big string.
forward_stderr
Forwards stderr to stdout.
output_limit
Limit on the amount of data the program can write to stdout, in KiB.
mem_limit
Memory limit (ulimit -v), in KiB.
time_limit
CPU time limit (ulimit -t), in milliseconds.
real_time_limit
Wall clock time limit, in milliseconds.
environ
If present, this should be the environ dictionary. It’s used to extract values for mem_limit, time_limit, real_time_limit and output_limit from it.
environ_prefix
Prefix for mem_limit, time_limit, real_time_limit and output_limit keys in environ.
**kwargs
Other arguments handled by some executors. See their documentation.
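The quoting rule for command and the handling of env can be sketched as below. NoQuote is a hypothetical stand-in for sio.workers.executors.noquote, and build_shell_command and build_env are illustrative helpers; the real executors may build the command line differently.

```python
import shlex

class NoQuote(str):
    """Hypothetical stand-in for sio.workers.executors.noquote."""

def build_shell_command(command):
    """Join `command` into one shell string, quoting unmarked arguments."""
    if isinstance(command, str):
        # A plain string is treated as a single unquoted element.
        command = [NoQuote(command)]
    return ' '.join(arg if isinstance(arg, NoQuote) else shlex.quote(arg)
                    for arg in command)

def build_env(env):
    """Convert values to strings and force the locale variables."""
    result = {key: str(value) for key, value in env.items()}
    result['LC_ALL'] = 'en_US.UTF-8'
    result['LANGUAGE'] = 'en_US.UTF-8'
    return result

cmd = build_shell_command(['./exe', 'my file', NoQuote('<'), 'in'])
env = build_env({'MEM_LIMIT': 256000})
print(cmd)
```

The redirection operator stays unquoted so that the shell interprets it, while the argument containing a space is quoted.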

The method returns a dictionary (called renv) containing:

real_time_used
Wall clock time it took to execute the command (in ms).
return_code
Status code that the program returned.
stdout
Only when capture_output=True: the output of the command.

Some executors also return other keys, e.g. time_used, result_code, mem_used, num_syscalls.

class sio.workers.executors.SandboxExecutor(sandbox)

SandboxExecutor is intended to run programs delivered in a sandbox package.

This executor accepts the following extra argument in __call__:

use_path
If false (default) and the first argument of command is a relative path, then it is prepended with the sandbox path.

Note

Sandbox does not mean isolation; it is just a part of the filesystem.

path

Contains real, absolute path to sandbox root.

rpath

Contains path to sandbox root as visible during command execution.

class sio.workers.executors.PRootExecutor(sandbox)

PRootExecutor mimics chroot with mount --bind.

During execution sandbox.path becomes the new /. The current working directory is visible as itself and as /tmp. Also, sandbox.path remains accessible under sandbox.path.

If the sandbox doesn’t contain /bin/sh or /lib, then basic versions of them are bound from the proot sandbox.

For more information about PRoot see http://proot.me.

PRootExecutor adds support for the following argument in __call__:

proot_options
Options passed to the proot binary after those automatically generated.

path

Contains real, absolute path to sandbox root.

rpath

Contains path to sandbox root as visible during command execution.

This module provides some ready-to-use executors:

class sio.workers.executors.UnprotectedExecutor

Executes command in completely unprotected manner.

Note

Time limits are measured with an accuracy of seconds.

class sio.workers.executors.DetailedUnprotectedExecutor

This executor returns extended process status (compared to UnprotectedExecutor).

Note

It reserves the process's stderr for time counting, so the stderr argument is ignored.

This class adds the following keys to renv:

time_used: Linux user-time used by process

result_code: TLE, OK, RE.

result_string: string describing result_code

class sio.workers.executors.SupervisedExecutor(allow_local_open=False, use_program_return_code=False, **kwargs)

Executes program in supervised mode.

Sandboxing limitations may be controlled by passing the following arguments to the constructor:

allow_local_open
Allow opening files within the current directory in read-only mode.
use_program_return_code
Makes the supervisor pass the program return code to renv['return_code'] rather than the sandbox return code.

The following new arguments are recognized in __call__:

ignore_return
Do not treat a non-zero return code as a runtime error.
java_sandbox
Name of the sandbox containing a JRE.

Executed programs may only use stdin/stdout/stderr and manage their own memory. Returns extended statistics in renv containing:

time_used: processor user time (in ms).

mem_used: memory used (in KiB).

num_syscall: number of times a syscall has been called

result_code: short code reporting whether the rules were obeyed. One of OK, RE, TLE, OLE, MLE, RV.

result_string: string describing result_code

class sio.workers.executors.VCPUExecutor

Runs program in controlled environment while counting CPU instructions.

Executed programs may only use stdin/stdout/stderr and manage their own memory. Returns extended statistics in renv containing:

time_used: time based on instruction counting (in ms).

mem_used: memory used (in KiB).

num_syscall: number of times a syscall has been called

result_code: short code reporting whether the rules were obeyed. One of OK, RE, TLE, OLE, MLE, RV.

result_string: string describing result_code
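Code that consumes the renv of SupervisedExecutor or VCPUExecutor might summarize it as sketched below. The expansions of the result codes are the conventional ones and are an assumption here, not something stated in this documentation; summarize and the sample values are made up.

```python
# Conventional expansions of the codes; assumed, not taken from the source.
RESULT_CODES = {
    'OK': 'finished within limits',
    'RE': 'runtime error',
    'TLE': 'time limit exceeded',
    'OLE': 'output limit exceeded',
    'MLE': 'memory limit exceeded',
    'RV': 'rule violation',
}

def summarize(renv):
    """Build a one-line, human-readable summary of an renv."""
    code = renv['result_code']
    meaning = RESULT_CODES.get(code, 'unknown code')
    return '%s (%s): %d ms, %d KiB' % (
        code, meaning, renv['time_used'], renv['mem_used'])

# Made-up renv of the shape described above.
sample_renv = {'result_code': 'TLE', 'time_used': 1004, 'mem_used': 2048}
line = summarize(sample_renv)
```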

Executing external programs

sio.workers.execute.execute(command, **kwargs)

Wrapper for sio.workers.executors.UnprotectedExecutor returning stdout.

Returns tuple (return_code, stdout)

Interacting with Filetracker

Filetracker should be your friend if you are coding for sio-workers. We can help you interact with it by providing the most in-demand functions in the world:

sio.workers.ft.download(environ, key, dest=None, skip_if_exists=False, **kwargs)

Downloads the file from environ[key] and saves it to dest.

dest
A filename, directory name or None. In the two latter cases, the file is named the same as in environ[key].
skip_if_exists
If True and dest points to an existing file (not a directory or None), then the file is not downloaded.
**kwargs
Passed directly to filetracker.Client.get_file().

The value under environ['use_filetracker'] affects downloading in the following way:

  • if True, nothing special happens
  • if False, the file is not downloaded from filetracker, but the passed path is assumed to be a regular filesystem path
  • if 'auto', the file is assumed to be a local filename only if it is a relative path (this is usually the case when developers play).

Returns the path to the saved file.
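The three cases of environ['use_filetracker'] can be sketched as a small decision function. is_local_file is a hypothetical helper, and treating a missing key as True is an assumption; the real download() may behave differently when the key is absent.

```python
import posixpath

def is_local_file(environ, path):
    """Hypothetical helper: should `path` be read from the local disk?"""
    # Assumption: a missing key behaves like True (always use filetracker).
    mode = environ.get('use_filetracker', True)
    if mode == 'auto':
        # Only relative paths are taken to be local files.
        return not posixpath.isabs(path)
    return not mode

checks = [
    is_local_file({'use_filetracker': True}, 'tests/in'),
    is_local_file({'use_filetracker': False}, '/abs/in'),
    is_local_file({'use_filetracker': 'auto'}, 'tests/in'),
    is_local_file({'use_filetracker': 'auto'}, '/abs/in'),
]
```

Note that 'auto' has to be tested before the truthiness check, because a non-empty string is itself truthy.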

sio.workers.ft.upload(environ, key, source, dest=None, **kwargs)

Uploads the file from source to filetracker under environ[key] name.

source
Filename to upload.
dest
A filename, directory name or None. In the two latter cases, the file is named the same as in environ[key].
**kwargs
Passed directly to filetracker.Client.put_file().

See the note about environ['use_filetracker'] in sio.workers.ft.download().

Returns the filetracker path to the saved file.

sio.workers.ft.instance()

Returns a singleton instance of filetracker.Client.

There is also a convenience function for starting the Filetracker server, but this is only useful in complex setups when one wants to configure the worker machines to share cached files between themselves.

sio.workers.ft.launch_filetracker_server()

Launches the Filetracker server if FILETRACKER_PUBLIC_URL is present in os.environ and the server does not appear to be running.

The server is run in the background and the function returns once the server is up and running.

There is also a command-line script called sio-run-filetracker which calls this function.

Example

Here’s an example of a job running the specified binary file in a controlled environment (beware, as this is not the actual implementation of the exec job from sio-exec package):

import os

from sio.workers import ft, Failure
from sio.workers.execute import execute, noquote
from sio.workers.sandbox import get_sandbox

def run(environ):
    # Fetch the executable and its input into the job's working directory.
    exe_file = ft.download(environ, 'exe_file', 'exe', add_to_cache=True)
    os.chmod(exe_file, 0o700)
    in_file = ft.download(environ, 'in_file', 'in', add_to_cache=True)
    env = os.environ.copy()
    env['MEM_LIMIT'] = 256000
    # Sandboxes are context managers; keep it entered while it is in use.
    sandbox = get_sandbox('exec-sandbox')
    with sandbox:
        # File descriptor 3 is redirected to supervisor_result.
        retcode, output = execute(
                [os.path.join(sandbox.path, 'bin', 'supervisor'), '-f', '3',
                    './exe',
                    noquote('<'), 'in',
                    noquote('3>'), 'supervisor_result',
                    noquote('>'), 'out'],
                env=env)
    # The first line of the supervisor's result file is the status line.
    with open('supervisor_result') as result_file:
        environ['status_line'] = result_file.readline().strip()
    ft.upload(environ, 'out_file', 'out')
    return environ

Creating filters

Filters are boring. There are no filters at the moment.

Filters are functions with one argument... the environ, returning one thing... the environ. They may be defined in any modules, provided that they are registered with pkg_resources aka setuptools as entry points, under the key sio.workers.filters.

For example, the following setup.py defines a module with a filter:

from setuptools import setup, find_packages
setup(
    name = "mypackage",
    version = '0.1',
    packages = find_packages(),
    entry_points = {
        'sio.workers.filters': [
            'superfilter = mypackage.submodule:superfilter_function',
        ]
    }
)

The ping job

There is also a single job called ping available for testing. It expects a ping key in the environ and basically does:

environ['pong'] = environ['ping']

Integration with Celery

There is also a script sio-celery-worker which starts the Celery daemon with the default configuration. The configuration is available in sio.celery.default_config, so a custom celeryconfig.py (for use with a stock celeryd) may look like this:

from sio.celery.default_config import *
BROKER_URL = 'amqp://foo:bar@server/vhost'

Available jobs
