SIO Workers¶
The idea behind the sioworkers module is that systems sometimes need to
perform relatively long-running computations. This module provides a set
of convenience classes and functions that help with implementing the
batch tasks themselves. It is not a batch scheduler.
This mission is accomplished by providing a unified pythonic interface for representing parameters, input and output of batch jobs and running the jobs once these parameters are available.
The environ¶
This mysterious “pythonic interface” is actually a dictionary. Its keys are
strings, and values are Python primitive types, like lists, dictionaries,
strings etc. In practice this may be anything serializable to JSON. This
dictionary is called environ everywhere. The environ is the only
argument passed to the sio.workers.runner.run() function and the only thing
returned by it.
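As a minimal sketch of what "serializable to JSON" means in practice, the snippet below builds an environ and round-trips it through the standard json module. Only job_type is a documented key; the other keys and values are made up for illustration.

```python
import json

# Hypothetical environ; apart from job_type, the keys are illustrative only.
environ = {
    "job_type": "ping",
    "ping": "hello",
    "limits": {"time": 1000, "memory": 65536},
    "tags": ["example", "docs"],
}

# Anything that survives this round trip unchanged is a safe environ value.
serialized = json.dumps(environ, sort_keys=True)
restored = json.loads(serialized)
```

Values such as open files, sets or custom objects would fail at json.dumps(), so they do not belong in the environ.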
Many jobs use the filetracker module, so you may be happier if you
learn about it somewhat.
environ keys common to all jobs¶
Keys that must be present to run a job:
job_type - name of the job to run.
Keys affected by all jobs:
result - SUCCESS if the job finished without throwing an exception, FAILURE otherwise,
exception - (set only if an exception was thrown) the exception, converted to string,
traceback - (set only if an exception was thrown) the traceback, converted to string.
Refer to the documentation of a particular job to learn what other
arguments are expected and what information is returned back in
the environ.
In general regular errors which may happen as a result of the job should not be signalled by throwing an exception (for example compilation errors for the compilation job). Exceptions should suggest some potentially important system problems like sandbox misconfiguration or out of disk space.
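The convention can be sketched as follows. This is not the runner's actual code: run_job_safely and compile_job are hypothetical names, and the compilation_ok, source and disk_full keys are invented for the example. Only result, exception and traceback come from the list above.

```python
import traceback


def run_job_safely(job, environ):
    """Hypothetical wrapper: run job(environ) and record the outcome."""
    try:
        environ = job(environ)
        environ["result"] = "SUCCESS"
    except Exception as exc:
        environ["result"] = "FAILURE"
        environ["exception"] = str(exc)
        environ["traceback"] = traceback.format_exc()
    return environ


def compile_job(environ):
    # A regular error (bad source code) is reported through the environ...
    if "syntax error" in environ["source"]:
        environ["compilation_ok"] = False
        return environ
    # ...while a system-level problem is signalled with an exception.
    if environ.get("disk_full"):
        raise RuntimeError("out of disk space")
    environ["compilation_ok"] = True
    return environ


bad_code = run_job_safely(compile_job, {"source": "syntax error here"})
broken_host = run_job_safely(compile_job, {"source": "ok", "disk_full": True})
```

Note that bad_code still ends up with result set to SUCCESS: the job itself ran fine, it just reports that the compilation did not.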
Running jobs¶
From Python:
sio.workers.runner.run(environ)¶
Performs the work passed in environ.
Returns the modified environ. It might be modified in-place by work implementations.
The following keys in environ have special meaning:
job_type - Mandatory key naming the job to be run.
prefilters - Optional list of filter names to apply before performing the work.
postfilters - Optional list of filter names to apply after performing the work.
The following are added during processing:
worker - Hostname of the machine running the job (i.e. the machine executing this function).
Refer to Interacting with Filetracker for more information about filters.
There are also bindings for Celery in
sio.celery.
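To make the order of operations concrete, here is a rough stand-in for the runner. It is a sketch only: the JOBS and FILTERS dictionaries and the filter names are hypothetical (the real module resolves names through entry points), and the point at which worker is set is an assumption.

```python
import socket

# Hypothetical registries used instead of entry point lookup.
JOBS = {
    "ping": lambda env: dict(env, pong=env["ping"]),
}
FILTERS = {
    "strip_ping": lambda env: dict(env, ping=env["ping"].strip()),
    "mark_done": lambda env: dict(env, done=True),
}


def run(environ):
    # Prefilters run first, in the order given.
    for name in environ.get("prefilters", []):
        environ = FILTERS[name](environ)
    # Then the job named by the mandatory job_type key.
    environ = JOBS[environ["job_type"]](environ)
    # Postfilters see the environ produced by the job.
    for name in environ.get("postfilters", []):
        environ = FILTERS[name](environ)
    # Assumption: the hostname is recorded at the end of processing.
    environ["worker"] = socket.gethostname()
    return environ


result = run({
    "job_type": "ping",
    "ping": "  hello  ",
    "prefilters": ["strip_ping"],
    "postfilters": ["mark_done"],
})
```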
From the shell, you may use the sio-batch script, which expects the
environment variable environ to contain the environ as JSON. After running
the job, the output is printed to the standard output in the following format:
--- BEGIN ENVIRON ---
<jsonified environ>
--- END ENVIRON ---
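If you call the script from another program, the environ can be recovered from the captured output by looking for the two marker lines. The helper below is a sketch under the assumption that the markers appear exactly as shown above. The name extract_environ and the sample output are made up.

```python
import json

BEGIN_MARKER = "--- BEGIN ENVIRON ---"
END_MARKER = "--- END ENVIRON ---"


def extract_environ(output):
    """Return the environ found between the two markers in output."""
    start = output.index(BEGIN_MARKER) + len(BEGIN_MARKER)
    end = output.index(END_MARKER, start)
    # json.loads tolerates the newlines surrounding the JSON document.
    return json.loads(output[start:end])


# Hypothetical captured output, including an unrelated log line.
sample_output = (
    "some log line\n"
    "--- BEGIN ENVIRON ---\n"
    '{"job_type": "ping", "ping": "hi", "pong": "hi"}\n'
    "--- END ENVIRON ---\n"
)
parsed = extract_environ(sample_output)
```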
For developers¶
Hi, developer! Nice to meet you!
Creating jobs¶
Creating jobs is super easy.
You just need to define a function with one argument… the environ, returning one
thing… the environ. You may define it in any module, provided that
it is registered with pkg_resources aka setuptools as an entry point,
under the key sio.jobs.
The function may use the current directory in any way — it will be run from inside a temporary directory which will be deleted automatically.
For example, the following setup.py defines a module with a job named
szescblotastop:
from setuptools import setup, find_packages
setup(
    name = "mymud",
    version = '0.1',
    packages = find_packages(),
    entry_points = {
        'sio.jobs': [
            'szescblotastop = mudmodule.mudsubmodule.mud.mud.mud:mud_fun',
        ]
    }
)
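The function behind such an entry point could look like the sketch below. The body of mud_fun, the text and scratch_size keys and the file name are all hypothetical. The temporary directory handling only imitates what the runner is described to do, so that the example can be run on its own (Python 3).

```python
import os
import tempfile


def mud_fun(environ):
    """Hypothetical job body: writes a scratch file and reports its size."""
    # The current directory is ours to use; it is a throwaway directory.
    with open("scratch.txt", "w") as scratch:
        scratch.write(environ.get("text", ""))
    environ["scratch_size"] = os.path.getsize("scratch.txt")
    return environ


# Imitate the runner: execute the job inside a temporary directory
# which is deleted afterwards.
previous_dir = os.getcwd()
with tempfile.TemporaryDirectory() as workdir:
    os.chdir(workdir)
    try:
        result = mud_fun({"job_type": "szescblotastop", "text": "mud"})
    finally:
        os.chdir(previous_dir)
```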
Sandboxes¶
class sio.workers.sandbox.Sandbox(name)¶
Represents a sandbox, that is, a place in the filesystem where a previously prepared package with some software is extracted (for example a compiler, libraries, the default output comparator).
Sandbox in our terminology does not mean isolation or security. It is just one directory containing files.
This class deals only with using sandboxes, not creating, changing or uploading them. Each sandbox is uniquely identified by name. The moment you create an instance of Sandbox, the appropriate archive is downloaded and extracted (if it does not exist yet; a check for a newer version is also performed). The path to the extracted sandbox is in the path attribute. This path is valid as long as the Sandbox instance exists (is not garbage collected).
Sandbox images are looked up in the following places:
- in Filetracker, at path /sandboxes/<name>,
- if not found there, the URL from the SIO_SANDBOXES_URL environment variable is used,
- if that environment variable is not defined, some default URL is used.
Sandboxes are extracted to the folder named in the SIO_SANDBOXES_BASEDIR environment variable (or to ~/.sio-sandboxes if the variable is not in the environment).
Note
Processes must not modify the content of the extracted sandbox in any way. It is also safe for multiple processes to use the same sandbox concurrently, as the folder is locked to ensure no problems if an upgrade is needed.
Note
Sandbox is a context manager, so it should be used in a with statement. Upon entering, the sandbox is downloaded, extracted and locked, to prevent other processes from performing an upgrade.
Note
Do not construct instances of this class yourself, use get_sandbox(). Otherwise you may encounter deadlocks when having two Sandbox instances of the same name.
path¶
Contains real, absolute path to sandbox root directory.
has_fixup(name)¶
Checks whether the fixup with the given name has been applied to the sandbox.
sio.workers.sandbox.get_sandbox(name)¶
Constructs a Sandbox with the given name.
If a Sandbox instance for the given name is already created, returns that instance.
Only this function should be used for creating or getting Sandbox instances.
We currently use the following sandboxes:
compiler-gcc.4_8_2.tar.gz
This sandbox contains the C and C++ compiler gcc 4.8.2 with all libraries, programs and scripts which are needed for compilation.
compiler-fpc.2_6_2.tar.gz
This sandbox contains the Pascal compiler fpc 2.6.2 with all libraries, programs and scripts which are needed for compilation.
exec-sandbox.tar.gz
This sandbox is needed to execute the cpu-exec job in a safe environment. It contains only 2 files in one directory called bin:
- compare: the default output comparator program. It is used to compare the output of the user’s solution on a given test with the correct output for that test.
- supervisor: the program which supervises the execution of the user’s solution. It provides security. It reports whether the execution was successful or there was a runtime error.
vcpu_exec-sandbox.tar.gz
This sandbox is needed to execute the vcpu-exec job in a safe environment. It contains the Pin library and additionally 2 files in the supervisor-bin directory:
- supervisor
- supervisor.so
This sandbox is used for deterministic CPU instruction counting using OiTimeTool.
sio2jail_exec-sandbox.tar.gz
This sandbox is needed to execute the sio2jail-exec job in a safe environment. It contains the sio2jail binary and the minimal box needed for sio2jail. This sandbox is used for deterministic CPU instruction counting using Sio2Jail.
proot-sandbox.tar.gz
This is the sandbox used by PRootExecutor. It contains the PRoot software. We use proot to isolate execution to one directory in the filesystem. PRoot provides chroot-like functionality in user space.
See the README file in this sandbox for more info.
null-sandbox.tar.gz
This sandbox contains only one empty directory. It is an example sandbox. It is a plain tar archive (not tar.gz) despite its .tar.gz extension; the reason for this is unknown. Probably nobody uses this sandbox, so no one has noticed the mistake.
Executors (environment)¶
Executors are environments for executing commands. Just like sandboxes, they are context managers.
class sio.workers.executors.BaseExecutor¶
Base class for Executors: command environment managers.
Its behavior depends on the concrete class, see its docstring. Objects are callable context managers, so typical usage would be like:
    with executor_instance:
        executor_instance(command, kwargs...)
Most executors support the following options for the __call__ method:
command - The command to execute; may be a list or a string. If this is a list, all the arguments will be shell-quoted unless wrapped in sio.workers.executors.noquote. If this is a string, it will be converted to a noquote-ed one-element list. The command is passed to subprocess.Popen with shell=True, but may be manipulated in various ways depending on the concrete class.
env - The dictionary passed as environment. Non-string values are automatically converted to strings. If not present, the current process’ environment is used. In all cases, the environment is augmented by adding LC_ALL and LANGUAGE set to en_US.UTF-8.
ignore_errors - Do not throw ExecError if the program exits with an error.
extra_ignore_errors - Do not throw ExecError if the program exits with one of the error codes in extra_ignore_errors.
stdin - File object which should be redirected to the standard input of the program.
stdout, stderr - Could be files opened with open(fname, 'w'), sys.* or None, in which case the stream is suppressed (which is the default). See also: capture_output.
capture_output - Returns the program output in the stdout key of renv.
split_lines - If True, the output from the called program is returned as a list of lines, otherwise just one big string.
forward_stderr - Forwards stderr to stdout.
output_limit - Limits the amount of data the program can write to stdout, in KiB.
mem_limit - Memory limit (ulimit -v), in KiB.
time_limit - CPU time limit (ulimit -t), in milliseconds.
real_time_limit - Wall clock time limit, in milliseconds.
environ - If present, this should be the environ dictionary. It’s used to extract values for mem_limit, time_limit, real_time_limit and output_limit from it.
environ_prefix - Prefix for the mem_limit, time_limit, real_time_limit and output_limit keys in environ.
**kwargs - Other arguments handled by some executors. See their documentation.
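The quoting rule for command is easier to see in code. The following is a sketch of the described behaviour and not the library's implementation: the noquote class below is a local stand-in for sio.workers.executors.noquote, build_command is a hypothetical helper, and shlex.quote requires Python 3.

```python
import shlex


class noquote(str):
    """Local stand-in: marks a string which must not be shell-quoted."""


def build_command(command):
    """Turn a list or string into the string handed to the shell."""
    if isinstance(command, str):
        # A plain string is treated as a noquote-ed one-element list.
        command = [noquote(command)]
    return " ".join(
        arg if isinstance(arg, noquote) else shlex.quote(arg)
        for arg in command)


quoted = build_command(["cat", "my file", noquote(">"), "out"])
passthrough = build_command("ls | wc -l")
```

Here the redirection operator reaches the shell untouched, while the file name containing a space is quoted so that it stays one argument.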
The method returns a dictionary (called renv) containing:
real_time_used - Wall clock time it took to execute the command (in ms).
return_code - Status code that the program returned.
stdout - Only when capture_output=True: output of the command.
Some executors also return other keys, e.g. time_used, result_code, mem_used, num_syscalls.
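To show how the options and the renv keys fit together, here is a heavily simplified, hypothetical MiniExecutor written with the standard subprocess module (Python 3). It supports only three of the options, runs the command without a shell and raises RuntimeError where the real classes are documented to raise ExecError, so treat it as an illustration rather than a reference.

```python
import subprocess
import sys
import time


class MiniExecutor(object):
    """Hypothetical, heavily simplified callable context manager."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        return False  # never swallow exceptions

    def __call__(self, command, capture_output=False, split_lines=False,
                 ignore_errors=False):
        started = time.time()
        process = subprocess.Popen(
            command,
            # Output is suppressed unless capturing was requested.
            stdout=subprocess.PIPE if capture_output else subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            universal_newlines=True)
        output, _ = process.communicate()
        renv = {
            "real_time_used": int((time.time() - started) * 1000),
            "return_code": process.returncode,
        }
        if capture_output:
            renv["stdout"] = output.splitlines() if split_lines else output
        if process.returncode != 0 and not ignore_errors:
            raise RuntimeError("command failed: %r" % (command,))
        return renv


with MiniExecutor() as executor:
    renv = executor([sys.executable, "-c", "print('a'); print('b')"],
                    capture_output=True, split_lines=True)
    failed = executor([sys.executable, "-c", "raise SystemExit(3)"],
                      ignore_errors=True)
```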
class sio.workers.executors.SandboxExecutor(sandbox)¶
SandboxExecutor is intended to run programs delivered in a sandbox package.
This executor accepts the following extra arguments in __call__:
use_path - If false (default) and the first argument of the command is relative, then it is prepended with the sandbox path.
Note
Sandbox does not mean isolation, it’s just a part of the filesystem.
path¶
Contains real, absolute path to sandbox root.
rpath¶
Contains path to sandbox root as visible during command execution.
class sio.workers.executors.PRootExecutor(sandbox)¶
PRootExecutor mimics chroot with mount --bind.
During execution sandbox.path becomes the new /. The current working directory is visible as itself and as /tmp. Also sandbox.path remains accessible under sandbox.path.
If the sandbox doesn’t contain /bin/sh or /lib, then a basic one is bound from the proot sandbox.
For more information about PRoot see http://proot.me.
PRootExecutor adds support for the following arguments in __call__:
proot_options - Options passed to the proot binary after those automatically generated.
path¶
Contains real, absolute path to sandbox root.
rpath¶
Contains path to sandbox root as visible during command execution.
This module provides some ready-to-use executors, which are:
class sio.workers.executors.UnprotectedExecutor¶
Executes the command in a completely unprotected manner.
Note
Time limits are counted with an accuracy of seconds.
class sio.workers.executors.DetailedUnprotectedExecutor¶
This executor returns extended process status (over UnprotectedExecutor).
Note
It reserves the process stderr for time counting, so the stderr arg is ignored.
This class adds the following keys to renv:
- time_used: Linux user-time used by the process
- result_code: TLE, OK, RE.
- result_string: string describing result_code
class sio.workers.executors.SupervisedExecutor(allow_local_open=False, use_program_return_code=False, **kwargs)¶
Executes program in supervised mode.
Sandboxing limitations may be controlled by passing the following arguments to the constructor:
allow_local_open - Allow opening files within the current directory in read-only mode.
use_program_return_code - Makes the supervisor pass the program return code to renv['return_code'] rather than the sandbox return code.
The following new arguments are recognized in __call__:
ignore_return - Do not treat a non-zero return code as a runtime error.
java_sandbox - Sandbox name with JRE.
Executed programs may only use stdin/stdout/stderr and manage their own memory. Returns extended statistics in renv containing:
- time_used: processor user time (in ms).
- mem_used: memory used (in KiB).
- num_syscall: number of times a syscall has been called
- result_code: short code reporting the result of rule obeying. Is one of OK, RE, TLE, OLE, MLE, RV
- result_string: string describing result_code
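A caller typically turns such a renv into something readable. The sketch below does that for a hand-made renv. The expansions of the short codes are assumptions based on their usual meaning, since this documentation lists only the codes themselves. The summarize helper is hypothetical.

```python
# Assumed meanings of the short codes; not taken from the library.
RESULT_DESCRIPTIONS = {
    "OK": "finished successfully",
    "RE": "runtime error",
    "TLE": "time limit exceeded",
    "OLE": "output limit exceeded",
    "MLE": "memory limit exceeded",
    "RV": "rule violation",
}


def summarize(renv):
    """Build a one-line, human-readable summary of an execution."""
    code = renv["result_code"]
    description = RESULT_DESCRIPTIONS.get(code, "unknown result code")
    return "%s (%s), %d ms, %d KiB" % (
        code, description, renv["time_used"], renv["mem_used"])


# Hand-made renv with the keys described above.
summary = summarize({"result_code": "TLE", "time_used": 2000,
                     "mem_used": 1024})
```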
class sio.workers.executors.VCPUExecutor¶
Runs program in a controlled environment while counting CPU instructions using OiTimeTool.
Executed programs may only use stdin/stdout/stderr and manage their own memory. Returns extended statistics in renv containing:
- time_used: virtual time based on instruction counting (in ms).
- mem_used: memory used (in KiB).
- num_syscall: number of times a syscall has been called
- result_code: short code reporting the result of rule obeying. Is one of OK, RE, TLE, OLE, MLE, RV
- result_string: string describing result_code
class sio.workers.executors.Sio2JailExecutor¶
Runs program in a controlled environment while counting CPU instructions using Sio2Jail.
Returns extended statistics in renv containing:
- time_used: virtual time based on instruction counting (in ms).
- mem_used: memory used (in KiB).
- result_code: short code reporting the result of rule obeying. Is one of OK, RE, TLE, MLE, RV
- result_string: string describing result_code
Executing external programs¶
sio.workers.execute.execute(command, **kwargs)¶
Wrapper for sio.workers.executors.UnprotectedExecutor returning stdout.
Returns tuple (return_code, stdout)
Interacting with Filetracker¶
Filetracker should be your friend if you are coding for sio-workers.
We can help you somewhat in interacting with it by providing the most
demanded functions in the world:
sio.workers.ft.download(environ, key, dest=None, skip_if_exists=False, **kwargs)¶
Downloads the file from environ[key] and saves it to dest.
dest - A filename, directory name or None. In the two latter cases, the file is named the same as in environ[key].
skip_if_exists - If True and dest points to an existing file (not a directory or None), then the file is not downloaded.
**kwargs - Passed directly to filetracker.Client.get_file().
The value under environ['use_filetracker'] affects downloading in the following way:
- if True, nothing special happens
- if False, the file is not downloaded from filetracker, but the passed path is assumed to be a regular filesystem path
- if 'auto', the file is assumed to be a local filename only if it is a relative path (this is usually the case when developers play).
Returns the path to the saved file.
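The three use_filetracker modes boil down to a small decision, sketched below. The function name should_use_filetracker is hypothetical, and treating a missing key as True is an assumption not stated above.

```python
import posixpath


def should_use_filetracker(environ, path):
    """Decide whether path should be fetched from Filetracker."""
    # Assumption: a missing key behaves like True.
    mode = environ.get("use_filetracker", True)
    if mode == "auto":
        # Relative paths are taken to be local files.
        return posixpath.isabs(path)
    return bool(mode)


auto_relative = should_use_filetracker({"use_filetracker": "auto"}, "in.txt")
auto_absolute = should_use_filetracker({"use_filetracker": "auto"},
                                       "/tests/in.txt")
forced_local = should_use_filetracker({"use_filetracker": False},
                                      "/tests/in.txt")
```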
sio.workers.ft.upload(environ, key, source, dest=None, **kwargs)¶
Uploads the file from source to filetracker under environ[key] name.
source - Filename to upload.
dest - A filename, directory name or None. In the two latter cases, the file is named the same as in environ[key].
**kwargs - Passed directly to filetracker.Client.put_file().
See the note about environ['use_filetracker'] in sio.workers.ft.download().
Returns the filetracker path to the saved file.
sio.workers.ft.instance()¶
Returns a singleton instance of filetracker.Client.
There is also a convenience function for starting the Filetracker server, but this is only useful in complex setups when one wants to configure the worker machines to share cached files between themselves.
sio.workers.ft.launch_filetracker_server()¶
Launches the Filetracker server if FILETRACKER_PUBLIC_URL is present in os.environ and the server does not appear to be running.
The server is run in the background and the function returns once the server is up and running.
There is also a command-line script called sio-run-filetracker which
calls this function.
Example¶
Here’s an example of a job running the specified binary file
in a controlled environment (beware, as this is not the actual
implementation of the exec job from sio-exec package):
import os

from sio.workers import ft, Failure
from sio.workers.execute import execute, noquote
from sio.workers.sandbox import get_sandbox

def run(environ):
    # Fetch the executable and the input file from Filetracker.
    exe_file = ft.download(environ, 'exe_file', 'exe', add_to_cache=True)
    os.chmod(exe_file, 0o700)
    in_file = ft.download(environ, 'in_file', 'in', add_to_cache=True)
    sandbox = get_sandbox('exec-sandbox')
    env = os.environ.copy()
    env['MEM_LIMIT'] = 256000
    # Run the program under the supervisor, which writes its verdict
    # to file descriptor 3 (redirected to supervisor_result).
    retcode, output = execute(
        [os.path.join(sandbox.path, 'bin', 'supervisor'), '-f', '3',
         './exe',
         noquote('<'), 'in',
         noquote('3>'), 'supervisor_result',
         noquote('>'), 'out'],
        env=env)
    # The first line of the supervisor's report is the status line.
    with open('supervisor_result') as result_file:
        environ['status_line'] = result_file.readline().strip()
    ft.upload(environ, 'out_file', 'out')
    return environ
Creating filters¶
Filters are boring. There are no filters at the moment.
Filters are functions with one argument… the environ, returning one
thing… the environ. They may be defined in any modules, provided that
they are registered with pkg_resources aka setuptools as entry points,
under the key sio.workers.filters.
For example, the following setup.py defines a module with a filter:
from setuptools import setup, find_packages
setup(
    name = "mypackage",
    version = '0.1',
    packages = find_packages(),
    entry_points = {
        'sio.workers.filters': [
            'superfilter = mypackage.submodule:superfilter_function',
        ]
    }
)
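A filter registered this way might look as follows. The body is purely illustrative: what superfilter_function does, and the applied_filters key it maintains, are made up for this sketch.

```python
def superfilter_function(environ):
    """Hypothetical filter: records that it has been applied."""
    # applied_filters is an invented key, not one used by sio-workers.
    environ.setdefault("applied_filters", []).append("superfilter")
    return environ


filtered = superfilter_function({"job_type": "ping", "ping": "x"})
filtered_twice = superfilter_function(dict(filtered))
```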
The ping job¶
There is also a single job called ping available for testing. It expects
a ping key in the environ and basically does:
environ['pong'] = environ['ping']
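Written out as a complete job function, this amounts to the sketch below. The function name ping_job is hypothetical. The real job is selected by setting job_type to ping.

```python
def ping_job(environ):
    """Copy the value under ping to pong and return the environ."""
    environ['pong'] = environ['ping']
    return environ


reply = ping_job({'job_type': 'ping', 'ping': 'e.g. a timestamp'})
```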
Integration with Celery¶
There is also a script sio-celery-worker which starts the Celery daemon
with the default configuration. The configuration is available in
sio.celery.default_config, so a custom celeryconfig.py (for use with a
stock celeryd) may look like this:
from sio.celery.default_config import *
BROKER_URL = 'amqp://foo:bar@server/vhost'