SIO Workers
The idea behind the sioworkers module is that systems sometimes need to perform relatively long-running computations. This module provides a set of convenience classes and functions which help with implementing the batch tasks themselves. It is not a batch scheduler.
It does this by providing a unified Pythonic interface for representing the parameters, input and output of batch jobs, and for running the jobs once these parameters are available.
The environ
This mysterious “pythonic interface” is actually a dictionary. Its keys are strings, and its values are Python primitive types such as lists, dictionaries and strings. In practice a value may be anything serializable to JSON. This dictionary is called environ everywhere. The environ is the only argument passed to the sio.workers.runner.run() function and the only thing returned by it.
Many jobs use the filetracker module, so it is worth getting familiar with it.
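As a rough illustration, an environ could look like the dictionary below. The job_type and ping keys appear in this documentation; the remaining keys and all values are invented for the example. The sketch only checks that the dictionary survives a JSON round trip, which is the practical requirement stated above.

```python
import json

# An illustrative environ. 'job_type' and 'ping' are documented keys;
# 'limits' and 'tags' are invented for this example.
environ = {
    'job_type': 'ping',
    'ping': 'hello',
    'limits': {'time': 1000, 'memory': 65536},
    'tags': ['example', 'demo'],
}

# Anything JSON can represent is fine: the round trip must be lossless.
serialized = json.dumps(environ)
restored = json.loads(serialized)
roundtrip_ok = (restored == environ)
```

Values such as open files, sets or custom objects would fail at json.dumps(), so they do not belong in an environ.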
environ keys common to all jobs
Keys that must be present to run a job:
- job_type - name of the job to run.
Keys affected by all jobs:
- result - SUCCESS if the job finished without throwing an exception, FAILURE otherwise,
- exception - (set only if an exception was thrown) the exception, converted to string,
- traceback - (set only if an exception was thrown) the traceback, converted to string.
Refer to the documentation of a particular job to learn what other arguments are expected and what information is returned in the environ.
In general, ordinary errors which may happen as a result of the job (for example, compilation errors in a compilation job) should not be signalled by throwing an exception. Exceptions should indicate potentially serious system problems such as sandbox misconfiguration or running out of disk space.
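The distinction can be sketched as follows. This is not the code of sio.workers.runner; run_with_status, fake_compile_job, broken_job and the compile_ok and compiler_output keys are all hypothetical names chosen for the illustration. Only the result, exception and traceback keys come from the list above.

```python
import traceback


def run_with_status(job, environ):
    """Hypothetical wrapper that fills in the common outcome keys."""
    try:
        environ = job(environ)
        environ['result'] = 'SUCCESS'
    except Exception as e:
        environ['result'] = 'FAILURE'
        environ['exception'] = str(e)
        environ['traceback'] = traceback.format_exc()
    return environ


def fake_compile_job(environ):
    """Hypothetical job: a compilation error is an ordinary outcome."""
    if 'int main' not in environ['source']:
        # Reported as data in the environ, not raised.
        environ['compile_ok'] = False
        environ['compiler_output'] = 'error: no main function'
    else:
        environ['compile_ok'] = True
    return environ


def broken_job(environ):
    """Hypothetical job hitting a system-level problem."""
    raise RuntimeError('sandbox is misconfigured')


bad_source = run_with_status(
    fake_compile_job, {'job_type': 'compile', 'source': 'oops'})
crashed = run_with_status(
    broken_job, {'job_type': 'compile', 'source': ''})
```

Here bad_source still ends with result set to SUCCESS, because the job itself worked; only crashed carries FAILURE together with the exception and traceback strings.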
Running jobs
From Python:
sio.workers.runner.run(environ)
Performs the work passed in environ.
Returns the modified environ. It might be modified in-place by work implementations.
The following keys in environ have special meaning:
- job_type - Mandatory key naming the job to be run.
- prefilters - Optional list of filter names to apply before performing the work.
- postfilters - Optional list of filter names to apply after performing the work.
The following are added during processing:
- worker - Hostname of the machine running the job (i.e. the machine executing this function).
Refer to Interacting with Filetracker for more information about filters.
There are also bindings for Celery in sio.celery.
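The order of operations described for run() can be sketched with a small stand-in. The JOBS and FILTERS dictionaries, run_sketch and mark_seen are hypothetical and only imitate the name lookup; the real runner may differ in details such as when the worker key is set. The ping function mirrors the ping job described later in this document.

```python
import socket

# Hypothetical registries standing in for the real lookup by name.
JOBS = {}
FILTERS = {}


def ping(environ):
    environ['pong'] = environ['ping']
    return environ


def mark_seen(environ):
    # Hypothetical filter: records that it has been applied.
    environ.setdefault('seen_by', []).append('mark_seen')
    return environ


JOBS['ping'] = ping
FILTERS['mark_seen'] = mark_seen


def run_sketch(environ):
    """Apply prefilters, then the job, then postfilters."""
    for name in environ.get('prefilters', []):
        environ = FILTERS[name](environ)
    environ = JOBS[environ['job_type']](environ)
    for name in environ.get('postfilters', []):
        environ = FILTERS[name](environ)
    environ['worker'] = socket.gethostname()
    return environ


result = run_sketch({
    'job_type': 'ping',
    'ping': 'abc',
    'prefilters': ['mark_seen'],
    'postfilters': ['mark_seen'],
})
```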
From the shell, you may use the sio-batch script, which expects the environment variable environ to contain JSON. After running the job, the output is printed to standard output in the following format:
--- BEGIN ENVIRON ---
<jsonified environ>
--- END ENVIRON ---
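A caller that captures this output could recover the environ roughly as follows. This is a sketch which assumes that each marker occupies a whole line with no surrounding whitespace; parse_batch_output and the sample text are made up for the example.

```python
import json

BEGIN_MARKER = '--- BEGIN ENVIRON ---'
END_MARKER = '--- END ENVIRON ---'


def parse_batch_output(text):
    """Return the environ found between the two marker lines."""
    lines = text.splitlines()
    start = lines.index(BEGIN_MARKER)
    end = lines.index(END_MARKER, start + 1)
    return json.loads('\n'.join(lines[start + 1:end]))


# Made-up output, shaped like the format shown above.
sample_output = '\n'.join([
    'some log line printed by the job',
    BEGIN_MARKER,
    '{"job_type": "ping", "ping": "abc", "pong": "abc"}',
    END_MARKER,
])
parsed = parse_batch_output(sample_output)
```

If either marker is missing, list.index() raises ValueError, which a caller may want to treat as a failed run.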
For developers
Hi, developer! Nice to meet you!
Creating jobs
Creating jobs is super easy.
You just need to define a function with one argument… the environ, returning one thing… the environ. You may define it in any module, provided that it is registered with pkg_resources (aka setuptools) as an entry point under the key sio.jobs.
The function may use the current directory in any way; it will be run from inside a temporary directory which will be deleted automatically.
For example, the following setup.py defines a module with a job named szescblotastop:
from setuptools import setup, find_packages
setup(
    name = "mymud",
    version = '0.1',
    packages = find_packages(),
    entry_points = {
        'sio.jobs': [
            'szescblotastop = mudmodule.mudsubmodule.mud.mud.mud:mud_fun',
        ]
    }
)
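The function behind such an entry point might look like the sketch below. The body of mud_fun and the text and word_count keys are invented for illustration; the temporary directory handling only emulates what the paragraph above says the runner does for you.

```python
import os
import tempfile


def mud_fun(environ):
    """Hypothetical job: counts the words in environ['text']."""
    # Scratch files can go straight into the current directory.
    with open('scratch.txt', 'w') as f:
        f.write(environ['text'])
    with open('scratch.txt') as f:
        environ['word_count'] = len(f.read().split())
    return environ


# Emulate the temporary working directory the job would be run in.
original_dir = os.getcwd()
with tempfile.TemporaryDirectory() as workdir:
    os.chdir(workdir)
    try:
        result = mud_fun(
            {'job_type': 'szescblotastop', 'text': 'six muds stop'})
    finally:
        os.chdir(original_dir)
```

The job never cleans up scratch.txt itself, since the whole directory is discarded afterwards.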
Sandboxes
class sio.workers.sandbox.Sandbox(name)
Represents a sandbox, that is, a place in the filesystem where a previously prepared package with some software is extracted (for example a compiler, libraries, or the default output comparator).
Sandbox in our terminology does not mean isolation or security. It is just one directory containing files.
This class deals only with using sandboxes, not creating, changing or uploading them. Each sandbox is uniquely identified by name. The moment you create an instance of Sandbox, the appropriate archive is downloaded and extracted (if it does not exist yet; a check for a newer version is also performed). The path to the extracted sandbox is in the path attribute. This path is valid as long as the Sandbox instance exists (is not garbage collected).
Sandbox images are looked up in the following order:
- from Filetracker, at path /sandboxes/<name>,
- if not found there, the URL from the SIO_SANDBOXES_URL environment variable is used,
- if that environment variable is not defined, some default URL is used.
Sandboxes are extracted to the folder named in the SIO_SANDBOXES_BASEDIR environment variable (or to ~/.sio-sandboxes if the variable is not in the environment).
Note
Processes must not modify the content of the extracted sandbox in any way. It is also safe for multiple processes to use the same sandbox concurrently, as the folder is locked to ensure no problems if an upgrade is needed.
Note
Sandbox is a context manager, so it should be used in a with statement. Upon entering, the sandbox is downloaded, extracted and locked, to prevent other processes from performing an upgrade.
Note
Do not construct instances of this class yourself, use get_sandbox(). Otherwise you may encounter deadlocks when having two Sandbox instances of the same name.
path
Contains the real, absolute path to the sandbox root directory.
has_fixup(name)
Checks whether the fixup with the given name has been applied to the sandbox.
sio.workers.sandbox.get_sandbox(name)
Constructs a Sandbox with the given name.
If a Sandbox instance for the given name is already created, returns that instance.
Only this function should be used for creating or getting Sandbox instances.
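The "one instance per name" rule and the base directory fallback can be sketched with a stand-in class. FakeSandbox, get_fake_sandbox and the _instances registry are hypothetical, nothing is downloaded or locked, and placing the sandbox directly at <basedir>/<name> is an assumption made for the example.

```python
import os

_instances = {}  # name -> FakeSandbox, shared by all callers


class FakeSandbox(object):
    """Hypothetical stand-in for Sandbox; it downloads nothing."""

    def __init__(self, name):
        self.name = name
        base = os.environ.get('SIO_SANDBOXES_BASEDIR',
                              os.path.expanduser('~/.sio-sandboxes'))
        # Assumed layout: one directory per sandbox name.
        self.path = os.path.realpath(os.path.join(base, name))

    def __enter__(self):
        # The real class would download, extract and lock here.
        return self

    def __exit__(self, exc_type, exc_value, tb):
        return False


def get_fake_sandbox(name):
    """Return the one instance for this name, creating it on first use."""
    if name not in _instances:
        _instances[name] = FakeSandbox(name)
    return _instances[name]


first = get_fake_sandbox('exec-sandbox')
second = get_fake_sandbox('exec-sandbox')
with first as sandbox:
    supervisor = os.path.join(sandbox.path, 'bin', 'supervisor')
```

Because both calls return the same object, two parts of a program asking for the same sandbox never hold competing instances, which is the situation the deadlock note warns about.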
We currently use the following sandboxes:
compiler-gcc.4_8_2.tar.gz
This sandbox contains the C and C++ compiler gcc 4.8.2 with all libraries, programs and scripts which are needed for compilation.
compiler-fpc.2_6_2.tar.gz
This sandbox contains the Pascal compiler fpc 2.6.2 with all libraries, programs and scripts which are needed for compilation.
exec-sandbox.tar.gz
This sandbox is needed to execute the cpu-exec job in a safe environment. It contains only 2 files in one directory called bin. These files are:
- compare: Default output comparator program. It is used to compare the output of the user’s solution on a given test with the correct output for that test.
- supervisor: This is the program which supervises the execution of the user’s solution. It provides security. It reports whether the execution was successful or whether there was a runtime error.
vcpu_exec-sandbox.tar.gz
This sandbox is needed to execute the vcpu-exec job in a safe environment. It contains the Pin library and additionally 2 files in the supervisor-bin directory:
- supervisor
- supervisor.so
This sandbox is used for deterministic CPU instruction counting using OiTimeTool.
sio2jail_exec-sandbox.tar.gz
This sandbox is needed to execute the sio2jail-exec job in a safe environment. It contains the sio2jail binary and the minimal box needed for sio2jail. This sandbox is used for deterministic CPU instruction counting using Sio2Jail.
proot-sandbox.tar.gz
This is a sandbox used by PRootExecutor. It contains the PRoot software. We use proot to isolate execution to one directory in the filesystem. PRoot mimics the chroot mechanism from Linux. See the README file in this sandbox for more info.
null-sandbox.tar.gz
This sandbox contains only one empty directory. It is an example sandbox. It is a tar archive (not tar.gz); I don’t know why it has the .tar.gz extension. Probably nobody uses this sandbox and no one has noticed this mistake.
Executors (environment)
Executors are environments for executing commands. Just like Sandboxes, they are context managers.
class sio.workers.executors.BaseExecutor
Base class for Executors: command environment managers.
Its behavior depends on the class instance, see its docstring. Objects are callable context managers, so typical usage would be like:
with executor_instance:
    executor_instance(command, kwargs...)
Most executors support the following options for the __call__ method:
- command - The command to execute; may be a list or a string. If this is a list, all the arguments will be shell-quoted unless wrapped in sio.workers.executors.noquote. If this is a string, it will be converted to a noquote-ed one-element list. The command is passed to subprocess.Popen with shell=True, but may be manipulated in various ways depending on the concrete class.
- env - The dictionary passed as the environment. Non-string values are automatically converted to strings. If not present, the current process’ environment is used. In all cases, the environment is augmented by adding LC_ALL and LANGUAGE set to en_US.UTF-8.
- ignore_errors - Do not throw ExecError if the program exits with an error.
- extra_ignore_errors - Do not throw ExecError if the program exits with one of the error codes in extra_ignore_errors.
- stdin - File object which should be redirected to the standard input of the program.
- stdout, stderr - Could be files opened with open(fname, 'w'), sys.* or None, in which case the stream is suppressed (which is the default). See also: capture_output.
- capture_output - Returns program output in the stdout key of renv.
- split_lines - If True, the output from the called program is returned as a list of lines, otherwise as one big string.
- forward_stderr - Forwards stderr to stdout.
- output_limit - Limits the amount of data the program can write to stdout, in KiB.
- mem_limit - Memory limit (ulimit -v), in KiB.
- time_limit - CPU time limit (ulimit -t), in milliseconds.
- real_time_limit - Wall clock time limit, in milliseconds.
- environ - If present, this should be the environ dictionary. It is used to extract values for mem_limit, time_limit, real_time_limit and output_limit.
- environ_prefix - Prefix for the mem_limit, time_limit, real_time_limit and output_limit keys in environ.
- **kwargs - Other arguments handled by some executors. See their documentation.
The method returns a dictionary (called renv) containing:
- real_time_used - Wall clock time it took to execute the command (in ms).
- return_code - Status code that the program returned.
- stdout - Only when capture_output=True: output of the command.
Some executors also return other keys, e.g. time_used, result_code, mem_used, num_syscalls.
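The quoting rule for command and the treatment of env can be sketched without the library. noquote_sketch, build_shell_command and augment_env are hypothetical stand-ins written for this illustration; they use shlex.quote from the standard library and are not the code of sio.workers.executors.

```python
import shlex


class noquote_sketch(str):
    """Hypothetical marker type: strings of this type stay unquoted."""


def build_shell_command(command):
    """Turn a list or string command into one shell command line."""
    if isinstance(command, str):
        # A plain string is treated as a single unquoted element.
        command = [noquote_sketch(command)]
    parts = []
    for arg in command:
        if isinstance(arg, noquote_sketch):
            parts.append(str(arg))
        else:
            parts.append(shlex.quote(str(arg)))
    return ' '.join(parts)


def augment_env(env):
    """Stringify values and force the locale, as described for env."""
    result = dict((key, str(value)) for key, value in env.items())
    result['LC_ALL'] = 'en_US.UTF-8'
    result['LANGUAGE'] = 'en_US.UTF-8'
    return result


line = build_shell_command(['./my prog', noquote_sketch('<'), 'in file'])
env = augment_env({'MEM_LIMIT': 256000})
```

In this sketch the arguments containing spaces end up quoted while the redirection operator is passed through untouched, which is why redirections have to be wrapped in a no-quote marker when the command is given as a list.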
class sio.workers.executors.SandboxExecutor(sandbox)
SandboxExecutor is intended to run programs delivered in the sandbox package.
This executor accepts the following extra arguments in __call__:
- use_path - If false (default) and the first argument of the command is relative, then it is prepended with the sandbox path.
Note
Sandbox does not mean isolation, it is just a part of the filesystem.
path
Contains the real, absolute path to the sandbox root.
rpath
Contains the path to the sandbox root as visible during command execution.
class sio.workers.executors.PRootExecutor(sandbox)
PRootExecutor mimics chroot with mount --bind.
During execution sandbox.path becomes the new /. The current working directory is visible as itself and as /tmp. Also, sandbox.path remains accessible under sandbox.path.
If the sandbox doesn’t contain /bin/sh or /lib, then basic versions of them are bound from the proot sandbox.
For more information about PRoot see http://proot.me.
PRootExecutor adds support for the following arguments in __call__:
- proot_options - Options passed to the proot binary after those automatically generated.
path
Contains the real, absolute path to the sandbox root.
rpath
Contains the path to the sandbox root as visible during command execution.
This module provides some ready-to-use executors:
class sio.workers.executors.UnprotectedExecutor
Executes the command in a completely unprotected manner.
Note
Time limiting is counted with an accuracy of seconds.
class sio.workers.executors.DetailedUnprotectedExecutor
This executor returns extended process status (over UnprotectedExecutor).
Note
It reserves the process stderr for time counting, so the stderr argument is ignored.
This class adds the following keys to renv:
- time_used: Linux user-time used by the process
- result_code: TLE, OK, RE.
- result_string: string describing result_code
class
sio.workers.executors.
SupervisedExecutor
(allow_local_open=False, use_program_return_code=False, **kwargs)¶ Executes program in supervised mode.
Sandboxing limitations may be controlled by passing following arguments to constructor:
allow_local_open
Allow opening files within current directory in read-only modeuse_program_return_code
Makes supervisor pass the program return code to renv[‘return_code’] rather than the sandbox return code.Following new arguments are recognized in
__call__
:ignore_return
Do not treat non-zero return code as runtime error.java_sandbox
Sandbox name with JRE.Executed programs may only use stdin/stdout/stderr and manage it’s own memory. Returns extended statistics in
renv
containing:time_used
: processor user time (in ms).mem_used
: memory used (in KiB).num_syscall
: number of times a syscall has been calledresult_code
: short code reporting result of rule obeying. Is one ofOK
,RE
,TLE
,OLE
,MLE
,RV
result_string
: string describingresult_code
class sio.workers.executors.VCPUExecutor
Runs the program in a controlled environment while counting CPU instructions using oitimetool.
Executed programs may only use stdin/stdout/stderr and manage their own memory. Returns extended statistics in renv containing:
- time_used: virtual time based on instruction counting (in ms).
- mem_used: memory used (in KiB).
- num_syscall: number of times a syscall has been called
- result_code: short code reporting the result of rule obeying. Is one of OK, RE, TLE, OLE, MLE, RV
- result_string: string describing result_code
class sio.workers.executors.Sio2JailExecutor
Runs the program in a controlled environment while counting CPU instructions using Sio2Jail.
Returns extended statistics in renv containing:
- time_used: virtual time based on instruction counting (in ms).
- mem_used: memory used (in KiB).
- result_code: short code reporting the result of rule obeying. Is one of OK, RE, TLE, MLE, RV
- result_string: string describing result_code
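Code consuming an renv from one of these executors might summarize it as sketched below. The expansions of the short codes are an assumption (they are not spelled out in this documentation), and summarize and sample_renv are made up for the example.

```python
# Assumed expansions of the short codes; they are not spelled out in
# this documentation.
RESULT_CODE_MEANING = {
    'OK': 'finished correctly',
    'RE': 'runtime error',
    'TLE': 'time limit exceeded',
    'OLE': 'output limit exceeded',
    'MLE': 'memory limit exceeded',
    'RV': 'rule violation',
}


def summarize(renv):
    """Build a one-line, human-readable summary of an renv dictionary."""
    code = renv['result_code']
    meaning = RESULT_CODE_MEANING.get(code, 'unknown result code')
    return '%s (%s): %d ms, %d KiB' % (
        code, meaning, renv.get('time_used', 0), renv.get('mem_used', 0))


# A made-up renv, shaped like the ones described above.
sample_renv = {
    'result_code': 'TLE',
    'result_string': 'time limit exceeded',
    'time_used': 2004,
    'mem_used': 1320,
    'return_code': 0,
}
summary = summarize(sample_renv)
```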
Executing external programs
sio.workers.execute.execute(command, **kwargs)
Wrapper for sio.workers.executors.UnprotectedExecutor returning stdout.
Returns a tuple (return_code, stdout).
Interacting with Filetracker
Filetracker should be your friend if you are coding for sio-workers.
We can help you interact with it by providing the most in-demand functions in the world:
sio.workers.ft.download(environ, key, dest=None, skip_if_exists=False, **kwargs)
Downloads the file from environ[key] and saves it to dest.
- dest - A filename, directory name or None. In the two latter cases, the file is named the same as in environ[key].
- skip_if_exists - If True and dest points to an existing file (not a directory or None), then the file is not downloaded.
- **kwargs - Passed directly to filetracker.Client.get_file().
The value under environ['use_filetracker'] affects downloading in the following way:
- if True, nothing special happens,
- if False, the file is not downloaded from filetracker, but the passed path is assumed to be a regular filesystem path,
- if 'auto', the file is assumed to be a local filename only if it is a relative path (this is usually the case when developers play).
Returns the path to the saved file.
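The three use_filetracker cases can be sketched as a small decision function. uses_filetracker is a hypothetical helper, not part of sio.workers.ft; treating a missing key as True and recognizing absolute paths by a leading slash are assumptions made for the illustration.

```python
def uses_filetracker(environ, key):
    """Decide whether environ[key] names a Filetracker file."""
    # Assumption: a missing 'use_filetracker' key behaves like True.
    mode = environ.get('use_filetracker', True)
    path = environ[key]
    if mode == 'auto':
        # Relative paths are taken to be local files.
        return path.startswith('/')
    return bool(mode)


local_dev = uses_filetracker(
    {'use_filetracker': 'auto', 'in_file': 'tests/1.in'}, 'in_file')
remote = uses_filetracker(
    {'use_filetracker': 'auto', 'in_file': '/problems/1/1.in'}, 'in_file')
forced_local = uses_filetracker(
    {'use_filetracker': False, 'in_file': '/tmp/1.in'}, 'in_file')
```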
sio.workers.ft.upload(environ, key, source, dest=None, **kwargs)
Uploads the file from source to filetracker under the environ[key] name.
- source - Filename to upload.
- dest - A filename, directory name or None. In the two latter cases, the file is named the same as in environ[key].
- **kwargs - Passed directly to filetracker.Client.put_file().
See the note about environ['use_filetracker'] in sio.workers.ft.download().
Returns the filetracker path to the saved file.
sio.workers.ft.instance()
Returns a singleton instance of filetracker.Client.
There is also a convenience function for starting the Filetracker server, but this is only useful in complex setups when one wants to configure the worker machines to share cached files between themselves.
sio.workers.ft.launch_filetracker_server()
Launches the Filetracker server if FILETRACKER_PUBLIC_URL is present in os.environ and the server does not appear to be running.
The server is run in the background and the function returns once the server is up and running.
There is also a command-line script called sio-run-filetracker which calls this function.
Example
Here’s an example of a job running the specified binary file in a controlled environment (beware, as this is not the actual implementation of the exec job from the sio-exec package):
import os

from sio.workers import ft, Failure
from sio.workers.execute import execute, noquote
from sio.workers.sandbox import get_sandbox

def run(environ):
    # Fetch the executable and its input file.
    exe_file = ft.download(environ, 'exe_file', 'exe', add_to_cache=True)
    os.chmod(exe_file, 0o700)
    in_file = ft.download(environ, 'in_file', 'in', add_to_cache=True)
    sandbox = get_sandbox('exec-sandbox')
    env = os.environ.copy()
    env['MEM_LIMIT'] = 256000
    # Run the program under the supervisor; noquote() keeps the shell
    # redirections from being quoted.
    retcode, output = execute(
        [os.path.join(sandbox.path, 'bin', 'supervisor'), '-f', '3',
         './exe',
         noquote('<'), 'in',
         noquote('3>'), 'supervisor_result',
         noquote('>'), 'out'],
        env=env)
    # The supervisor writes its verdict to file descriptor 3.
    with open('supervisor_result') as result_file:
        environ['status_line'] = result_file.readline().strip()
    ft.upload(environ, 'out_file', 'out')
    return environ
Creating filters
Filters are boring. There are no filters at the moment.
Filters are functions with one argument… the environ, returning one thing… the environ. They may be defined in any module, provided that they are registered with pkg_resources (aka setuptools) as entry points under the key sio.workers.filters.
For example, the following setup.py defines a module with a filter:
from setuptools import setup, find_packages
setup(
    name = "mypackage",
    version = '0.1',
    packages = find_packages(),
    entry_points = {
        'sio.workers.filters': [
            'superfilter = mypackage.submodule:superfilter_function',
        ]
    }
)
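A filter function and a way to list registered filters could look like the sketch below. The body of superfilter_function and the superfiltered key are invented. The lookup uses importlib.metadata from the standard library rather than pkg_resources, as a swapped-in technique that reads the same entry point metadata; find_filters is a hypothetical helper, not part of sioworkers.

```python
from importlib.metadata import entry_points


def superfilter_function(environ):
    """Hypothetical filter: takes the environ, returns the environ."""
    environ['superfiltered'] = True
    return environ


def find_filters(group='sio.workers.filters'):
    """Map entry point names in `group` to the entry point objects."""
    eps = entry_points()
    if hasattr(eps, 'select'):
        selected = eps.select(group=group)  # Python 3.10 and newer
    else:
        selected = eps.get(group, [])  # Python 3.8 and 3.9
    return dict((ep.name, ep) for ep in selected)


# Empty unless some installed package registers filters.
available = find_filters()
filtered = superfilter_function({'job_type': 'ping', 'ping': 'x'})
```

Calling load() on one of the returned entry point objects would import the module and return the registered function.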
The ping job
There is also a single job called ping available for testing. It expects a ping key in the environment and basically does:
environ['pong'] = environ['ping']
Integration with Celery
There is also a script sio-celery-worker which starts the Celery daemon with the default configuration. The configuration is available in sio.celery.default_config, so a custom celeryconfig.py (for use with a stock celeryd) may look like this:
from sio.celery.default_config import *
BROKER_URL = 'amqp://foo:bar@server/vhost'