service-manager — service manager
service-manager
service-manager manages a set of services, allowing their service processes to be programmatically brought up and down, and providing automatic restart upon failure.
It expects file descriptor 3 to be a (datagram) socket that has been set up to listen for incoming datagrams. This is its main control socket, through which it receives requests to load, unload, and pipe together services from utilities such as service-dt-scanner(1) and system-control(8). It creates individual control FIFOs for each service, through which it receives requests to send signals to the service and to bring it up or down, from utilities such as service-control(1).
system-manager(8) invokes service-manager with the appropriate socket (which it sets up itself) and output directed to a logging daemon.
So also does session-manager(1).
Alternatively, service-manager can be started by local-datagram-socket-listen(1), which will set up the appropriate socket.
(service-manager can even be started as a "socket-activated" daemon by systemd(1) with the systemd-recommended Accept=false.)
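As a rough illustration of what such a launcher has to arrange, the following Python sketch binds a local datagram socket and places it on file descriptor 3 before executing the service manager. The socket path and the helper names are hypothetical, and this is not how local-datagram-socket-listen(1) itself is implemented; it only shows the descriptor contract described above.

```python
import os
import socket

CONTROL_FD = 3  # the descriptor number that service-manager expects


def bind_control_socket(path):
    """Create a local (AF_UNIX) datagram socket bound to `path`.

    `path` is a hypothetical location chosen by the caller; a bound
    datagram socket is ready to receive datagrams without listen().
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    return sock


def exec_with_control_socket(sock, argv):
    """Put `sock` on descriptor 3 and replace this process with `argv`."""
    if sock.fileno() == CONTROL_FD:
        # Already in place; just make sure it survives the exec.
        os.set_inheritable(CONTROL_FD, True)
    else:
        os.dup2(sock.fileno(), CONTROL_FD, inheritable=True)
    os.execvp(argv[0], argv)  # does not return on success
```

A launcher built on this would call `exec_with_control_socket(bind_control_socket(path), ["service-manager"])`, with its output already redirected to wherever logging is wanted.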
Each service comprises several files in the filesystem, contained in two directories. (system-control(8) builds upon these two with further directories, to construct a service bundle, for the details of which see its manual page.)
A service directory is the current directory in which a service process is
run. It contains:
a run file, which is the executable file for the service process itself;
a start file, which is the executable file to be run when a service is
first brought up (but not when it is automatically restarted);
a restart file, which is the executable file to be run when a service has
ended (to determine whether it should automatically be restarted); and
a stop file, which is the executable file to be run when a service is
finally taken down (but not when it is automatically restarted).
Although there is nothing to stop them from being binaries, the executable files are usually scripts interpreted by nosh(1), execlineb(1), or a shell. They set up various parts of the process state (using commands such as softlimit(1), setenv(1), setuidgid(1), and open-controlling-tty(1)) and then chain to the service program proper.
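The chain-loading pattern is not tied to any one interpreter. As a minimal sketch, assuming a run file written in Python rather than one of the usual script languages, the program adjusts its own process state and then replaces itself with the service program. The environment variable, the limit value, and the function names are illustrative only.

```python
import os
import resource


def prepare_process_state(extra_env, max_open_files=None):
    """Adjust this process's environment and soft open-file limit in place."""
    os.environ.update(extra_env)
    if max_open_files is not None:
        _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        # The soft limit may not exceed the hard limit.
        if hard != resource.RLIM_INFINITY:
            max_open_files = min(max_open_files, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (max_open_files, hard))


def chain_to(argv):
    """Replace this process with the service program proper."""
    os.execvp(argv[0], argv)  # does not return on success
```

Because the final step is an exec rather than a fork, the service program keeps the process ID that the service manager is watching.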
A service directory can also contain:
ancillary files required by the service itself, varying from service to service. For example:
A tcp-socket-accept(1) service could have an access-control database managed by ucspi-socket-rules-check(1).
Many services have env subdirectories read by envdir(1) in order to control daemon process environment variables.
further files used by other tools.
For example, system-control(1) creates and deletes a down file, and service-is-enabled(1) and service-dt-scanner(1) look for one.
These files are ignored by service-manager.
The service manager does not need write access to the service directory or to any of the executables within it. This permits service directories (as long as the services themselves do not require write access to their service directories) to reside on read-only volumes.
A supervise directory provides the control/status API for the service
supervisor. It contains:
an ok FIFO that does nothing more than signify that the service manager
has loaded the service;
a control FIFO through which commands to control the individual service
process are sent;
a status file that contains a record of the service process ID, start time,
and control state; and
a lock file (compatible with setlock(1)) that prevents the service manager from re-using an active supervise directory.
The service manager requires read-write access to these files, and write
access to the supervise directory itself, as it creates the files if they
do not exist to start with.
However, it does not require write access to the supervise directory
once the files have been created.
(The supervise(1) program in daemontools, in contrast, repeatedly re-creates the status file.)
Control of services and access to service status is thus subject to ordinary permissions and ACLs on these files.
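A client can exercise this API with nothing more than ordinary file operations. The sketch below, in Python, checks whether anything is holding the ok FIFO open for reading and writes bytes to the control FIFO. The function names are hypothetical, and the encoding of the commands themselves is not specified here; the single-letter `b"u"` in the usage note is an assumption borrowed from the daemontools tradition.

```python
import errno
import os


def supervisor_is_listening(supervise_dir):
    """Return True if some process has the `ok` FIFO open for reading."""
    path = os.path.join(supervise_dir, "ok")
    try:
        # A non-blocking write-only open of a FIFO fails with ENXIO
        # when no process has it open for reading.
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as e:
        if e.errno in (errno.ENXIO, errno.ENOENT):
            return False
        raise
    os.close(fd)
    return True


def send_control(supervise_dir, command):
    """Write `command` (bytes) to the `control` FIFO.

    Raises OSError (ENXIO) rather than blocking if nothing is reading,
    and a permissions error if the caller is not allowed to write.
    """
    path = os.path.join(supervise_dir, "control")
    fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    try:
        os.write(fd, command)
    finally:
        os.close(fd)
```

For instance, `send_control(d, b"u")` would ask for the service to be brought up if the manager follows the daemontools command letters; a permission failure surfaces as an ordinary `PermissionError`.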
Bernstein's daemontools employs an 18-byte state file.
daemontools has no notion of "starting", "failing", or "stopping" states for
services, and its state file provides only simple binary "up" or "down" state
information.
Guenter's daemontools-encore employs a 20-byte state file that
includes extra state information for the aforementioned states.
service-manager uses the same file format.
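To make the layout concrete, here is a hedged Python sketch that decodes the 18 bytes defined by Bernstein's daemontools: a 12-byte TAI64N timestamp (big-endian), a 4-byte little-endian process ID, a paused flag, and a "want" byte. It assumes that the extended format keeps those 18 bytes unchanged at the front; the meaning of the trailing bytes is not given here, so the sketch returns them uninterpreted. The `Status` type and `parse_status` name are illustrative.

```python
import struct
from typing import NamedTuple

# TAI64 label that daemontools writes for the Unix epoch: 2**62 plus
# its fixed 10-second TAI offset.
TAI64_UNIX_EPOCH = (1 << 62) + 10


class Status(NamedTuple):
    unix_seconds: int   # timestamp, converted to Unix time
    nanoseconds: int
    pid: int            # 0 when no service process is running
    paused: bool
    want: str           # "u", "d", or "" when nothing is wanted
    extra: bytes        # bytes beyond the first 18, left uninterpreted


def parse_status(raw):
    """Decode the daemontools-compatible prefix of a status file."""
    if len(raw) < 18:
        raise ValueError("status record shorter than 18 bytes")
    tai_seconds, nanoseconds = struct.unpack(">QI", raw[:12])
    (pid,) = struct.unpack("<I", raw[12:16])
    paused = raw[16] != 0
    want = chr(raw[17]) if raw[17] else ""
    return Status(tai_seconds - TAI64_UNIX_EPOCH, nanoseconds,
                  pid, paused, want, bytes(raw[18:]))
```

Reading a real file is then `parse_status(open(path, "rb").read())`, subject to the permissions on the status file.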
Other tools may use further files in a supervise directory. Again, these files are ignored by service-manager.
The service manager neither knows nor cares where in the filesystem these directories are. That is the province of the utilities that feed control requests to it. It is not necessary for supervise directories to be subdirectories of service directories.
It is not necessary for the relationship between service directories and supervise directories to be one-to-one. One service directory can be shared amongst multiple services, as long as they each have an individual supervise directory.
Moreover, it is not necessary for the relationship between services themselves to be exactly one "main" service feeding its output into one subordinate "log" service. The service manager permits arbitrary-length pipelines of services, as well as fan-in. (However, fan-in should be used sparingly as it generally causes more administrative headaches than it solves.)
Automatic restart is tailorable to individual services.
If the restart program does not exist, or does not exit with a success
(i.e. zero) status when run, the service run program is not restarted.
For the simplest cases restart can just be a (symbolic) link to /bin/true
or /bin/false, to provide always-restart and never-restart services,
respectively.
(If using the nosh flavours of true(1) and false(1), do not use links to them.
They will see themselves invoked under the name restart, which is unknown to them, and complain.
Instead, write a short nosh(1) script.)
However, restart is invoked with three command-line arguments, which together
represent the most recent exit status of the run program and allow
finer control over the restart decision, if desired.
The first is a code, one of exit, term, kill, abort, or
crash. This categorizes how run exited. Everything apart from
exit denotes being terminated by an uncaught signal. term denotes
the "good" termination signals SIGTERM, SIGPIPE, SIGHUP, and
SIGINT. kill denotes SIGKILL. abort denotes SIGABRT,
SIGALRM, or SIGQUIT. And crash is everything else.
The second refines the first, for cases where the first argument is not
specific enough to make a decision. It is either (for exit) the decimal exit
status of the process or (for everything else) a symbolic designation of the
specific signal, falling back to a decimal code.
For convenience, the third is (for codes other than exit) always the
decimal code of the specific signal.
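The following Python sketch illustrates both halves of this arrangement under stated assumptions. `classify_signal` restates the signal categories listed above as a lookup; `main` is a hypothetical restart program whose policy (restart after a non-zero exit, an abort, or a crash, but not after a clean exit, a "good" termination signal, or SIGKILL) is purely an example and not a recommended default. It deliberately ignores the symbolic signal name, whose exact spelling is not specified here.

```python
import signal

# Signal categories as described in the text; anything unlisted is "crash".
_CATEGORIES = {
    "term": {signal.SIGTERM, signal.SIGPIPE, signal.SIGHUP, signal.SIGINT},
    "kill": {signal.SIGKILL},
    "abort": {signal.SIGABRT, signal.SIGALRM, signal.SIGQUIT},
}


def classify_signal(signo):
    """Map a terminating signal number to the first-argument code."""
    for code, members in _CATEGORIES.items():
        if signo in members:
            return code
    return "crash"


def should_restart(code, detail):
    """Example policy only: decide from the first two arguments."""
    if code == "exit":
        return detail != "0"          # restart after a failing exit status
    return code in ("abort", "crash")  # but not after "term" or "kill"


def main(argv):
    """Return the exit status a restart program would use.

    argv[1] is the code and argv[2] the exit status or signal
    designation; zero means "restart", non-zero means "do not".
    """
    if len(argv) < 3:
        return 1
    return 0 if should_restart(argv[1], argv[2]) else 1

# An actual restart file would finish with:
#     import sys; sys.exit(main(sys.argv))
```

A policy that needs to single out one signal can compare the third argument against `int(signal.SIGSEGV)` and similar values instead of parsing the symbolic name.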