.. _kernel-hpc:
Optional HPC support
======================
The optional ZOO-Kernel HPC support gives you the opportunity to use
OGC WPS for invoking the remote execution of OTB applications. The
current implementation relies on `OpenSSH <https://www.openssh.com/>`_
and the `Slurm <https://slurm.schedmd.com/>`_ scheduler.
.. note::
   |slurm| `Slurm <https://slurm.schedmd.com/>`_ is an acronym for Simple Linux Utility for Resource Management. Learn more on the official `website <https://slurm.schedmd.com/>`__.
To execute an OGC WPS service using this HPC support, one should use
OGC WPS version 2.0.0 and an asynchronous request. Any attempt to
execute an HPC service synchronously will fail with the message "The
synchronous mode is not supported by this type of service". The
ZOO-Kernel is not solely responsible for the execution: it waits for
the execution on the HPC server to end before being able to continue
its own execution. Also, transferring the data from the WPS server to
the cluster and downloading the data produced by the execution take
time. Hence, when an OGC WPS client requests GetCapabilities or
DescribeProcess, only the "async-execute" mode is listed in the
jobControlOptions attribute for the HPC services.
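For reference, a minimal asynchronous Execute request (WPS 2.0.0) may
look like the sketch below; the ``OTB.BandMath`` identifier, the input
and output identifiers and the data URL are purely illustrative and
depend on the services you actually deploy.

.. code-block:: guess

   <wps:Execute xmlns:wps="http://www.opengis.net/wps/2.0"
                xmlns:ows="http://www.opengis.net/ows/2.0"
                xmlns:xlink="http://www.w3.org/1999/xlink"
                service="WPS" version="2.0.0"
                response="document" mode="async">
     <!-- illustrative service identifier and parameters -->
     <ows:Identifier>OTB.BandMath</ows:Identifier>
     <wps:Input id="il">
       <wps:Reference xlink:href="http://myhost.net/data/input.tif"/>
     </wps:Input>
     <wps:Input id="exp">
       <wps:Data>im1b1*2</wps:Data>
     </wps:Input>
     <wps:Output id="out" transmission="reference"/>
   </wps:Execute>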
The sequence diagram below shows the interactions between the OGC WPS
server (ZOO-Kernel), the mail daemon (running on the OGC WPS server),
the callback service and the HPC server during the execution of an
HPC service. The dashed lines represent the behavior when the optional
callback service invocation has been activated. These invocations are
made asynchronously to lower their impact on the whole process, for
instance in case the callback service fails. Currently, the callback
service is not a WPS service but an independent server.
|hpc_support|
.. |slurm| image:: https://slurm.schedmd.com/slurm_logo.png
:height: 100px
:width: 100px
:scale: 45%
:alt: Slurm logo
.. |hpc_support| image:: ../_static/hpc_schema.svg
:scale: 65%
:alt: HPC Support schema
Installation and configuration
------------------------------
Follow the steps described below to activate the ZOO-Project optional HPC support.
Prerequisites
.....................
* latest `ZOO-Kernel <http://zoo-project.org/>`_ trunk version
* `libssh2 <https://www.libssh2.org/>`_
* `MapServer <https://mapserver.org/>`_
* access to a server with `Slurm <https://slurm.schedmd.com/>`_ and
  `OrfeoToolBox <https://www.orfeo-toolbox.org/>`_.
Installation steps
...........................
ZOO-Kernel
****************************
Compile ZOO-Kernel using the configuration options as shown below:
.. code-block:: guess
cd zoo-kernel
autoconf
./configure --with-hpc=yes --with-ssh2=/usr --with-mapserver=/usr --with-ms-version=7
make
sudo make install
Optionally, you can ask your ZOO-Kernel to invoke a callback service
which is responsible for recording the execution history and the data
produced. In such a case, add the ``--with-callback=yes`` option to
the configure command.
.. note::
   In case you need other languages to be activated, such as Python
   for example, please use the corresponding option(s).
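For instance, the command sketch below (assuming Python is installed
under ``/usr``; adapt the paths to your setup) activates the HPC
support together with the callback service and the Python language
support:

.. code-block:: guess

   cd zoo-kernel
   autoconf
   ./configure --with-hpc=yes --with-ssh2=/usr --with-mapserver=/usr \
               --with-ms-version=7 --with-callback=yes --with-python=/usr
   make
   sudo make install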
FinalizeHPC WPS Service
****************************
To be informed that the remote OTB application has ended on the
cluster, one should invoke the FinalizeHPC service. It is responsible
for connecting to the HPC server using SSH and running a ``sacct``
command to extract detailed information about the sbatch job that was
run. If the ``sacct`` command succeeds and the service is no longer
running on the cluster, this information is stored in a local
configuration file containing a ``[henv]`` section definition, and the
service connects to the unix domain socket (opened by the ZOO-Kernel
that initially scheduled the service through Slurm) to report the end
of the service run on the cluster. This lets the initial ZOO-Kernel
continue its execution by downloading the output data produced by the
execution of the OTB application on the cluster. So, this service
should be built and deployed on your WPS server. You can use the
following commands to do so.
.. code-block:: guess
cd zoo-service/utils/hpc
make
cp cgi-env/* /usr/lib/cgi-bin
mkdir -p /var/data/xslt/
cp xslt/updateExecute.xsl /var/data/xslt/
You should also make sure that the ``template`` parameter of the
``[callback]`` section described below points to the location where
the ``updateExecute.xsl`` file was copied.
.. note::
FinalizeHPC should be called from a daemon, responsible for reading
mails sent by the cluster to the WPS server.
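For reference, the ``sacct`` invocation run by FinalizeHPC over SSH
looks roughly like the sketch below; the job identifier is
hypothetical and the field list is a subset of the
``remote_command_opt`` parameter described in the configuration steps.

.. code-block:: guess

   # run on the HPC server through SSH (illustrative job id and fields)
   sacct -j 123456 --format=JobID,JobName,State,ExitCode,Elapsed,NCPUS,NNodes,NodeList,Start,End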
Configuration steps
...............................
Main configuration file
****************************
When HPC support is activated, you can use different HPC
configurations by adding a ``confId`` parameter next to the usual
``serviceType=HPC`` in your zcfg file. To be able to find which
configuration a service should use, the ZOO-Kernel needs to know which
options to use for creating the relevant sbatch script.
Also, you can define multiple configurations to run the OTB
application on the cluster(s), depending on the size of the inputs.
You should define, in the section corresponding to your
``ServiceType``, the thresholds for both raster
(``preview_max_pixels``) and vector (``preview_max_features``) inputs.
In case the raster or the vector dataset is above the defined limit,
the ``fullres_conf`` configuration will be used; otherwise, the
``preview_conf`` one will be.
For each of these configurations, you will have to define the
parameters used to connect to the HPC server, by providing
``ssh_host``, ``ssh_port``, ``ssh_user`` and ``ssh_key``. Also, you
should set where the input data will be stored on the HPC server, by
defining ``remote_data_path`` (the default directory to store data),
``remote_persistent_data_path`` (the directory to store data
considered as shared data, see below) and ``remote_work_path`` (the
directory used to store the SBATCH script created locally, then
uploaded by the ZOO-Kernel).
There are also multiple options you can use to run your applications
using SBATCH. You can define them using ``jobscript_header``,
``jobscript_body`` and ``jobscript_footer``, or by using
``sbatch_options_<name>`` where ``<name>`` should be replaced by a
real sbatch option name, like ``workdir`` in the following example. To
create the SBATCH file, the ZOO-Kernel creates a file starting with
the content of the file pointed to by ``jobscript_header`` (if any; a
default header is used otherwise), followed by every option defined in
``sbatch_options_*`` plus a specific one, ``job-name``. Then the
content of ``jobscript_body`` is added (if any; usually to load the
required modules), then the ZOO-Kernel adds the invocation of the OTB
application and, optionally, the content of ``jobscript_footer``, if
any.
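The generated script therefore looks like the sketch below; this is a
hypothetical assembly, in which the job name value and the OTB command
line are illustrative (the latter is built by the ZOO-Kernel from the
service inputs):

.. code-block:: guess

   #!/bin/sh
   # content of the file pointed to by jobscript_header
   #SBATCH --ntasks=1
   #SBATCH --partition=partName
   # options defined through sbatch_options_* plus job-name (illustrative value)
   #SBATCH --workdir=/home/cUser/wps_executions/script
   #SBATCH --job-name=zoo_job_example
   # content of the file pointed to by jobscript_body
   module load OTB/6.1-serial-24threads
   # invocation of the OTB application built by the ZOO-Kernel (illustrative)
   otbcli_BandMath -il input.tif -out output.tif -exp "im1b1*2"
   # content of the file pointed to by jobscript_footer, if any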
Finally, ``remote_command_opt`` should contain all the fields you want
to be extracted by the ``sacct`` command run by the FinalizeHPC
service. ``billing_nb_cpu`` is used for billing purposes, to define a
cost for using a specific configuration (preview or fullres).
In addition to the specific ``HPC_*`` section (``[HPC_Sample]`` in the
example below) and the corresponding fullres and preview ones, you
should use the ``shared`` parameter of the ``[security]`` section to
set the URLs from which the downloaded data should be considered as
shared. This means that, even if these resources require
authentication to be accessed, any authenticated user will be allowed
to access the cache file, even if it was created by somebody else.
Also, unlike the regular cache, this shared cache won't contain any
authentication information in the cache file name.
.. code-block:: guess
[HPC_Sample]
preview_max_pixels=820800
preview_max_features=100000
preview_conf=hpc-config-2
fullres_conf=hpc-config-1
[hpc-config-1]
ssh_host=mycluster.org
ssh_port=22
ssh_user=cUser
ssh_key=/var/www/.ssh/id_rsa.pub
remote_data_path=/home/cUser/wps_executions/data
remote_persistent_data_path=/home/cUser/wps_executions/datap
remote_work_path=/home/cUser/wps_executions/script
jobscript_header=/usr/lib/cgi-bin/config-hpc1_header.txt
jobscript_body=/usr/lib/cgi-bin/config-hpc1_body.txt
sbatch_options_workdir=/home/cUser/wps_executions/script
sbatch_substr=Submitted batch job
billing_nb_cpu=1
remote_command_opt=AllocCPUS,AllocGRES,AllocNodes,AllocTRES,Account,AssocID,AveCPU,AveCPUFreq,AveDiskRead,AveDiskWrite,AvePages,AveRSS,AveVMSize,BlockID,Cluster,Comment,ConsumedEnergy,ConsumedEnergyRaw,CPUTime,CPUTimeRAW,DerivedExitCode,Elapsed,Eligible,End,ExitCode,GID,Group,JobID,JobIDRaw,JobName,Layout,MaxDiskRead,MaxDiskReadNode,MaxDiskReadTask,MaxDiskWrite,MaxDiskWriteNode,MaxDiskWriteTask,MaxPages,MaxPagesNode,MaxPagesTask,MaxRSS,MaxRSSNode,MaxRSSTask,MaxVMSize,MaxVMSizeNode,MaxVMSizeTask,MinCPU,MinCPUNode,MinCPUTask,NCPUS,NNodes,NodeList,NTasks,Priority,Partition,QOS,QOSRAW,ReqCPUFreq,ReqCPUFreqMin,ReqCPUFreqMax,ReqCPUFreqGov,ReqCPUS,ReqGRES,ReqMem,ReqNodes,ReqTRES,Reservation,ReservationId,Reserved,ResvCPU,ResvCPURAW,Start,State,Submit,Suspended,SystemCPU,Timelimit,TotalCPU,UID,User,UserCPU,WCKey,WCKeyID
[hpc-config-2]
ssh_host=mycluster.org
ssh_port=22
ssh_user=cUser
ssh_key=/var/www/.ssh/id_rsa.pub
remote_data_path=/home/cUser/wps_executions/data
remote_persistent_data_path=/home/cUser/wps_executions/datap
remote_work_path=/home/cUser/wps_executions/script
jobscript_header=/usr/lib/cgi-bin/config-hpc2_header.txt
jobscript_body=/usr/lib/cgi-bin/config-hpc2_body.txt
sbatch_options_workdir=/home/cUser/wps_executions/script
sbatch_substr=Submitted batch job
billing_nb_cpu=4
remote_command_opt=AllocCPUS,AllocGRES,AllocNodes,AllocTRES,Account,AssocID,AveCPU,AveCPUFreq,AveDiskRead,AveDiskWrite,AvePages,AveRSS,AveVMSize,BlockID,Cluster,Comment,ConsumedEnergy,ConsumedEnergyRaw,CPUTime,CPUTimeRAW,DerivedExitCode,Elapsed,Eligible,End,ExitCode,GID,Group,JobID,JobIDRaw,JobName,Layout,MaxDiskRead,MaxDiskReadNode,MaxDiskReadTask,MaxDiskWrite,MaxDiskWriteNode,MaxDiskWriteTask,MaxPages,MaxPagesNode,MaxPagesTask,MaxRSS,MaxRSSNode,MaxRSSTask,MaxVMSize,MaxVMSizeNode,MaxVMSizeTask,MinCPU,MinCPUNode,MinCPUTask,NCPUS,NNodes,NodeList,NTasks,Priority,Partition,QOS,QOSRAW,ReqCPUFreq,ReqCPUFreqMin,ReqCPUFreqMax,ReqCPUFreqGov,ReqCPUS,ReqGRES,ReqMem,ReqNodes,ReqTRES,Reservation,ReservationId,Reserved,ResvCPU,ResvCPURAW,Start,State,Submit,Suspended,SystemCPU,Timelimit,TotalCPU,UID,User,UserCPU,WCKey,WCKeyID
[security]
attributes=Cookie,Cookies
hosts=*
shared=myhost.net/WCS
You can see below an example of a ``jobscript_header`` file.
.. code-block:: guess
#!/bin/sh
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive
#SBATCH --distribution=block:block
#SBATCH --partition=partName
#SBATCH --mail-type=END # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=user@wps_server.net # Where to send mail
You can see below an example of a ``jobscript_body`` file.
.. code-block:: guess
# Load all the modules
module load cv-standard
module load cmake/3.6.0
module load gcc/4.9.3
module load use.own
module load OTB/6.1-serial-24threads
In case you have activated the callback service, you should also have
a ``[callback]`` section, in which you will define ``url`` to invoke
the callback service, ``prohibited`` to list the services that should
not trigger an invocation of the callback service (if any) and
``template``, pointing to the local ``updateExecute.xsl`` file used to
replace any input provided by value with a reference to the locally
published OGC WFS/WCS web services. This Execute request is then
provided to the callback service.
.. code-block:: guess
[callback]
url=http://myhost.net:port/callbackUpdate/
prohibited=FinalizeHPC,Xml2Pdf,DeleteData
template=/home/cUser/wps_dir/updateExecute.xsl
OGC WPS Services metadata
****************************
To produce the zcfg files corresponding to the metadata definition of
the WPS services, you can use the otb2zcfg tool. You will need to
replace ``serviceType=OTB`` by ``serviceType=HPC`` and, optionally,
add one line containing ``confId=HPC_Sample`` for instance, as in the
excerpt below.
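This excerpt is a sketch of the relevant part of such an edited zcfg
file; the service name and title are illustrative (they correspond to
whatever otb2zcfg generated for your application), and only
``serviceType`` and ``confId`` differ from the generated file.

.. code-block:: guess

   [OTB.BandMath]
    Title = BandMath
    serviceType = HPC
    confId = HPC_Sample
    ...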
Please refer to `otb2zcfg
<./orfeotoolbox.html#services-configuration-file>`_ documentation to
know how to use this tool.
Using the HPC support, when you define one output, 1 to 3 inner
outputs will automatically be created for the defined output:

download_link
   URL to download the generated output

wms_link
   URL to access the OGC WMS for this output (only in case
   ``useMapserver=true``)

wcs_link/wfs_link
   URL to access the OGC WCS or WFS for this output (only in case
   ``useMapserver=true``)

You can see below an example of the Output node resulting from the
definition of one output named ``out`` and typed as geographic
imagery.
.. code-block:: guess
   <wps:Output>
     <ows:Title>Outputed Image</ows:Title>
     <ows:Abstract>Image produced by the application</ows:Abstract>
     <ows:Identifier>out</ows:Identifier>
     <wps:Output>
       <ows:Title>Download link</ows:Title>
       <ows:Abstract>The download link</ows:Abstract>
       <ows:Identifier>download_link</ows:Identifier>
       <!-- data format description omitted -->
     </wps:Output>
     <wps:Output>
       <ows:Title>WMS link</ows:Title>
       <ows:Abstract>The WMS link</ows:Abstract>
       <ows:Identifier>wms_link</ows:Identifier>
       <!-- data format description omitted -->
     </wps:Output>
     <wps:Output>
       <ows:Title>WCS link</ows:Title>
       <ows:Abstract>The WCS link</ows:Abstract>
       <ows:Identifier>wcs_link</ows:Identifier>
       <!-- data format description omitted -->
     </wps:Output>
   </wps:Output>