source: trunk/docs/kernel/hpc.rst @ 976

Last change on this file since 976 was 917, checked in by djay, 6 years ago

Merge prototype-v0 branch in trunk

.. _kernel-hpc:

Optional HPC support
======================

The optional ZOO-Kernel HPC support gives you the opportunity to use
OGC WPS for invoking the remote execution of OTB applications. The current
implementation relies on `OpenSSH <https://www.openssh.com/>`_ and the
`Slurm <http://slurm.schedmd.com/>`_ scheduler.

.. note::

   |slurm| `Slurm <http://slurm.schedmd.com/>`_ is an acronym for Simple Linux Utility for Resource Management. Learn more on the official `website <https://slurm.schedmd.com/overview.html>`__.

For executing an OGC WPS Service using this HPC support, one should
use OGC WPS version 2.0.0 and an asynchronous request. Any attempt
to execute an HPC service synchronously will fail with the message "The
synchronous mode is not supported by this type of service". The
ZOO-Kernel is not solely responsible for the execution: it has to wait
for the execution on the HPC server to end before being able to
continue its own execution. Also, transferring the data from the WPS server
to the cluster and downloading the data produced by the execution
takes time. Hence, when an OGC WPS client sends a GetCapabilities
or DescribeProcess request, only the "async-execute" mode will be present in
the jobControlOptions attribute for the HPC services.
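
The following minimal sketch shows what such an asynchronous request can
look like. It is a POSIX shell function that prints an OGC WPS 2.0.0
Execute document with ``mode="async"``. The process identifier
(``OTB.BandMath``), the input and output identifiers and the URL are
hypothetical placeholders that you should adapt to your own services.

.. code-block:: guess

     #!/bin/sh
     # Print an OGC WPS 2.0.0 Execute request in asynchronous mode.
     # $1: process identifier, $2: input identifier,
     # $3: input URL (already XML-escaped, e.g. "&" written as "&amp;"),
     # $4: output identifier.
     build_async_execute() {
       cat <<EOF
     <wps:Execute xmlns:wps="http://www.opengis.net/wps/2.0"
         xmlns:ows="http://www.opengis.net/ows/2.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         service="WPS" version="2.0.0" response="document" mode="async">
       <ows:Identifier>$1</ows:Identifier>
       <wps:Input id="$2">
         <wps:Reference xlink:href="$3"/>
       </wps:Input>
       <wps:Output id="$4" transmission="reference"/>
     </wps:Execute>
     EOF
     }

     # Hypothetical service, input, output and data URL.
     build_async_execute OTB.BandMath il "http://myhost.net/data/input.tif" out

You could then send the printed document with an HTTP POST to your
ZOO-Kernel, for instance by piping it to ``curl --data-binary @- -H 'Content-Type: text/xml'``
followed by your ``zoo_loader.cgi`` URL. In asynchronous mode, the
response is a ``wps:StatusInfo`` document that carries a job identifier
rather than the final result.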

The sequence diagram below shows the interactions between
the OGC WPS Server (ZOO-Kernel), the mail daemon (running on the OGC
WPS server), the Callback Service and the HPC Server during the
execution of an HPC Service. The dashed lines represent the behavior
when the optional callback service invocation has been activated. These
invocations are made asynchronously to lower their impact on the
whole process, for instance in case the callback service fails.

For now, the callback service is not a WPS service but an independent server.

|hpc_support|

.. |slurm| image:: https://slurm.schedmd.com/slurm_logo.png
       :height: 100px
       :width: 100px
       :scale: 45%
       :alt: Slurm logo

.. |hpc_support| image:: ../_static/hpc_schema.svg
       :scale: 65%
       :alt: HPC Support schema


Installation and configuration
------------------------------

Follow the steps described below in order to activate the ZOO-Project optional HPC support.

Prerequisites
.....................

   * latest `ZOO-Kernel
     <http://zoo-project.org/trac/browser/trunk/zoo-project/zoo-kernel>`_
     trunk version
   * `libssh2 <https://www.libssh2.org/>`_
   * `MapServer <http://www.mapserver.org>`_
   * access to a server with `Slurm <http://slurm.schedmd.com>`__
     and `OrfeoToolBox <https://www.orfeo-toolbox.org>`_.

Installation steps
...........................

ZOO-Kernel
****************************

Compile ZOO-Kernel using the configuration options shown below:

.. code-block:: guess

     cd zoo-kernel
     autoconf
     ./configure --with-hpc=yes --with-ssh2=/usr --with-mapserver=/usr --with-ms-version=7
     make
     sudo make install

Optionally, you can ask your ZOO-Kernel to invoke a callback service,
which is responsible for recording the execution history and the data
produced. In that case, add the ``--with-callback=yes``
option to the configure command.

.. note::

   In case you need other languages to be activated, such as Python
   for example, please use the corresponding option(s).

FinalizeHPC WPS Service
****************************

To be informed that the remote OTB application has ended on the
cluster, one should invoke the FinalizeHPC service. It connects to the
HPC server over SSH and runs an ``sacct`` command to extract detailed
information about the batch job that has been run. If the ``sacct``
command succeeds and the job is no longer running on the cluster, the
information is stored in a local configuration file containing a
``[henv]`` section definition. The service then connects to the Unix
domain socket (opened by the ZOO-Kernel that initially scheduled the
job through Slurm) to signal the end of the run on the cluster. This
lets the initial ZOO-Kernel continue its execution by downloading the
output data produced by the OTB application on the cluster. This
service must therefore be built and deployed on your WPS server. You
can use the following commands to do so.

.. code-block:: guess

     cd zoo-service/utils/hpc
     make
     cp cgi-env/* /usr/lib/cgi-bin
     mkdir -p /var/data/xslt/
     cp xslt/updateExecute.xsl /var/data/xslt/

You should also copy the

.. note::

   FinalizeHPC should be called from a daemon, responsible for reading
   mails sent by the cluster to the WPS server.
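
The following hypothetical shell function illustrates the completion check
described above. It is not the actual FinalizeHPC implementation. It reads
the output of ``sacct -j <jobid> --noheader --parsable2 --format=JobID,State``
and tells whether the job has left the cluster. The list of states it
treats as still active is an assumption.

.. code-block:: guess

     #!/bin/sh
     # Hypothetical sketch: return 0 when the sacct output read on stdin
     # (one "JobID|State" line per job step) shows that the job is finished.
     # Return 1 when a step is still active or when sacct printed nothing.
     job_finished() {
       awk -F'|' '
         NF >= 2 {
           seen = 1
           split($2, s, " ")   # "CANCELLED by 1000" -> "CANCELLED"
           if (s[1] ~ /^(PENDING|RUNNING|CONFIGURING|COMPLETING|SUSPENDED|REQUEUED)$/)
             active = 1
         }
         END { if (seen && !active) exit 0; exit 1 }'
     }

     # Canned sacct output for a finished job: prints "finished".
     printf '12345|COMPLETED\n12345.batch|COMPLETED\n' | job_finished && echo finished

On the WPS server, you could feed the same function by running ``sacct``
remotely with the connection parameters described in the configuration
section below. For example, with a hypothetical job identifier:
``ssh -p 22 cUser@mycluster.org "sacct -j 12345 --noheader --parsable2 --format=JobID,State" | job_finished``.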


Configuration steps
...............................

Main configuration file
****************************

When HPC support is activated, you can use different HPC configurations
by adding ``confId`` next to the usual ``serviceType=HPC`` in your zcfg
file. The ZOO-Kernel uses it to find which configuration a service should
use, that is, which options to apply when creating the relevant sbatch
script.

You can also define multiple configurations to run the OTB application
on the cluster(s) depending on the size of the inputs. In the section
corresponding to your ``ServiceType``, you should define the
threshold for both raster (``preview_max_pixels``) and vector
(``preview_max_features``) inputs. If the raster or the vector
dataset exceeds the defined limit, the ``fullres_conf`` configuration
is used; otherwise, the ``preview_conf`` configuration is used.
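
The following hypothetical shell sketch shows the selection rule above.
It is not the ZOO-Kernel code. ``ini_get`` is a naive reader for
``key=value`` lines. The sketch also assumes that ``fullres_conf`` is
chosen as soon as one of the two input sizes strictly exceeds its limit.

.. code-block:: guess

     #!/bin/sh
     # Hypothetical helper: print the value of key $3 in section [$2] of the
     # INI-style file $1 (assumes "key=value" lines without spaces around "=").
     ini_get() {
       awk -v sec="[$2]" -v key="$3" '
         /^\[/ { insec = ($0 == sec); next }
         insec && index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }
       ' "$1"
     }

     # Hypothetical sketch: use fullres_conf as soon as the raster size ($3, in
     # pixels) or the vector size ($4, in features) exceeds its preview limit.
     select_hpc_conf() (
       max_px=$(ini_get "$1" "$2" preview_max_pixels)
       max_ft=$(ini_get "$1" "$2" preview_max_features)
       if [ "$3" -gt "$max_px" ] || [ "$4" -gt "$max_ft" ]; then
         ini_get "$1" "$2" fullres_conf
       else
         ini_get "$1" "$2" preview_conf
       fi
     )

     cfg=$(mktemp)
     cat > "$cfg" <<'EOF'
     [HPC_Sample]
     preview_max_pixels=820800
     preview_max_features=100000
     preview_conf=hpc-config-2
     fullres_conf=hpc-config-1
     EOF
     select_hpc_conf "$cfg" HPC_Sample 1000000 0   # prints hpc-config-1

With these thresholds, which are also used in the example configuration
below, a 1000000-pixel raster exceeds ``preview_max_pixels=820800`` and
selects ``hpc-config-1``. Inputs within both limits select ``hpc-config-2``.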

For each of these configurations, you have to define the parameters
for connecting to the HPC server, by providing ``ssh_host``, ``ssh_port``,
``ssh_user`` and ``ssh_key``. You should also set where the input
data will be stored on the HPC server, by defining
``remote_data_path`` (the default directory to store data),
``remote_persitent_data_path`` (the directory to store data
considered as shared data, see below) and ``remote_work_path`` (the
directory used to store the SBATCH script that the ZOO-Kernel creates
locally and then uploads).

There are also multiple options you can use to run your applications
using SBATCH. You can define them using ``jobscript_header``,
``jobscript_body`` and ``jobscript_footer``, or by using
``sbatch_options_<SBATCH_OPTION>``, where ``<SBATCH_OPTION>`` should be
replaced by a real option name, like ``workdir`` in the following
example. To create the SBATCH file, the ZOO-Kernel writes a file
starting with the content of the file pointed to by ``jobscript_header``
(if any; otherwise a default header is used). It then adds any option
defined in ``sbatch_options_*``, plus a specific one, ``job-name``.
Next comes ``jobscript_body`` (if any, usually to load the required
modules), then the invocation of the OTB application and, finally, the
``jobscript_footer``, if any.
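
The following hypothetical shell function sketches the assembly order
described above. It is not the ZOO-Kernel code. The default header it
prints is a placeholder, and the job name and OTB command line in the
usage lines are made up.

.. code-block:: guess

     #!/bin/sh
     # Hypothetical sketch of the assembly order, not the ZOO-Kernel code.
     # $1 jobscript_header file or "" (a placeholder default header is then
     # printed), $2 jobscript_body file or "", $3 jobscript_footer file or "",
     # $4 job name, $5 OTB command line; any further argument is an
     # "option=value" pair taken from a sbatch_options_<option> parameter.
     build_sbatch() (
       header=$1 body=$2 footer=$3 job=$4 cmd=$5
       shift 5
       if [ -n "$header" ]; then cat "$header"; else printf '%s\n' '#!/bin/sh'; fi
       for opt in "$@"; do printf '#SBATCH --%s\n' "$opt"; done
       printf '#SBATCH --job-name=%s\n' "$job"
       if [ -n "$body" ]; then cat "$body"; fi
       printf '%s\n' "$cmd"
       if [ -n "$footer" ]; then cat "$footer"; fi
     )

     hdr=$(mktemp); printf '#!/bin/sh\n#SBATCH --ntasks=1\n' > "$hdr"
     bdy=$(mktemp); printf 'module load OTB/6.1-serial-24threads\n' > "$bdy"
     build_sbatch "$hdr" "$bdy" "" zoo-job-1 \
       "otbcli_BandMath -il in.tif -out out.tif -exp 'im1b1*2'" \
       workdir=/home/cUser/wps_executions/script

Keeping the ``#SBATCH`` directives ahead of any command matters.
``sbatch`` stops processing ``#SBATCH`` directives once it reaches the
first non-comment line of the script. A ``jobscript_header`` should
therefore contain only the shebang, comments and directives.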

Finally, ``remote_command_opt`` should contain all the information
you want the ``sacct`` command, run by the FinalizeHPC service, to
extract. ``billing_nb_cpu`` is used for billing purposes, to
define the cost of using a specific configuration (preview or fullres).
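
The following hypothetical sketch shows how these values could be used.
It assumes that ``sbatch_substr`` is the prefix that precedes the job
identifier. In the example configuration below, this parameter is set to
``Submitted batch job``, the message ``sbatch`` prints on success. The
sketch also assumes that the ``[henv]`` section simply stores one
``field=value`` line per ``sacct`` field. None of the function names come
from the ZOO-Kernel.

.. code-block:: guess

     #!/bin/sh
     # Hypothetical: print the job id found after the sbatch_substr prefix ($1)
     # in the sbatch output read on stdin.
     job_id_from_sbatch() {
       awk -v p="$1 " 'index($0, p) == 1 { print substr($0, length(p) + 1); exit }'
     }

     # Hypothetical: build the sacct command line; $1 job id, $2 the
     # comma-separated remote_command_opt value, used as sacct's --format.
     sacct_cmd() {
       printf 'sacct -j %s --parsable2 --format=%s\n' "$1" "$2"
     }

     # Hypothetical: turn "sacct --parsable2" output (header line + first
     # record) into an INI-like [henv] section; this layout is an assumption.
     sacct_to_henv() {
       awk -F'|' 'NR == 1 { for (i = 1; i <= NF; i++) h[i] = $i; n = NF; next }
                  NR == 2 { print "[henv]"
                            for (i = 1; i <= n; i++) print h[i] "=" $i
                            exit }'
     }

     id=$(echo "Submitted batch job 12345" | job_id_from_sbatch "Submitted batch job")
     sacct_cmd "$id" "JobID,State,Elapsed"
     # prints: sacct -j 12345 --parsable2 --format=JobID,State,Elapsed
     printf 'JobID|State|Elapsed\n12345|COMPLETED|00:01:02\n' | sacct_to_henv
     # prints [henv], JobID=12345, State=COMPLETED and Elapsed=00:01:02 on four lines

In practice, the field list given to ``sacct_cmd`` would be the full
``remote_command_opt`` value of the selected configuration.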

In addition to the specific ``HPC_<ID>`` section and the corresponding
fullres and preview ones, you should use the ``shared`` parameter of the
``[security]`` section to set the URLs from which downloaded data
should be considered shared. This means that, even if these resources
require authentication, any authenticated user will be allowed to
access the cached file, even if it was created by somebody else. Also,
unlike the usual case, the name of a shared cache file does not contain
any authentication information.
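
The following sketch illustrates the ``shared`` check. The function name
is hypothetical, and so is the matching rule: the URL, stripped of its
scheme, must start with one of the comma-separated entries. The
ZOO-Kernel may match URLs differently.

.. code-block:: guess

     #!/bin/sh
     # Hypothetical sketch: return 0 when URL $1 matches one of the
     # comma-separated entries of the [security] "shared" parameter ($2).
     # Assumed rule: the URL without its scheme starts with the entry.
     is_shared_url() (
       url=${1#*://}
       set -f          # do not glob-expand the entries
       IFS=,
       for entry in $2; do
         case $url in
           "$entry"*) exit 0 ;;
         esac
       done
       exit 1
     )

     is_shared_url "http://myhost.net/WCS?service=WCS&request=GetCapabilities" \
       myhost.net/WCS && echo shared    # prints "shared"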

.. code-block:: guess

     [HPC_Sample]
     preview_max_pixels=820800
     preview_max_features=100000
     preview_conf=hpc-config-2
     fullres_conf=hpc-config-1

     [hpc-config-1]
     ssh_host=mycluster.org
     ssh_port=22
     ssh_user=cUser
     ssh_key=/var/www/.ssh/id_rsa.pub
     remote_data_path=/home/cUser/wps_executions/data
     remote_persitent_data_path=/home/cUser/wps_executions/datap
     remote_work_path=/home/cUser/wps_executions/script
     jobscript_header=/usr/lib/cgi-bin/config-hpc1_header.txt
     jobscript_body=/usr/lib/cgi-bin/config-hpc1_body.txt
     sbatch_options_workdir=/home/cUser/wps_executions/script
     sbatch_substr=Submitted batch job
     billing_nb_cpu=1
     remote_command_opt=AllocCPUS,AllocGRES,AllocNodes,AllocTRES,Account,AssocID,AveCPU,AveCPUFreq,AveDiskRead,AveDiskWrite,AvePages,AveRSS,AveVMSize,BlockID,Cluster,Comment,ConsumedEnergy,ConsumedEnergyRaw,CPUTime,CPUTimeRAW,DerivedExitCode,Elapsed,Eligible,End,ExitCode,GID,Group,JobID,JobIDRaw,JobName,Layout,MaxDiskRead,MaxDiskReadNode,MaxDiskReadTask,MaxDiskWrite,MaxDiskWriteNode,MaxDiskWriteTask,MaxPages,MaxPagesNode,MaxPagesTask,MaxRSS,MaxRSSNode,MaxRSSTask,MaxVMSize,MaxVMSizeNode,MaxVMSizeTask,MinCPU,MinCPUNode,MinCPUTask,NCPUS,NNodes,NodeList,NTasks,Priority,Partition,QOS,QOSRAW,ReqCPUFreq,ReqCPUFreqMin,ReqCPUFreqMax,ReqCPUFreqGov,ReqCPUS,ReqGRES,ReqMem,ReqNodes,ReqTRES,Reservation,ReservationId,Reserved,ResvCPU,ResvCPURAW,Start,State,Submit,Suspended,SystemCPU,Timelimit,TotalCPU,UID,User,UserCPU,WCKey,WCKeyID

     [hpc-config-2]
     ssh_host=mycluster.org
     ssh_port=22
     ssh_user=cUser
     ssh_key=/var/www/.ssh/id_rsa.pub
     remote_data_path=/home/cUser/wps_executions/data
     remote_persitent_data_path=/home/cUser/wps_executions/datap
     remote_work_path=/home/cUser/wps_executions/script
     jobscript_header=/usr/lib/cgi-bin/config-hpc2_header.txt
     jobscript_body=/usr/lib/cgi-bin/config-hpc2_body.txt
     sbatch_options_workdir=/home/cUser/wps_executions/script
     sbatch_substr=Submitted batch job
     billing_nb_cpu=4
     remote_command_opt=AllocCPUS,AllocGRES,AllocNodes,AllocTRES,Account,AssocID,AveCPU,AveCPUFreq,AveDiskRead,AveDiskWrite,AvePages,AveRSS,AveVMSize,BlockID,Cluster,Comment,ConsumedEnergy,ConsumedEnergyRaw,CPUTime,CPUTimeRAW,DerivedExitCode,Elapsed,Eligible,End,ExitCode,GID,Group,JobID,JobIDRaw,JobName,Layout,MaxDiskRead,MaxDiskReadNode,MaxDiskReadTask,MaxDiskWrite,MaxDiskWriteNode,MaxDiskWriteTask,MaxPages,MaxPagesNode,MaxPagesTask,MaxRSS,MaxRSSNode,MaxRSSTask,MaxVMSize,MaxVMSizeNode,MaxVMSizeTask,MinCPU,MinCPUNode,MinCPUTask,NCPUS,NNodes,NodeList,NTasks,Priority,Partition,QOS,QOSRAW,ReqCPUFreq,ReqCPUFreqMin,ReqCPUFreqMax,ReqCPUFreqGov,ReqCPUS,ReqGRES,ReqMem,ReqNodes,ReqTRES,Reservation,ReservationId,Reserved,ResvCPU,ResvCPURAW,Start,State,Submit,Suspended,SystemCPU,Timelimit,TotalCPU,UID,User,UserCPU,WCKey,WCKeyID

     [security]
     attributes=Cookie,Cookies
     hosts=*
     shared=myhost.net/WCS

Below is an example of a ``jobscript_header`` file.

.. code-block:: guess

     #!/bin/sh
     #SBATCH --ntasks=1
     #SBATCH --ntasks-per-node=1
     #SBATCH --exclusive
     #SBATCH --distribution=block:block
     #SBATCH --partition=partName
     #SBATCH --mail-type=END              # Mail events (NONE, BEGIN, END, FAIL, ALL)
     #SBATCH --mail-user=user@wps_server.net   # Where to send mail

Below is an example of a ``jobscript_body`` file.

.. code-block:: guess

     # Load all the modules
     module load cv-standard
     module load cmake/3.6.0
     module load gcc/4.9.3
     module load use.own
     module load OTB/6.1-serial-24threads

If you have activated the callback service, you should also
have a ``[callback]`` section. In it, ``url`` defines the address used
to invoke the callback service, and ``prohibited`` lists the services
that should not trigger an invocation of the callback service (if any).
``template`` points to the local ``updateExecute.xsl`` file, which is
used to replace any input provided by value with a reference to the
locally published OGC WFS/WCS web services. The resulting Execute
request is provided to the callback service.

.. code-block:: guess

     [callback]
     url=http://myhost.net:port/callbackUpdate/
     prohibited=FinalizeHPC,Xml2Pdf,DeleteData
     template=/home/cUser/wps_dir/updateExecute.xsl
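
The following hypothetical shell function illustrates the role of
``prohibited``. It assumes an exact match on service identifiers in the
comma-separated list. It only mirrors the behavior described above and is
not taken from the ZOO-Kernel.

.. code-block:: guess

     #!/bin/sh
     # Hypothetical sketch: return 0 when service $1 should trigger the
     # callback, i.e. when it is not listed in the "prohibited" value $2.
     callback_allowed() {
       case ",$2," in
         *",$1,"*) return 1 ;;
         *) return 0 ;;
       esac
     }

     for s in FinalizeHPC OTB.BandMath; do
       if callback_allowed "$s" "FinalizeHPC,Xml2Pdf,DeleteData"; then
         echo "$s: invoke callback"    # printed for OTB.BandMath
       else
         echo "$s: skip callback"      # printed for FinalizeHPC
       fi
     done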


OGC WPS Services metadata
****************************

To produce the zcfg files corresponding to the metadata definition of
the WPS services, you can use the otb2zcfg tool. You will then need to
replace ``serviceType=OTB`` by ``serviceType=HPC`` and, optionally, add
one line containing, for instance, ``confId=HPC_Sample``.

Please refer to the `otb2zcfg
<./orfeotoolbox.html#services-configuration-file>`_ documentation to
learn how to use this tool.
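
The edit described above can be scripted. The following hypothetical
helper rewrites a zcfg file produced by otb2zcfg and assumes that the
``serviceType`` line holds only ``OTB``. Keeping the line's indentation and
using ``key = value`` spacing are presentation choices of this sketch. The
sample zcfg content is made up.

.. code-block:: guess

     #!/bin/sh
     # Hypothetical sketch: print zcfg file $1 with serviceType=OTB turned into
     # serviceType=HPC and, when $2 is not empty, a confId line added after it.
     otb_to_hpc_zcfg() {
       awk -v conf="$2" '
         /^[ \t]*serviceType[ \t]*=[ \t]*OTB[ \t]*$/ {
           match($0, /^[ \t]*/)
           ind = substr($0, 1, RLENGTH)   # keep the original indentation
           print ind "serviceType = HPC"
           if (conf != "") print ind "confId = " conf
           next
         }
         { print }' "$1"
     }

     zcfg=$(mktemp)
     printf '[BandMath]\n Title = Band Math\n serviceType = OTB\n' > "$zcfg"
     otb_to_hpc_zcfg "$zcfg" HPC_Sample

Redirect the output to a new file and move it over the original once
you have checked it.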

When using the HPC support, defining one output automatically creates
one to three inner outputs for it:

download_link
   URL to download the generated output

wms_link
   URL to access the OGC WMS for this output (only in case
   ``useMapserver=true``)

wcs_link/wfs_link
   URL to access the OGC WCS or WFS for this output (only in case
   ``useMapserver=true``)
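
The following hypothetical helper summarizes the rule above. It assumes
that raster outputs get a ``wcs_link`` and vector outputs a ``wfs_link``.

.. code-block:: guess

     #!/bin/sh
     # Hypothetical sketch: list the inner outputs created for one output.
     # $1: useMapserver value ("true" or anything else),
     # $2: "raster" or "vector" (assumed to pick wcs_link or wfs_link).
     inner_outputs() {
       echo download_link
       if [ "$1" = "true" ]; then
         echo wms_link
         if [ "$2" = "vector" ]; then echo wfs_link; else echo wcs_link; fi
       fi
     }

     inner_outputs true raster    # download_link, wms_link, wcs_link
     inner_outputs false vector   # download_link only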

Below is an example of the Output node resulting from the
definition of one output named ``out`` and typed as geographic imagery.

.. code-block:: guess

      <wps:Output>
        <ows:Title>Outputed Image</ows:Title>
        <ows:Abstract>Image produced by the application</ows:Abstract>
        <ows:Identifier>out</ows:Identifier>
        <wps:Output>
          <ows:Title>Download link</ows:Title>
          <ows:Abstract>The download link</ows:Abstract>
          <ows:Identifier>download_link</ows:Identifier>
          <wps:ComplexData>
            <wps:Format default="true" mimeType="image/tiff"/>
            <wps:Format mimeType="image/tiff"/>
          </wps:ComplexData>
        </wps:Output>
        <wps:Output>
          <ows:Title>WMS link</ows:Title>
          <ows:Abstract>The WMS link</ows:Abstract>
          <ows:Identifier>wms_link</ows:Identifier>
          <wps:ComplexData>
            <wps:Format default="true" mimeType="image/tiff"/>
            <wps:Format mimeType="image/tiff"/>
          </wps:ComplexData>
        </wps:Output>
        <wps:Output>
          <ows:Title>WCS link</ows:Title>
          <ows:Abstract>The WCS link</ows:Abstract>
          <ows:Identifier>wcs_link</ows:Identifier>
          <wps:ComplexData>
            <wps:Format default="true" mimeType="image/tiff"/>
            <wps:Format mimeType="image/tiff"/>
          </wps:ComplexData>
        </wps:Output>
      </wps:Output>