Vac configuration for GridPP DIRAC
This page explains how to run GridPP DIRAC Service virtual machines on Vac factory machines. Definitions are provided for the following sub-VOs of the service: cernatschool.org, comet.j-parc.jp, dune, gridpp, hyperk.org, lsst, lz, magrid, mice, na62.vo.gridpp.ac.uk, pheno, skatelescope.eu, snoplus.snolab.ca, solidexperiment.org, t2k.org, vo.moedal.org, vo.northgrid.ac.uk, vo.scotgrid.ac.uk.
Please see the Vac website for Vac's Admin Guide and man pages, which explain how to install and configure Vac itself and get a working Vac factory. These instructions are based on Vac 3.0 or above.
Before configuring Vac for the GridPP DIRAC Service, you need to follow these steps:
- When you configure Vac, you need to choose a Vac space name. This will be used as the Computing Element (CE) name in DIRAC, and is equivalent to a CE in ARC or CREAM.
- One or more CEs are grouped together to form a site, whose name takes the form VAC.Example.cc, where Example is derived from your institutional name and cc is the country code, e.g. VAC.CERN-PROD.ch or VAC.UKI-NORTHGRID-MAN-HEP.uk. Site names are allocated and registered in the DIRAC configuration service by the GridPP DIRAC service admins. Vac site names for UK sites are VAC.GOCDB_SITENAME.uk.
- Obtain a host certificate which the VMs can use as a client certificate to fetch work from the central DIRAC task queue. One certificate can be used for all GridPP DIRAC VMs at a site. You should normally use a name which is specific to GridPP but is part of your site's DNS space. It doesn't need to correspond to a real host or exist as an entry on your DNS servers; you only need to be entitled to register it. So if your site's domain name is example.cc then a certificate for gds-vm.example.cc with a DN like /C=CC/O=XYZ/CN=gds-vm.example.cc would be a good choice.
- Place the hostcert.pem and hostkey.pem of the certificate in the files subdirectory of the gds (or similar) subdirectory of /var/lib/vac/machinetypes, i.e. as /var/lib/vac/machinetypes/gds/files/hostcert.pem and /var/lib/vac/machinetypes/gds/files/hostkey.pem.
- Contact one of the DIRAC service admins (i.e. lcg-site-admin AT imperial.ac.uk) to agree a site name and to register your CE, site, and certificate DN in the central GridPP DIRAC Service configuration.
- Create a volume group vac_volume_group which is big enough to hold one 40GB logical volume for each VM the factory machine will run at the same time.
- Identify a squid HTTP caching proxy to use with cvmfs. If you already have a proxy set up for cvmfs on worker nodes at your site then you can use that too. You may be able to run without a proxy, but failures during job execution will be more likely.
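Two of the steps above lend themselves to a quick check before starting Vac: the size of vac_volume_group and the presence of the certificate files. The following is a minimal sketch, not part of Vac itself; the function names are hypothetical, and the 40 GB figure and file names are taken from the list above.

```python
from pathlib import Path

# Size of the logical volume each VM needs, as stated in the steps above.
LV_SIZE_GB = 40

# File names Vac expects in the machinetype's files subdirectory.
CREDENTIAL_FILES = ("hostcert.pem", "hostkey.pem")


def required_volume_group_gb(max_concurrent_vms):
    """Return the minimum size of vac_volume_group in GB."""
    if max_concurrent_vms < 0:
        raise ValueError("max_concurrent_vms must not be negative")
    return LV_SIZE_GB * max_concurrent_vms


def missing_credentials(files_dir="/var/lib/vac/machinetypes/gds/files"):
    """Return the credential file names that are absent from files_dir."""
    directory = Path(files_dir)
    return [name for name in CREDENTIAL_FILES
            if not (directory / name).is_file()]


# A factory machine that runs up to 8 VMs at once needs at least 320 GB.
print(required_volume_group_gb(8))
```

Calling missing_credentials() with no argument checks the default gds directory; pass a different path if you named the subdirectory something else.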
The details of the vac.conf options are given in the vac.conf(5) man page. You should specify the location of the cvmfs proxy to use in the [settings] section which applies to all machine types:
user_data_option_cvmfs_proxy = http://squid01.example.cc
The gds section should look like this, with a suitable replacement for the target_share:
[vacuum_pipe gds]
target_share = 1.0
vacuum_pipe_url = https://repo.gridpp.ac.uk/vacproject/gds/all-vos.pipe
This causes Vac to fetch the specified vacuum pipe JSON file and then create a machinetype section in the Vac configuration for each VO supported by the GridPP DIRAC Service, defining how to create VMs for that VO. You can see the resulting expanded configuration in the /var/log/vacd-factory log file. All the VOs use the host certificate and key in the /var/lib/vac/machinetypes/gds/files/ directory, while the other Vac configuration options, including the FQANs to use for APEL accounting, are given appropriate values for each VO. The total target_share given in the vacuum_pipe section is shared out amongst the VOs in the vacuum pipe file, according to the target_share values given inside the vacuum pipe JSON file.
You can override values for individual machinetypes by creating a partial machinetype section in your configuration file, which Vac will merge with the existing options for that machinetype. For example, you can give one VO a higher target_share. The value you set is the final target_share that Vac uses in its share calculations; it is no longer relative to the other machinetypes in the pipe:
[machinetype gds-vm-pheno]
target_share = 2.0
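To illustrate how the shares combine, here is a minimal sketch of the arithmetic. It assumes the pipe's target_share is divided in proportion to the per-VO values inside the pipe file, and that an override replaces the computed value outright. The function name and the in-pipe share values are hypothetical; the real values come from the vacuum pipe JSON file, and the expanded configuration in /var/log/vacd-factory shows what Vac actually uses.

```python
def effective_shares(pipe_share, in_pipe_shares, overrides=None):
    """Split pipe_share across machinetypes, then apply absolute overrides.

    Assumption: the split is proportional to the shares inside the pipe file.
    """
    total = sum(in_pipe_shares.values())
    if total > 0:
        shares = {name: pipe_share * share / total
                  for name, share in in_pipe_shares.items()}
    else:
        # An all-zero pipe leaves every machinetype with a zero share.
        shares = {name: 0.0 for name in in_pipe_shares}
    # Overrides are final values, not relative to the rest of the pipe.
    shares.update(overrides or {})
    return shares


# Hypothetical in-pipe shares for three of the machinetypes.
in_pipe = {"gds-vm-pheno": 1.0, "gds-vm-lz": 1.0, "gds-vm-dune": 2.0}
shares = effective_shares(1.0, in_pipe, overrides={"gds-vm-pheno": 2.0})
for name in sorted(shares):
    print(name, shares[name])
```

With these made-up inputs, gds-vm-lz and gds-vm-dune keep their proportional parts of the pipe's 1.0, while gds-vm-pheno takes the overridden value of 2.0.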
If your vacuum_pipe section is named gds as above, then the following machinetypes are available to you, including 4 and 8 processor VM variants: gds-vm-cernatschool-org, gds-vm-comet-j-parc-jp, gds-vm-dune, gds-vm-gridpp, gds-vm8-gridpp, gds-vm-hyperk-org, gds-vm-lsst, gds-vm4-lsst, gds-vm-lz, gds-vm-magrid, gds-vm-mice, gds-vm-na62-vo-gridpp-ac-uk, gds-vm-pheno, gds-vm-skatelescope-eu, gds-vm8-skatelescope-eu, gds-vm-snoplus-snolab-ca, gds-vm-solidexperiment-org, gds-vm-t2k-org, gds-vm-vo-moedal-org, gds-vm-vo-northgrid-ac-uk, gds-vm-vo-scotgrid-ac-uk
The configuration written to /var/log/vacd-factory can be used to discover the names of the machinetypes which are created when expanding the pipe values, and to check that your override(s) have taken effect.
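The machinetype names listed above appear to follow a simple pattern: the vacuum_pipe section name, then vm with an optional processor count, then the VO name with dots replaced by hyphens. The sketch below encodes that inferred pattern; the function name is hypothetical, and which multiprocessor variants exist is decided by the pipe file, so treat the log file as the authority.

```python
def machinetype_name(pipe_name, vo, processors=1):
    """Build a machinetype name following the pattern seen in the list above.

    Assumption: single processor VMs carry no number, and multiprocessor
    variants append the processor count directly after "vm".
    """
    count = "" if processors == 1 else str(processors)
    return "{}-vm{}-{}".format(pipe_name, count, vo.replace(".", "-"))


print(machinetype_name("gds", "cernatschool.org"))
print(machinetype_name("gds", "skatelescope.eu", processors=8))
```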
If you wish to run 4 or 8 processor VMs, then you will need to have a line like
processors_per_superslot = 8
in your global [settings] section. This allows Vac to create multiprocessor VMs and groups of single processor (or 4 processor) VMs with the same finish time. This means that groups of 8 processors become available at the same time, making it possible to start multiprocessor VMs. If you do this, you should also enable the LHCb vacuum pipe, even at a very low target_share, as it contains elastic VMs which can backfill otherwise unused space: for example, if one single processor VM of a group of 8 finishes early.
If you replace the vacuum_pipe_url with https://repo.gridpp.ac.uk/vacproject/gds/all-vos-zero-shares.pipe then the machinetypes are all created with target_share zero. You can use this to selectively enable a subset of GridPP DIRAC Service VOs by creating machinetype sections for the VOs you want to run as above.
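As an illustration of this selective approach, the sketch below prints a vac.conf fragment that uses the zero-shares pipe and enables only the chosen machinetypes. The function name and the share values are hypothetical; the section and option names are the ones used on this page, and the pipe section keeps target_share = 1.0 as in the earlier example.

```python
ZERO_SHARES_URL = (
    "https://repo.gridpp.ac.uk/vacproject/gds/all-vos-zero-shares.pipe"
)


def selective_config(pipe_name, enabled_shares):
    """Return a vac.conf fragment enabling only the given machinetypes."""
    lines = [
        "[vacuum_pipe {}]".format(pipe_name),
        "target_share = 1.0",
        "vacuum_pipe_url = {}".format(ZERO_SHARES_URL),
    ]
    # One partial machinetype section per VO you want to run.
    for machinetype, share in sorted(enabled_shares.items()):
        lines += [
            "",
            "[machinetype {}]".format(machinetype),
            "target_share = {}".format(share),
        ]
    return "\n".join(lines) + "\n"


# Example share values chosen purely for illustration.
print(selective_config("gds", {"gds-vm-pheno": 2.0, "gds-vm-lz": 1.0}))
```

Machinetypes that are not listed keep the zero target_share they get from the pipe file.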
Vac re-reads its configuration files at every cycle (once a minute or so), so changes to vac.conf take effect almost immediately. You should see Vac creating gds-vm-* VMs in /var/log/vacd-factory, and the VMs themselves attempting to contact the DIRAC matcher to fetch work in the joboutputs subdirectories under /var/lib/vac/machines.
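One rough way to confirm that VMs are being created is to count how often each gds-vm-* machinetype name appears in the factory log. The sketch below makes no assumption about the log line format: it only searches for the names themselves. It does assume your vacuum_pipe section is named gds, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches names such as gds-vm-pheno or gds-vm8-gridpp.
MACHINETYPE_RE = re.compile(r"\bgds-vm\d*-[a-z0-9]+(?:-[a-z0-9]+)*")


def count_machinetypes(lines):
    """Count mentions of each gds-vm-* machinetype in an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(MACHINETYPE_RE.findall(line))
    return counts
```

To use it, open /var/log/vacd-factory and pass the file object to count_machinetypes; a machinetype with a non-zero target_share that never appears may point to a configuration problem.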