Vac configuration for GridPP DIRAC

This page explains how to run [[Quick_Guide_to_Dirac|GridPP DIRAC]] virtual machines on Vac factory machines. Please see the [http://www.gridpp.ac.uk/vac/ Vac website] for Vac's Admin Guide and man pages, which explain how to install and configure Vac itself and get a working Vac factory. '''These instructions are based on Vac 00.21 or above.'''

==Requirements==

Before configuring Vac for GridPP DIRAC, you need to follow these steps:

* When you configure Vac, you need to choose a Vac space name. This will be used as the Computing Element (CE) name in DIRAC.
* One or more CEs are grouped together to form a site, which takes the form VAC.Example.cc, where Example is derived from your institutional name and cc is the country code, e.g. VAC.CERN-PROD.ch or VAC.UKI-NORTHGRID-MAN-HEP.uk. Site names are allocated and registered in the DIRAC configuration service by the GridPP DIRAC service admins. Vac site names for UK sites take the form VAC.GOCDB_SITENAME.uk.
* Obtain a host certificate which the VMs can use as a client certificate to fetch work from the central DIRAC task queue. One certificate can be used for all GridPP DIRAC VMs at a site. You should normally use a name which is specific to GridPP but is part of your site's DNS space. It does not need to correspond to a real host or exist as an entry on your DNS servers: you just need to be entitled to register it. So if your site's domain name is example.cc, then a certificate for gridpp-vm.example.cc with a DN like /C=CC/O=XYZ/CN=gridpp-vm.example.cc would be a good choice.
* Place the hostcert.pem and hostkey.pem of the certificate in the gridpp (or similar) subdirectory of /var/lib/vac/machinetypes (see the sketch after this list).
* Contact one of the DIRAC service admins (i.e. lcg-site-admin AT imperial.ac.uk) to agree a site name and to register your CE, site, and certificate DN in the central GridPP DIRAC configuration.
* Create a volume group vac_volume_group which is big enough to hold one 40 GB logical volume for each VM the factory machine will run at the same time (again, see the sketch after this list).
* Identify a squid HTTP caching proxy to use with cvmfs. If you already have a proxy set up for cvmfs on gLite/EMI worker nodes at your site, then you can use that too. You may be able to run without a proxy, but failures during job execution will be more likely.
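
The following is a minimal sketch of the certificate and volume group preparation above. The certificate file names and the spare disk /dev/sdb are assumptions, so adapt them to your site:

 # Check the DN of the host certificate, which the DIRAC admins will need for registration
 openssl x509 -subject -noout -in gridpp-vm.example.cc.cert.pem
 
 # Put the certificate and key where the gridpp machinetype expects them
 mkdir -p /var/lib/vac/machinetypes/gridpp
 cp gridpp-vm.example.cc.cert.pem /var/lib/vac/machinetypes/gridpp/hostcert.pem
 cp gridpp-vm.example.cc.key.pem /var/lib/vac/machinetypes/gridpp/hostkey.pem
 chmod 600 /var/lib/vac/machinetypes/gridpp/hostkey.pem
 
 # Create the volume group on a spare disk (hypothetical device /dev/sdb)
 pvcreate /dev/sdb
 vgcreate vac_volume_group /dev/sdb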

==Updating /etc/vac.conf==

The details of the vac.conf options are given in the vac.conf(5) man page. However, the gridpp section should look like this, with suitable site-specific replacements for the target_share, user_data_option_..., and user_data_file_... values:

 [machinetype gridpp]
 target_share = 1
 user_data_option_dirac_site = VAC.Example.cc
 user_data_option_vo = gridpp
 accounting_fqan = /gridpp/Role=NULL/Capability=NULL
 user_data_option_cvmfs_proxy = http://squid-cache.example.cc:3128
 user_data_file_hostcert = hostcert.pem
 user_data_file_hostkey = hostkey.pem
 user_data = https://repo.gridpp.ac.uk/vacproject/gridpp/user_data
 machine_model = cernvm3
 root_image = https://repo.gridpp.ac.uk/vacproject/gridpp/cernvm3.iso
 rootpublickey = /root/.ssh/id_rsa.pub
 backoff_seconds = 3600
 fizzle_seconds = 600
 heartbeat_file = heartbeat
 heartbeat_seconds = 600
 max_wallclock_seconds = 100000

The vo option should be gridpp to get jobs from the default pool of jobs submitted by members of gridpp_user. It can be replaced by the name of any other VO supported by the GridPP DIRAC service (e.g. vo.northgrid.ac.uk). For machinetypes other than the main gridpp VO, you should insert the appropriate VOMS FQAN in the accounting_fqan option, as in the sketch below.
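
For example, a machinetype section for vo.northgrid.ac.uk might look like this sketch; the section name and FQAN shown are illustrative assumptions, so check the correct FQAN for the VO you are supporting:

 [machinetype northgrid]
 user_data_option_vo = vo.northgrid.ac.uk
 accounting_fqan = /vo.northgrid.ac.uk/Role=NULL/Capability=NULL
 # remaining options (user_data, root_image, certificate files, etc.) as in the gridpp section above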

Vac will destroy the VM if it runs for more than max_wallclock_seconds, and you may want to experiment with shorter values. Most modern machines should be able to run jobs comfortably within 24 hours (86400 seconds).

If no work is available from the central DIRAC task queue and a VM stops with 'Nothing to do', backoff_seconds determines how long Vac waits before trying to run a GridPP VM again. This waiting is co-ordinated between all factory machines in a space using Vac's VacQuery UDP protocol. For testing, you may want to set this to 0, but please do not leave it at that to avoid unnecessarily loading the central service.
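
For example, during testing you might shorten these two limits as below; the values are illustrative, and backoff_seconds should be raised again before the factory is left running in production:

 max_wallclock_seconds = 86400
 backoff_seconds = 0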

You can omit the rootpublickey option, but it is extremely useful for debugging. See the Vac Admin Guide for more about how to set it up.
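
If you do want to use it, the following is a minimal sketch of generating a root key pair on the factory machine, assuming none exists yet; the public key is then supplied to the VMs so that root can log in to them over ssh for debugging.

 # Create /root/.ssh/id_rsa and id_rsa.pub if the factory has no root key pair yet
 ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa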

Vac re-reads its configuration files at every cycle (once a minute or so), so changes to vac.conf take effect almost immediately. You should see Vac creating gridpp VMs in /var/log/vacd-factory and the VMs themselves attempting to contact the DIRAC matcher to fetch work in the joboutputs subdirectories under /var/lib/vac/machines.
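
A quick way to check both from the factory machine, as a sketch that only assumes the default Vac paths mentioned above:

 # Watch the factory deciding when to create and destroy VMs
 tail -f /var/log/vacd-factory
 
 # Locate the joboutputs directories of current and recent VMs
 find /var/lib/vac/machines -type d -name joboutputs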