Imperial Dirac Maintenance

From GridPP Wiki

Hardware

The Imperial dirac service consists of four machines:
dirac01 (configuration server, main dirac server)
dirac02 (second dirac server, for load balancing etc)
diracdb (hosts the databases)
diracweb (hosts the webserver)

Restarting dirac

To restart dirac, run the following as the 'dirac' user in /opt/dirac/:

killall -SIGHUP runsvdir
source bashrc
runsvdir /opt/dirac/startup &
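
The three commands above can be wrapped in a small function that waits for the old runsv processes to exit before a new runsvdir is started. This is only a sketch: the names wait_for_exit and restart_dirac and the 60 second timeout are made up for illustration, and it assumes pgrep is installed and that no unrelated runsv processes run on the machine.

```shell
# Poll until no process with exactly this name is left, or give up
# after the given number of seconds (returns non-zero on timeout).
wait_for_exit() {
    name=$1
    remaining=$2
    while pgrep -x "$name" >/dev/null 2>&1; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Restart all dirac components; run as the 'dirac' user.
restart_dirac() {
    # On HUP, runsvdir sends TERM to the runsv processes it supervises and exits.
    killall -SIGHUP runsvdir
    wait_for_exit runsv 60 || { echo "runsv still running, not restarting" >&2; return 1; }
    # Source bashrc in a subshell so that the calling shell stays clean.
    (
        cd /opt/dirac || exit 1
        . ./bashrc
        runsvdir /opt/dirac/startup &
    )
}
```

Sourcing bashrc inside the subshell means the shell you call restart_dirac from is not polluted by it, which matters for the update steps further down.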

Updating the dirac install

Note: dirac01 and dirac02 should always run the same version of the dirac software. Whatever you do, it helps to work in a clean shell in which the cursed bashrc has not been sourced.

cd /opt/dirac/DIRAC
git status
git pull
find . -iname '*.pyo' -delete
[restart dirac]

And the same for the GridPP module:

cd /opt/dirac/GridPPDIRAC
git status
git pull
find . -iname '*.pyo' -delete
[restart dirac]
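
Since both checkouts are updated the same way, the sequence can be sketched as one function. The names clean_pyo and update_checkout are hypothetical, and two details differ from the commands above and are assumptions of this sketch: it refuses to pull when tracked files have local modifications, and it uses 'git pull --ff-only' instead of a plain 'git pull' so that a diverged checkout fails rather than creating a merge.

```shell
# Remove optimised bytecode so that stale .pyo files cannot shadow updated sources.
clean_pyo() {
    find "$1" -iname '*.pyo' -delete
}

# Update one checkout, e.g. /opt/dirac/DIRAC or /opt/dirac/GridPPDIRAC.
# The body runs in a subshell, so the caller's working directory is unchanged.
update_checkout() (
    cd "$1" || exit 1
    # Stop if tracked files were modified locally; inspect them with 'git diff' first.
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        echo "$1 has local modifications, not pulling" >&2
        exit 1
    fi
    git pull --ff-only || exit 1
    clean_pyo .
)
```

Typical use would be 'update_checkout /opt/dirac/DIRAC && update_checkout /opt/dirac/GridPPDIRAC' on both dirac01 and dirac02, followed by a restart.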

Check the logs

Looking at a specific SiteDirector:
tail -f /opt/dirac/startup/WorkloadManagement_SiteDirectorGridPP/log/current
Looking at local dirac code (here the AutoBdii2CSAgent):
tail -f /opt/dirac/startup/Configuration_AutoBdii2CSAgent/log/current
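
Both log paths follow the pattern /opt/dirac/startup/<System>_<Component>/log/current, so a pair of helpers can save some typing. This is a sketch: the function names are invented here, and the list of level names to search for (ERROR, EXCEPT, FATAL) is an assumption about what the logs contain; adjust it to what you actually see.

```shell
# Path of the runit log for a component, following the
# /opt/dirac/startup/<System>_<Component>/log/current layout used above.
component_log() {
    echo "/opt/dirac/startup/${1}_${2}/log/current"
}

# Print the last N (default 20) lines of a log file that mention a problem.
recent_problems() {
    grep -E 'ERROR|EXCEPT|FATAL' "$1" | tail -n "${2:-20}"
}
```

For example, 'recent_problems "$(component_log WorkloadManagement SiteDirectorGridPP)" 50' shows the 50 most recent matching lines for that SiteDirector.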


Best of GitHub

The ic-hep repo

Merge requests: being logged into GitHub really helps :-)

To check what changed if 'git status' complains about modified files:
git diff [file it's upset about]

Adding a new VO

(example taken from the LZ VO, from memory, so probably incomplete)
on dirac01:
add VO to /etc/vomses and /etc/grid-security/vomsdir (as root)

on the web interface (as dirac_admin):
Registry -> Groups: create lz_pilot and lz_user with all options
Registry -> VO: add lz with all options and subfolder VOMSServers
Registry -> VOMS -> Mapping: add lz_user and lz_pilot
Registry -> VOMS -> URLs: add lz folder with all options
Operations: add lz folder with all options
Systems -> WorkloadManagement -> Production -> Agents: add folder SiteDirectorLz with all options

as dirac user:
dirac-install-agent WorkloadManagement SiteDirectorLz -m SiteDirector
dirac-proxy-init -C /etc/grid-security/pilotcert.pem -K /etc/grid-security/pilotkey.pem -g lz_pilot -M -P
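
Before touching the web interface it can be worth confirming that the first step on dirac01 really took effect. The check below is a sketch: vo_configured is a made-up name, and it assumes the usual layout in which vomses entries quote the VO name and vomsdir holds one directory of .lsc files per VO.

```shell
# Succeeds if the VO appears in the vomses file (or directory of files) and has
# .lsc files under vomsdir. The paths default to the ones mentioned above.
vo_configured() {
    vo=$1
    vomses=${2:-/etc/vomses}
    vomsdir=${3:-/etc/grid-security/vomsdir}
    grep -rq "\"$vo\"" "$vomses" 2>/dev/null ||
        { echo "$vo missing from $vomses" >&2; return 1; }
    ls "$vomsdir/$vo"/*.lsc >/dev/null 2>&1 ||
        { echo "no .lsc files in $vomsdir/$vo" >&2; return 1; }
}
```

For the example above this would be called as 'vo_configured lz'.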

Registering a file in the dirac file catalogue

For situations where a file is on an SE but not in the dirac file catalogue:

  • File copied to SE without dirac involvement:

lcg-cp -vvv -b -D srmv2 somefile srm://gfe02.grid.hep.ph.ic.ac.uk:8443/srm/managerv2?SFN=/pnfs/hep.ph.ic.ac.uk/data/comet/comet.j-parc.jp/user/daniela.bauer/some.test.file
[...]
# streams: 1
      941720 bytes   3628.98 KB/sec avg   3628.98 KB/sec inst
  • Register the file by hand

[on dirac ui]
dirac-dms-filecatalog-cli
FC:/> register file /comet.j-parc.jp/user/daniela.bauer/some.test.file srm://gfe02.grid.hep.ph.ic.ac.uk:8443/srm/managerv2?SFN=/pnfs/hep.ph.ic.ac.uk/data/comet.j-parc.jp/user/daniela.bauer/some.test.file 941720 UKI-LT2-IC-HEP-disk
File successfully added to the catalog
exit

  • To check if it worked, on the dirac ui:

dirac-dms-get-file /comet.j-parc.jp/user/daniela.bauer/some.test.file
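
The register command needs the file size in bytes, which is easy to mistype. A small helper can build the line from a local copy of the file, ready to paste at the FC:/> prompt. This is a sketch only: register_line is a hypothetical name, and the argument order simply mirrors the example above (LFN, SURL, size, SE name).

```shell
# Print the 'register file' line for dirac-dms-filecatalog-cli:
#   register file <LFN> <SURL> <size in bytes> <SE name>
# Arguments: LFN, SURL, path to a local copy of the file, SE name.
register_line() {
    size=$(wc -c < "$3" | tr -d ' ')
    echo "register file $1 $2 $size $4"
}
```

The local copy must be byte-for-byte the file that was uploaded, otherwise the size recorded in the catalogue will not match what is on the SE.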

Switching glexec on/off

  • Resources -> Computing -> glexec: RescheduleOnError = True (this should also enable glexec in logging-only mode)
  • Systems -> WorkloadManagement -> Production -> Agents -> JobAgent: CEtype = glexec to turn glexec on, or InProcess to turn it off

Ban a CE, rather than a site

  • Systems -> Configuration -> Agents -> AutoBdii2CSAgent -> BannedCEs
  • Restart the AutoBdii2CSAgent