Difference between revisions of "Imperial Dirac Maintenance"

From GridPP Wiki

Revision as of 13:34, 30 June 2016

Hardware

The Imperial dirac server consists of 4 machines:
dirac01 (configuration server, main dirac server)
dirac02 (second dirac server, for load balancing etc)
diracdb (hosts the databases)
diracweb (hosts the webserver)

Restarting dirac

To restart dirac, run the following as user 'dirac' in /opt/dirac/:

killall -SIGHUP runsvdir
source bashrc
runsvdir /opt/dirac/startup &
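
The three restart steps can be wrapped in one function so they always run in the same order. This is only a sketch: DIRAC_ROOT and DRY_RUN are names made up for this example and are not part of dirac, and the bashrc is assumed to be readable by the shell in use. With DRY_RUN=1 the function prints the steps instead of executing them.

```shell
# Sketch only. DIRAC_ROOT and DRY_RUN are hypothetical helper variables.
DIRAC_ROOT="${DIRAC_ROOT:-/opt/dirac}"

restart_dirac() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the steps instead of executing them.
        echo "cd $DIRAC_ROOT"
        echo "killall -SIGHUP runsvdir"
        echo "source bashrc"
        echo "runsvdir $DIRAC_ROOT/startup &"
        return 0
    fi
    # The restart has to be done as user 'dirac'.
    if [ "$(id -un)" != "dirac" ]; then
        echo "run this as user 'dirac'" >&2
        return 1
    fi
    cd "$DIRAC_ROOT" || return 1
    killall -SIGHUP runsvdir
    . ./bashrc
    runsvdir "$DIRAC_ROOT/startup" &
}
```

Calling restart_dirac with DRY_RUN=1 first is a cheap way to check the paths before touching the running services.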

Updating the dirac install

Note: dirac01 and dirac02 should always run the same version of the dirac software. Whatever you do, it helps to work in a clean shell in which the cursed bashrc has not been sourced.

cd /opt/dirac/DIRAC
git status
git pull
find . -iname '*.pyo' -delete
[restart dirac]

and the GridPP module

cd /opt/dirac/GridPPDIRAC
git status
git pull
find . -iname '*.pyo' -delete
[restart dirac]
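
Since both checkouts get the same treatment, the steps can be sketched as a pair of small functions. The function names clean_pyo and update_checkout are invented for this example; the commands inside them are the ones listed above. Restarting dirac afterwards is still a separate, manual step.

```shell
# Remove stale compiled files under a checkout (same find call as above).
clean_pyo() {
    find "$1" -iname '*.pyo' -delete
}

# Show local changes, pull, then clean one checkout.
# Runs in a subshell so the caller's working directory is left alone.
update_checkout() {
    (
        cd "$1" || exit 1
        git status || exit 1
        git pull || exit 1
    ) || return 1
    clean_pyo "$1"
}

# Usage, to be followed by a restart of dirac:
#   for repo in /opt/dirac/DIRAC /opt/dirac/GridPPDIRAC; do
#       update_checkout "$repo" || break
#   done
```

The loop stops at the first checkout that fails to update, which helps keep the two modules from drifting apart unnoticed.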

Note that for new dirac commands (e.g. 'dirac-populate-component-db'), a git pull only downloads the relevant python files. To make the actual commands available, run:

dirac-deploy-scripts


Check the logs

Looking at a specific SiteDirector:
tail -f startup/WorkloadManagement_SiteDirectorGridPP/log/current
Local dirac code:
tail -f /opt/dirac/startup/Configuration_AutoBdii2CSAgent/log/current
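
Both log paths above follow the same pattern, startup/[System]_[Component]/log/current. A minimal sketch of a helper that builds the path from those two parts (dirac_log_path and DIRAC_STARTUP are names invented for this example):

```shell
# Sketch only. DIRAC_STARTUP is a hypothetical variable for the startup directory.
DIRAC_STARTUP="${DIRAC_STARTUP:-/opt/dirac/startup}"

# Print the runit log file for a component, given its system and its name.
dirac_log_path() {
    printf '%s/%s_%s/log/current\n' "$DIRAC_STARTUP" "$1" "$2"
}

# Usage:
#   tail -f "$(dirac_log_path Configuration AutoBdii2CSAgent)"
```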


Best of Github

The ic-hep repo

Merge requests: Being logged into github really helps :-)

To check what changed if 'git status' complains about modified files:
git diff [file it's upset about]

Adding a new VO

(example taken from the LZ VO, from memory, so probably incomplete)
on dirac01:
add VO to /etc/vomses and /etc/grid-security/vomsdir (as root)

on the web interface (as dirac_admin):
Registry -> Groups: create lz_pilot and lz_user with all options
Registry -> VO: add lz with all options and subfolder VOMSServers
Registry -> VOMS -> Mapping: add lz_user and lz_pilot
Registry -> VOMS -> URLs: add lz folder with all options
Operations: add lz folder with all options
Systems -> WorkloadManagement -> Production -> Agents: add folder SiteDirectorLz with all options

Make a SiteDirector (from a dirac ui):

source bashrc
dirac-proxy-init -g dirac_admin
dirac-admin-sysadmin-cli --host dirac01.grid.hep.ph.ic.ac.uk
[dirac01.grid.hep.ph.ic.ac.uk]> install agent WorkloadManagement SiteDirectorLsst -m SiteDirector
agent WorkloadManagement_SiteDirectorLsst is installed, runit status: Run 

Restart the UsersAndGroups Agent to populate the new groups.
Once the VO configuration is complete restart the BDII2CS agent.

Upload a pilot proxy (on dirac01 as 'dirac'):
dirac-proxy-init -C /etc/grid-security/pilotcert.pem -K /etc/grid-security/pilotkey.pem -g lz_pilot -M -P
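
The names used in the steps above all derive from the VO name, so a small helper can print them for a new VO before you start clicking. This is a sketch: vo_names and pilot_proxy_cmd are made-up function names, and the rule "SiteDirector plus the capitalised VO name" is an assumption based on the lz example, not a dirac requirement.

```shell
# Print the group and agent names used above for a given VO (e.g. "lz").
# Assumption: the SiteDirector is named after the capitalised VO name.
vo_names() {
    vo="$1"
    first=$(printf '%s' "$vo" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$vo" | cut -c2-)
    echo "user_group=${vo}_user"
    echo "pilot_group=${vo}_pilot"
    echo "site_director=SiteDirector${first}${rest}"
}

# Print the pilot proxy upload command for a given VO (run it on dirac01 as 'dirac').
pilot_proxy_cmd() {
    echo "dirac-proxy-init -C /etc/grid-security/pilotcert.pem -K /etc/grid-security/pilotkey.pem -g ${1}_pilot -M -P"
}

# Usage:
#   vo_names lz
#   pilot_proxy_cmd lz
```

Both functions only print text; nothing is changed in the configuration until the printed command is run by hand.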

Make a top level directory on the file catalogue (from a dirac UI):

source bashrc 
dirac-proxy-init -g dirac_admin 
dirac-dms-filecatalog-cli 
mkdir [voname]
chgrp [voname]_user [voname]
ls -l

Registering a file in the dirac file catalogue

For situations where a file is on an SE but not in the dirac file catalogue.

  • File copied to SE without dirac involvement:

lcg-cp -vvv -b -D srmv2 somefile srm://gfe02.grid.hep.ph.ic.ac.uk:8443/srm/managerv2?SFN=/pnfs/hep.ph.ic.ac.uk/data/comet/comet.j-parc.jp/user/daniela.bauer/some.test.file
[...]

# streams: 1
      941720 bytes   3628.98 KB/sec avg   3628.98 KB/sec inst

  • Register the file by hand

[on dirac ui]
dirac-dms-filecatalog-cli
FC:/> register file /comet.j-parc.jp/user/daniela.bauer/some.test.file srm://gfe02.grid.hep.ph.ic.ac.uk:8443/srm/managerv2?SFN=/pnfs/hep.ph.ic.ac.uk/data/comet.j-parc.jp/user/daniela.bauer/some.test.file 941720 UKI-LT2-IC-HEP-disk
File successfully added to the catalog
exit

  • To check if it worked, on the dirac ui:

dirac-dms-get-file /comet.j-parc.jp/user/daniela.bauer/some.test.file
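
In the example above, the register command takes the LFN, the SRM URL, the file size in bytes (941720, as reported by lcg-cp) and the SE name. If a local copy of the file is still around, the size can be read from it rather than from the transfer log. A minimal sketch, with file_size and register_line being names invented for this example:

```shell
# Size in bytes of a local file, with any padding from wc stripped.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Print the line to type at the FC:/> prompt.
# Arguments: local copy of the file, LFN, SRM URL, SE name.
register_line() {
    echo "register file $2 $3 $(file_size "$1") $4"
}

# Usage:
#   register_line somefile [LFN] [SRM URL] UKI-LT2-IC-HEP-disk
```

The function only prints the line; paste it into dirac-dms-filecatalog-cli to do the actual registration.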

Switching glexec on/off

Official documentation.

  • Resources -> Computing -> glexec: RescheduleOnError = True (this should also enable glexec in logging-only mode)
  • Systems -> WorkloadManagement -> Production -> Agents -> JobAgent -> CEType = glexec (to turn glexec on) or InProcess (to turn glexec off)

Enable/Ban a site

On a dirac ui:
source bashrc
dirac-proxy-init -g dirac_admin
dirac-admin-allow-site LCG.RAL-LCG2.uk "Test"
dirac-admin-ban-site LCG.RAL-LCG2.uk "Test"
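
The two commands differ only in 'allow' versus 'ban', so a tiny wrapper can guard against typos in the action. This is a sketch with a made-up function name (site_cmd); it prints the command rather than running it, so it can be checked before use.

```shell
# Print the dirac-admin command that allows or bans a site.
# Arguments: allow|ban, site name, comment.
site_cmd() {
    case "$1" in
        allow|ban) ;;
        *) echo "usage: site_cmd allow|ban SITE COMMENT" >&2; return 2 ;;
    esac
    printf 'dirac-admin-%s-site %s "%s"\n' "$1" "$2" "$3"
}

# Usage:
#   site_cmd ban LCG.RAL-LCG2.uk "Test"
```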

Ban a CE, rather than a site

  • Systems -> Configuration -> Agents -> AutoBdii2CSAgent -> BannedCEs
  • Restart the AutoBdii2CSAgent


Install a new database/service etc

On a dirac UI of your choice...

source bashrc
dirac-proxy-init -g dirac_admin
dirac-admin-sysadmin-cli --host dirac01.grid.hep.ph.ic.ac.uk
[dirac01.grid.hep.ph.ic.ac.uk]> install db InstalledComponentsDB
MySQL root password: [go find password in config file]
Adding to CS Framework/InstalledComponentsDB
Database InstalledComponentsDB from DIRAC/FrameworkSystem installed successfully 

Edit the automatic entry under 'Host' in the config file (Systems -> Framework -> Production -> Databases) to contain the actual database machine.
Now install the service:

[dirac01.grid.hep.ph.ic.ac.uk]> install service Framework ComponentMonitoring
