RAL Tier1 Batch Worker Procedures

From GridPP Wiki
Revision as of 15:29, 26 September 2008 by Derek ross

Procedures for the grid batch workers on the RAL Tier1 Farm

Updating/Upgrading procedures

Updating gLite software on batch worker

  • Download the tarball releases to /stage/sl3-lcg-exp/**NEW_RELEASE**/WN on csfnfs58 (where **NEW_RELEASE** is an identifier for the release)
  • Untar them
  • Under /stage/sl3-lcg-exp/**NEW_RELEASE**/WN/glite/yaim, create an etc directory
  • Copy site-info.def, users.conf and groups.conf from the previous release into this new directory (users.conf and groups.conf are supposed to be empty)
  • Merge site-info.def from the previous release with site-info.def from the tier1-yaim-config rpm
    • This has to be done by hand. A reasonable heuristic is to take system configuration information from the previous release and grid configuration information (VO_* entries, for example) from the tier1-yaim-config rpm.
    • Update all paths that point to the previous release so that they point to the new release. This should only affect the INSTALL_ROOT variable.
  • Recursively chown /stage/sl3-lcg-exp/**NEW_RELEASE**/WN to your uid/gid
  • Get a batch worker
  • Move the .glite directory in your home directory out of the way
  • Log into the batch worker as yourself (or as root, then su - to yourself)
  • cd /stage/sl3-lcg-exp/**NEW_RELEASE**/WN/glite/yaim/bin
  • Run ./yaim -s ../etc/site-info.def -n TAR_WN -c
    • Expect some errors about being unable to add cron jobs and files in /etc/glite/profile.d/. These are okay; any others should be investigated.
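
The directory creation and file copying steps above can be sketched as a small shell function. This is only an illustration: stage_yaim_config is a hypothetical helper name, not part of yaim, and it assumes the previous release keeps its configuration under the same WN/glite/yaim/etc layout. It does not attempt the merge with the tier1-yaim-config rpm, which still has to be done by hand; it only lists the lines in the copied site-info.def that still mention the previous release.

```shell
# Sketch: stage yaim config files from a previous release into a new one.
# stage_yaim_config is a hypothetical helper, not part of yaim.
# Arguments: <stage root> <previous release> <new release>
stage_yaim_config() {
    local stage_root="$1" prev="$2" new="$3"
    local src="${stage_root}/${prev}/WN/glite/yaim/etc"
    local dst="${stage_root}/${new}/WN/glite/yaim/etc"
    local f

    mkdir -p "${dst}" || return 1
    for f in site-info.def users.conf groups.conf; do
        cp -p "${src}/${f}" "${dst}/${f}" || return 1
    done

    # List lines that still mention the previous release (normally only
    # INSTALL_ROOT); editing them is left to the operator.
    grep -n -F "${prev}" "${dst}/site-info.def" || true
}
```

For example, `stage_yaim_config /stage/sl3-lcg-exp GLITE-3_0_9 GLITE-3_1_15-0` would copy the three files and print the lines that still need updating (the release names here are illustrative only).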

Local Hacks

Fixing Perl library path

PERLLIB is extended by hand with the Perl library directories under GLITE_LOCATION, in both gliteenv.sh and gliteenv.csh under WN/etc/env.d. The following diffs against the .orig files show the added lines.

# pwd
/stage/sl3-lcg-exp/GLITE-3_0_9/WN/etc/env.d
[root@csfnfs58 env.d]# diff gliteenv.sh.orig gliteenv.sh
25a26,29
>       PERLLIB="${PERLLIB}:${GLITE_LOCATION}/lib/perl5/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/5.8.0/:${GLITE_LOCATION}/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/site_perl/5.8.0:${GLITE_LOCATION}/lib/perl5/site_perl:${GLITE_LOCATION}/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/vendor_perl/5.8.0:${GLITE_LOCATION}/lib/perl5/vendor_perl"
>
>
>

# diff gliteenv.csh /stage/sl3-lcg-exp/SL4-GLITE-3_0_15-6/WN/etc/env.d/gliteenv.csh.orig
38,41d37
<         setenv PERLLIB="${PERLLIB}:${GLITE_LOCATION}/lib/perl5/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/5.8.0/:${GLITE_LOCATION}/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/site_perl/5.8.0:${GLITE_LOCATION}/lib/perl5/site_perl:${GLITE_LOCATION}/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/vendor_perl/5.8.0:${GLITE_LOCATION}/lib/perl5/vendor_perl"
<
<
<
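
The long PERLLIB value in the diffs above is easier to review when built from its parts. The following is a minimal sketch that reproduces the same colon-separated list for a given GLITE_LOCATION; glite_perllib is a hypothetical helper name, and the Perl version and architecture strings are simply the ones that appear in the diff.

```shell
# Sketch: build the PERLLIB suffix that the diff above adds to gliteenv.sh.
# glite_perllib is a hypothetical helper; argument: the GLITE_LOCATION path.
glite_perllib() {
    local base="$1/lib/perl5" ver="5.8.0" arch="i386-linux-thread-multi"
    local out="" d
    for d in \
        "${ver}/${arch}" "${ver}/" \
        "site_perl/${ver}/${arch}" "site_perl/${ver}" "site_perl" \
        "vendor_perl/${ver}/${arch}" "vendor_perl/${ver}" "vendor_perl"
    do
        out="${out}:${base}/${d}"
    done
    # Strip the leading colon left by the first iteration.
    printf '%s\n' "${out#:}"
}
```

In a Bourne-style script this could be used as `PERLLIB="${PERLLIB}:$(glite_perllib "${GLITE_LOCATION}")"`. In csh, `setenv` takes the name and the value as two separate arguments (`setenv PERLLIB value`) rather than `NAME=value`, so the gliteenv.csh line in the diff above is worth double-checking.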

Disabling Job wrapper SAM test

cd ${INSTALL_ROOT}/WN/lcg/etc/
mkdir disabled
mv jobwrapper-*/* disabled/
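
The three commands above fail noisily if they are run a second time (the disabled directory already exists and the glob matches nothing). A rerunnable variant might look like the sketch below; disable_jobwrapper_tests is a hypothetical helper name, and it takes the INSTALL_ROOT path as its only argument.

```shell
# Sketch: move job wrapper test files into a "disabled" directory.
# disable_jobwrapper_tests is a hypothetical helper; argument: INSTALL_ROOT.
disable_jobwrapper_tests() {
    local etc_dir="$1/WN/lcg/etc" f
    [ -d "${etc_dir}" ] || { echo "no such directory: ${etc_dir}" >&2; return 1; }
    mkdir -p "${etc_dir}/disabled" || return 1
    for f in "${etc_dir}"/jobwrapper-*/*; do
        # An unmatched glob stays literal; skip it so a rerun is harmless.
        [ -e "${f}" ] || continue
        mv "${f}" "${etc_dir}/disabled/" || return 1
    done
}
```

As with the original commands, files with the same name in different jobwrapper-* directories would overwrite each other in disabled/.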

Deploying over farm

Set the GLITEDIR variable in /stage/sl3-lcg-exp/etc/profile.d/glite-version to the name of the new WN install directory (**NEW_RELEASE** in the example above), e.g. GLITE-3_1_15-0
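
The edit can also be scripted. The sketch below assumes the glite-version file contains a line of the form GLITEDIR=<name>; the actual file format is not shown on this page, so check it before using anything like this. set_glitedir is a hypothetical helper name.

```shell
# Sketch: point GLITEDIR at a new release in a glite-version style file.
# Assumes a line of the form GLITEDIR=<name>; the real format may differ.
# set_glitedir is a hypothetical helper; arguments: <file> <release name>
set_glitedir() {
    local file="$1" release="$2"
    grep -q '^GLITEDIR=' "${file}" || {
        echo "no GLITEDIR line in ${file}" >&2
        return 1
    }
    # Keep a backup so the previous release can be restored quickly.
    cp -p "${file}" "${file}.bak" || return 1
    sed "s|^GLITEDIR=.*|GLITEDIR=${release}|" "${file}.bak" > "${file}"
}
```

For example, `set_glitedir /stage/sl3-lcg-exp/etc/profile.d/glite-version GLITE-3_1_15-0`. The backup copy makes it easy to roll the farm back to the previous release.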

Updating certificates on batch workers

Update the lcg-CA rpm (and dependencies) on csfnfs58. A daily cron job (zz-distribute-crl) runs at 4am and copies the new certificates to the location from which the batch workers pick them up.

Making new worker nodes known to the CE

  • Ensure the node is listed in /etc/ssh/shosts.equiv on the CE
  • If the host is already in the batch system:
    • Run /opt/edg/sbin/edg-pbs-knownhosts on the CE
  • If the host is not already in the batch system:
    • Add the host to the NODES line of /opt/edg/sbin/edg-pbs-knownhosts on the CE
    • Run /opt/edg/sbin/edg-pbs-knownhosts on the CE
    • Remove the host from the NODES line of /opt/edg/sbin/edg-pbs-knownhosts on the CE
    • Tell the Fabric team the node can be added to the batch system
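
The first check in the list above can be done with a short helper before running edg-pbs-knownhosts. This is a sketch only: host_in_shosts is a hypothetical name, and it assumes shosts.equiv lists one bare host name per line (entries with a user name or a +/- prefix would not match).

```shell
# Sketch: check that a host is listed in an shosts.equiv style file.
# host_in_shosts is a hypothetical helper; arguments: <host> [file]
host_in_shosts() {
    local host="$1" file="${2:-/etc/ssh/shosts.equiv}"
    # -x matches whole lines only, -F treats the host name as a fixed string.
    grep -q -x -F "${host}" "${file}"
}
```

On the CE this might be used as `host_in_shosts NODE && /opt/edg/sbin/edg-pbs-knownhosts`, where NODE stands for the worker's host name.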