RAL Tier1 Batch Worker Procedures
From GridPP Wiki
Derek ross
Latest revision as of 15:29, 26 September 2008
Procedures for the grid batch workers on the RAL Tier1 Farm
Updating/Upgrading procedures
Updating gLite software on batch worker
- Download tarball releases to /stage/sl3-lcg-exp/**NEW_RELEASE**/WN on csfnfs58 (where **NEW_RELEASE** is an identifier for the release)
- Untar them
- Under /stage/sl3-lcg-exp/**NEW_RELEASE**/WN/glite/yaim, create an etc directory
- Copy site-info.def, users.conf and groups.conf from the previous release into this new directory (users.conf and groups.conf are supposed to be empty)
- Merge site-info.def from the previous release with site-info.def from the tier1-yaim-config rpm
  - This has to be done by hand. A reasonable heuristic is to take system configuration information from the previous release and grid configuration information (VO_* entries, for example) from the tier1-yaim-config rpm
- Update all paths that point to the previous release so that they point to the new release. This should only affect the INSTALL_ROOT variable
- Recursively chown /stage/sl3-lcg-exp/**NEW_RELEASE**/WN to your uid/gid
- Get a batch worker
- Move the .glite directory in your home directory out of the way
- Log into the batch worker as yourself (or as root, then su - to yourself)
- cd /stage/sl3-lcg-exp/**NEW_RELEASE**/WN/glite/yaim/bin
- Run ./yaim -s ../etc/site-info.def -n TAR_WN -c
  - Expect some errors about being unable to add cron jobs and files in /etc/glite/profile.d/. These are okay; any others should be investigated
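The file-copying part of the steps above could be scripted roughly as follows. This is a minimal sketch, not part of yaim: prepare_yaim_etc is a hypothetical helper, it assumes the previous release uses the same WN/glite/yaim/etc layout as the new one, and it assumes the release name appears as a path component on the INSTALL_ROOT line. The merge with the tier1-yaim-config rpm still has to be done by hand afterwards.

```shell
#!/bin/sh
# Sketch only: prepare_yaim_etc is a hypothetical helper, not part of yaim.
# Usage: prepare_yaim_etc STAGE_ROOT PREV_RELEASE NEW_RELEASE
prepare_yaim_etc() {
    stage_root=$1
    prev=$2
    new=$3
    # Assumption: both releases use the same directory layout.
    src="$stage_root/$prev/WN/glite/yaim/etc"
    dst="$stage_root/$new/WN/glite/yaim/etc"

    # Create the etc directory under the new release's yaim tree.
    mkdir -p "$dst" || return 1

    # Copy the three configuration files from the previous release.
    for f in site-info.def users.conf groups.conf; do
        cp -p "$src/$f" "$dst/$f" || return 1
    done

    # Point INSTALL_ROOT at the new release by swapping the release name.
    # Only the INSTALL_ROOT line is touched; everything else is left for
    # the manual merge.
    sed "/^INSTALL_ROOT=/s|/$prev|/$new|" "$dst/site-info.def" \
        > "$dst/site-info.def.tmp" || return 1
    mv "$dst/site-info.def.tmp" "$dst/site-info.def"
}
```

It would be called on csfnfs58 as, for example, prepare_yaim_etc /stage/sl3-lcg-exp OLD_RELEASE NEW_RELEASE, where both release names are placeholders.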
Local Hacks
Fixing Perl library path
# pwd
/stage/sl3-lcg-exp/GLITE-3_0_9/WN/etc/env.d
[root@csfnfs58 env.d]# diff gliteenv.sh.orig gliteenv.sh
25a26,29
> PERLLIB="${PERLLIB}:${GLITE_LOCATION}/lib/perl5/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/5.8.0/:${GLITE_LOCATION}/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/site_perl/5.8.0:${GLITE_LOCATION}/lib/perl5/site_perl:${GLITE_LOCATION}/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/vendor_perl/5.8.0:${GLITE_LOCATION}/lib/perl5/vendor_perl"
>
>
>
]# diff gliteenv.csh /stage/sl3-lcg-exp/SL4-GLITE-3_0_15-6/WN/etc/env.d/gliteenv.csh.orig
38,41d37
< setenv PERLLIB="${PERLLIB}:${GLITE_LOCATION}/lib/perl5/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/5.8.0/:${GLITE_LOCATION}/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/site_perl/5.8.0:${GLITE_LOCATION}/lib/perl5/site_perl:${GLITE_LOCATION}/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi:${GLITE_LOCATION}/lib/perl5/vendor_perl/5.8.0:${GLITE_LOCATION}/lib/perl5/vendor_perl"
<
<
<
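The long PERLLIB value in the diffs above follows a regular pattern, which the sketch below makes explicit. glite_perllib is a hypothetical helper, not part of gLite; it only prints the colon-separated list for a given location, Perl version and architecture string, in the same order as the hand-edited line.

```shell
#!/bin/sh
# Sketch only: glite_perllib is a hypothetical helper, not part of gLite.
# Usage: glite_perllib GLITE_LOCATION PERL_VERSION ARCH
# Prints the directory list appended to PERLLIB in the diff above.
glite_perllib() {
    base="$1/lib/perl5"
    ver=$2
    arch=$3
    # Core, site_perl and vendor_perl directories, most specific first.
    printf '%s\n' "$base/$ver/$arch:$base/$ver/:$base/site_perl/$ver/$arch:$base/site_perl/$ver:$base/site_perl:$base/vendor_perl/$ver/$arch:$base/vendor_perl/$ver:$base/vendor_perl"
}

# Example use in a Bourne-style environment script (not executed here):
#   PERLLIB="${PERLLIB}:$(glite_perllib "$GLITE_LOCATION" 5.8.0 i386-linux-thread-multi)"
```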
Disabling Job wrapper SAM test
cd ${INSTALL_ROOT}/WN/lcg/etc/
mkdir disabled
mv jobwrapper-*/* disabled/
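A slightly more defensive variant of the commands above is sketched here. disable_jobwrapper_tests is a hypothetical wrapper: it runs in a subshell so the caller's working directory is unchanged, and it can be re-run safely when there is nothing left to move.

```shell
#!/bin/sh
# Sketch only: disable_jobwrapper_tests is a hypothetical wrapper.
# Usage: disable_jobwrapper_tests INSTALL_ROOT
disable_jobwrapper_tests() (
    # The parentheses run the body in a subshell, so cd does not leak.
    cd "$1/WN/lcg/etc" || exit 1
    mkdir -p disabled || exit 1
    for f in jobwrapper-*/*; do
        # If the glob matched nothing, the literal pattern is skipped.
        [ -e "$f" ] || continue
        mv "$f" disabled/ || exit 1
    done
)
```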
Deploying over farm
Set the GLITEDIR variable in the file /stage/sl3-lcg-exp/etc/profile.d/glite-version to the name of the release directory that contains the WN install (**NEW_RELEASE** in the example above), e.g. GLITE-3_1_15-0
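The edit to glite-version could be scripted along these lines. This is a sketch under assumptions: set_glitedir is a hypothetical helper, and the file is assumed to contain a single line of the form GLITEDIR=<name>, which this page does not confirm. The helper refuses to switch to a release whose WN directory has not been staged.

```shell
#!/bin/sh
# Sketch only: set_glitedir is a hypothetical helper.
# Usage: set_glitedir STAGE_ROOT RELEASE
set_glitedir() {
    stage_root=$1
    release=$2
    version_file="$stage_root/etc/profile.d/glite-version"

    # Refuse to deploy a release that has no WN install directory.
    if [ ! -d "$stage_root/$release/WN" ]; then
        echo "no WN install found for $release" >&2
        return 1
    fi

    # Assumption: the file holds one line of the form GLITEDIR=<name>.
    sed "s|^GLITEDIR=.*|GLITEDIR=$release|" "$version_file" \
        > "$version_file.tmp" || return 1
    # Write back in place so the file keeps its owner and permissions.
    cat "$version_file.tmp" > "$version_file" || return 1
    rm -f "$version_file.tmp"
}
```

For the example in the text this would be called as set_glitedir /stage/sl3-lcg-exp GLITE-3_1_15-0.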
Updating certificates on batch workers
Update the lcg-CA rpm (and its dependencies) on csfnfs58. At 4am, a daily cron job (zz-distribute-crl) copies the new certificates to the correct location, where the batch workers pick them up.
Making new worker nodes known to the CE
- Ensure the node is in /etc/ssh/shosts.equiv on the CE
- If the host is already in the batch system:
  - Run /opt/edg/sbin/edg-pbs-knownhosts on the CE
- If the host is not already in the batch system:
  - Add the host to the NODES line of /opt/edg/sbin/edg-pbs-knownhosts on the CE
  - Run /opt/edg/sbin/edg-pbs-knownhosts on the CE
  - Remove the host from the NODES line of /opt/edg/sbin/edg-pbs-knownhosts on the CE
  - Tell the Fabric team the node can be added to the batch system
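The first check in the list above can be scripted before running edg-pbs-knownhosts. The sketch below uses a hypothetical helper, host_in_shosts_equiv, and assumes shosts.equiv lists one bare hostname per line; entries in any other form would not be matched.

```shell
#!/bin/sh
# Sketch only: host_in_shosts_equiv is a hypothetical helper.
# Usage: host_in_shosts_equiv HOST [FILE]
# Succeeds if HOST appears as a whole line in FILE.
host_in_shosts_equiv() {
    host=$1
    file=${2:-/etc/ssh/shosts.equiv}
    # -x: match whole lines only; -F: treat the hostname as a fixed string
    # so dots are not interpreted as regex wildcards.
    grep -qxF -- "$host" "$file"
}
```

On the CE it could be used as host_in_shosts_equiv NODE_NAME || echo "add NODE_NAME to shosts.equiv first", where NODE_NAME is a placeholder for the worker's hostname.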