RAL Tier1 Farm Shutdown

From GridPP Wiki

Latest revision as of 15:51, 29 November 2007

List of items that need doing during the farm shutdown (Monday 3rd December)

Main Items

3 x APC move/exchange between Compusys 2002 CPU and Clustervision 2003 disk servers

  1. Drain and shut down affected workers and disk servers (one is BaBar).
  2. Remove APCs from racks and switch over (~1 hour).
  3. Update configurations to reflect new connections (including Cacti).
  4. Wait for csfnfs02 before booting systems.

Kernel update to SQL, RB and LFC

dCache update and necessary dCache implementation on csfnfs46 to csfnfs64 and csfnfs42, including csfnfs58

Home filesystem (csfnfs02) fsck and quota, upgrade from SL3.3 to SL3.0.9 and update any outstanding RPMs

  1. Requires quiescent farm
  2. Save existing quotas (for safety), then upgrade the system software, using commands:
    1. for f in `ypcat passwd | awk -F: '{print $1}' | sort`; do quota -v $f >> /root/quotas.3Dec07; done
    2. mkdir /a
    3. mount touch.gridpp.rl.ac.uk:/misc/kickstart/yum/SL/3.0.9/i386/SL/RPMS /a
    4. rpm -Uvh /a/sl-release*; yum -y upgrade # Go for a cup of coffee
  3. Check home filesystem using commands:
    1. Edit /etc/fstab and /etc/exports to comment out references to /home/csf
    2. Reboot to single user and check that /home/csf is not mounted
    3. fsck -y /dev/sda2 # Go for another cup of coffee
    4. Sort out any problems not solved by fsck
  4. Requota home filesystem using commands:
    1. Edit /etc/fstab to re-enable /home/csf
    2. mount /home/csf
    3. /sbin/quotaon -p /home/csf # checks whether quotas are enabled
    4. /sbin/quotaoff /home/csf # If quotas are reported to be enabled
    5. /sbin/quotacheck -c /home/csf # -c does not read the existing quota files
    6. Go for another cup of coffee and then the loo!
  5. Re-enable exports using commands:
    1. Edit /etc/exports to re-enable /home/csf
    2. Reboot to multi-user and check it can be accessed (e.g. from csfsysa)
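The "comment out" and "re-enable" edits to /etc/fstab and /etc/exports in the steps above could be scripted rather than done by hand. The following is only a sketch: the helper names comment_out and comment_in are made up for illustration, and it assumes the pattern (here /home/csf) contains no "|" character. Note that the pattern is a regular expression, so /home/csf would also match a longer path such as /home/csf2.

```shell
#!/bin/sh
# Sketch only: toggle comments on lines that mention a given path.
# comment_out and comment_in are hypothetical helper names.
# Each call saves the previous contents of FILE in FILE.bak.

# comment_out FILE PATTERN
# Prefix every matching line that is not already commented with '#'.
comment_out() {
    file=$1; pat=$2
    cp -p "$file" "$file.bak" &&
    sed "\\|${pat}|s|^[^#]|#&|" "$file.bak" > "$file"
}

# comment_in FILE PATTERN
# Strip one leading '#' from every matching line.
comment_in() {
    file=$1; pat=$2
    cp -p "$file" "$file.bak" &&
    sed "\\|${pat}|s|^#||" "$file.bak" > "$file"
}
```

For example, "comment_out /etc/fstab /home/csf" before the fsck and "comment_in /etc/fstab /home/csf" afterwards. It is worth trying this on a copy of the file first and reading the result before rebooting.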

csfnfs58 small filesystems available, upgrade to SL4.x with appropriate tested kernel

  1. Run the script /root/tune2fs.nofsck to prevent interval and count fsck on mounted filesystems
  2. Remove home filesystem from /etc/fstab
  3. Unmount all the external filesystems manually
  4. Reboot into run level 3 with external filesystems mounted
  5. Use fdisk to delete /dev/sdb3 through /dev/sdb15 and create a single 100GB partition called /dev/sdb3
  6. Use fdisk to split /dev/sdj into 5 x 100GB partitions - /dev/sdj1,j2,j3,j5,j6 (j4-extended)
  7. Use fdisk to split /dev/sdk into 2 x 200GB partitions and 1 x 100GB - /dev/sdk1,k2,k3
  8. Use fdisk to create a single 500GB partition on /dev/sdl called /dev/sdl1
  9. Use fdisk to check that /dev/sde2,e3,e4,e5 and e6 exist (e4-extended)
  10. Reboot the system for the new partitions to be recognised, with existing partitions mounted
  11. Run mke2fs -j -T largefile4 -N 5000000 -m 2 on each of /dev/sdb3,sde2,e3,e5,e6,sdj1,j2,j3,j5,j6,sdk1,k2,k3,sdl1
  12. mkdir /exportstage/datafs-sdb3,sde2.... # create mountpoints
  13. Use chmod 775 /exportstage/data-sdb3,sde2.... for all filesystems
  14. Add the new filesystems to /etc/fstab, leaving the home filesystem out
  15. Run the script /root/tune2fs.reset to enable count and interval checking
  16. Reboot - any external filesystems that need checking will be done
  17. The machine can be upgraded to SL4 at this time
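Steps 11 to 13 use a comma shorthand for the device list, but mke2fs, mkdir and chmod each need to be run once per partition. A loop along the following lines might do it. This is a sketch, not a tested procedure: the PREFIX value is an assumption (the page abbreviates the mount-point names and spells them two ways), the run and make_filesystems helpers are made up for illustration, and mkdir -p is used in place of plain mkdir so that a rerun does not fail on existing directories. DRY_RUN defaults to 1 so that the commands are only printed.

```shell
#!/bin/sh
# Sketch only: apply steps 11-13 to each new partition in turn.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# ASSUMPTION: mount-point prefix; the source elides the full names.
PREFIX=${PREFIX:-/exportstage/datafs-}

# Partition list expanded from the shorthand in step 11.
PARTS="sdb3 sde2 sde3 sde5 sde6 sdj1 sdj2 sdj3 sdj5 sdj6 sdk1 sdk2 sdk3 sdl1"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Create an ext3 filesystem and a mount point for every partition,
# stopping at the first command that fails.
make_filesystems() {
    for p in $PARTS; do
        run mke2fs -j -T largefile4 -N 5000000 -m 2 "/dev/$p" || return 1
        run mkdir -p "$PREFIX$p" || return 1
        run chmod 775 "$PREFIX$p" || return 1
    done
}
```

Calling make_filesystems with the default settings lists the 42 commands for review; setting DRY_RUN=0 would run them for real, which destroys any data on the listed partitions.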

The contents of /root and /etc, and information about the filesystems, have been saved in /exportstage/theory-herwig/csfnfs58 on /dev/sdb1

Areca machines kernel update

  1. Manually unmount the external filesystems and comment out these entries and the home filesystem in /etc/fstab
  2. There are 2 files in /root on all the Areca machines:
    1. kernel-2.6.9-55.0.12.aic1arc1.ELsmp.i686.rpm
    2. kernelinst
  3. Run the kernelinst script, which installs the RPM and configures grub correctly
  4. Reboot the machines into single-user mode and uncomment the external filesystems, but not the home filesystem, in /etc/fstab
  5. Reboot to bring all external filesystems back on-line with fscks as required
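Before rebooting after the kernel install, it may be worth confirming that grub's default entry really points at the new kernel. The contents of kernelinst are not shown on this page, so the following is an independent check rather than a description of that script. It is a sketch that assumes a legacy GRUB configuration at /boot/grub/grub.conf with the default line placed before the title entries; the function names are made up for illustration, and the version string in the usage note is a guess based on the RPM file name.

```shell
#!/bin/sh
# Sketch only: report which kernel the default legacy-GRUB entry boots.
# grub_default_kernel and check_default_kernel are hypothetical names.

# grub_default_kernel [CONF]
# Print the kernel path of the entry selected by "default=N"
# (entries are numbered from 0; a missing default line means 0).
grub_default_kernel() {
    conf=${1:-/boot/grub/grub.conf}
    awk '
        /^[ \t]*default[ \t=]/ { sub(/^[ \t]*default[ \t=]+/, ""); want = $1 + 0 }
        /^[ \t]*title[ \t]/    { entry++ }
        /^[ \t]*kernel[ \t]/   { if (entry - 1 == want) { print $2; exit } }
    ' "$conf"
}

# check_default_kernel VERSION [CONF]
# Succeed only if the default entry boots vmlinuz-VERSION.
check_default_kernel() {
    version=$1
    kernel=$(grub_default_kernel "${2:-/boot/grub/grub.conf}")
    case $kernel in
        *"vmlinuz-$version")
            echo "ok: default entry boots $kernel" ;;
        *)
            echo "MISMATCH: default entry boots ${kernel:-nothing}"
            return 1 ;;
    esac
}
```

A call such as "check_default_kernel 2.6.9-55.0.12.aic1arc1.ELsmp" would then report whether the default entry matches; the exact version string should be taken from the installed vmlinuz file name in /boot rather than from this example.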

OS and release update on Clustervision 2003 except for gdss43

OS and release update on Compusys 2004 except for gdss51

Kernel update on gdss43 and gdss51

  1. Manually unmount the external filesystems and comment out these entries and the home filesystem in /etc/fstab
  2. There are 2 files in /root on both these machines:
    1. kernel-2.6.9-55.0.12.aic1arc1.ELsmp.i686.rpm
    2. kernelinst
  3. Run the kernelinst script, which installs the RPM and configures grub correctly
  4. Reboot the machines into single-user mode and uncomment the external filesystems, but not the home filesystem, in /etc/fstab
  5. Reboot to bring all external filesystems back on-line with fscks as required

Kernel update to FTS, BDII and DB00

xrootd update on Clustervision 2003 and Compusys 2004 - dependent on items 7 and 8 - can be deferred

All outstanding kernel and glibc updates

All host name changes

Upgrade Manchester BaBar servers

NB - BaBar Manchester disk servers - Checked with Manny and Fergus; these systems will need upgrading.

Other opportunities

Network changes - Upgrade of firmware in 6 Nortel switch stacks.

Oracle update - Confirmed with GB that any kernel updates can be done

LSF - None known at this particular time

Castor - None known at this particular time

Checked with both Bonny and Chris: neither is aware of any LSF or Castor changes needed at this time. However, CMS want to do some testing at the start of December, so we need to get our work in and done early, before they start testing.