RAL Tier1 Farm Shutdown
From GridPP Wiki
James thorne (Talk | contribs) |
Latest revision as of 15:51, 29 November 2007
List of items that need doing during the farm shutdown (Monday 3rd December)
Main Items
3 x APC move/exchange between Compusys 2002 CPU and Clustervision 2003 disk servers
- Drain and shutdown affected workers and disk servers (one is BaBar).
- Remove APCs from racks and switch over (~1 hour).
- Update Configurations to reflect new connections (including Cacti).
- Wait for csfnfs02 before booting systems.
Kernel update to SQL, RB and LFC
dCache update and necessary dCache implementation on csfnfs46 to csfnfs64 and csfnfs42, including csfnfs58
Home filesystem (csfnfs02) fsck and quota, upgrade from SL3.3 to SL3.0.9 and update any outstanding RPMs
- Requires quiescent farm
- Save existing quotas (for safety) using commands:
- for f in `ypcat passwd | awk -F: '{print $1}' | sort`
- do quota -v $f >> /root/quotas.3Dec07; done
- Upgrade systems software using commands:
- mkdir /a
- mount touch.gridpp.rl.ac.uk:/misc/kickstart/yum/SL/3.0.9/i386/SL/RPMS /a
- rpm -Uvh sl-release*; yum -y upgrade # Go for a cup of coffee
- Check home filesystem using commands:
- Edit /etc/fstab and /etc/exports to comment out references to /home/csf
- Reboot to single user and check that /home/csf is not mounted
- fsck -y /dev/sda2 # Go for another cup of coffee
- Sort out any problems not solved by fsck
- Requota home filesystem using commands:
- Edit /etc/fstab to re-enable /home/csf
- mount /home/csf
- /sbin/quotaon -p /home/csf # Checks whether quotas are enabled
- /sbin/quotaoff /home/csf # If quotas are reported to be enabled
- /sbin/quotacheck -c /home/csf # -c does not read the existing quota file
- Go for another cup of coffee and then the loo!
- Re-enable exports using commands:
- Edit /etc/exports to re-enable /home/csf
- Reboot to multi-user and check it can be accessed (e.g. from csfsysa)
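The quota-saving loop in the steps above can also be written as two small functions, which makes the user list and the output file explicit. This is only a sketch: the function names `list_users` and `save_quotas` are illustrative and not part of the original procedure, and it assumes a POSIX shell with `awk`, `sort` and `quota` available on csfnfs02.

```shell
#!/bin/sh
# Sketch only: list_users and save_quotas are hypothetical helper names,
# not part of the original shutdown procedure.

# Print the sorted, de-duplicated login names from passwd-format lines on stdin.
list_users() {
    awk -F: '{print $1}' | sort -u
}

# Append `quota -v` output for every user name read from stdin to the file $1.
# quota can exit non-zero (for example when a user is over quota); that is
# ignored so the loop carries on with the next user.
save_quotas() {
    outfile=$1
    while read -r user; do
        quota -v "$user" >> "$outfile" 2>&1 || true
    done
}
```

With these defined, the step would be run as `ypcat passwd | list_users | save_quotas /root/quotas.3Dec07`, which writes the same file as the loop in the procedure.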
csfnfs58 small filesystems available, upgrade to SL4.x with appropriate tested kernel
- Run the script /root/tune2fs.nofsck to prevent interval and count fsck on mounted filesystems
- Remove home filesystem from /etc/fstab
- Unmount all the external filesystems manually
- Reboot into run level 3 with external filesystems mounted
- Use fdisk to delete /dev/sdb3,4,5,6 through 15 and create a single filesystem called /dev/sdb3 of 100GB
- Use fdisk to split /dev/sdj into 5 x 100GB partitions - /dev/sdj1,j2,j3,j5,j6 (j4-extended)
- Use fdisk to split /dev/sdk into 2 x 200GB partitions and 1 x 100GB - /dev/sdk1,k2,k3
- Use fdisk to create a single 500GB partition on /dev/sdl called /dev/sdl1
- Use fdisk to check that /dev/sde2,e3,e4,e5 and e6 exist (e4-extended)
- Reboot the system for the new partitions to be recognised, with existing partitions mounted
- mke2fs -j -T largefile4 -N 5000000 -m 2 /dev/sdb3,sde2,e3,e5,e6,sdj1,j2,j3,j5,j6,sdk1,k2,k3,sdl1
- mkdir /exportstage/datafs-sdb3,sde2.... # create mountpoints
- Use chmod 775 /exportstage/data-sdb3,sde2.... for all filesystems
- Add the new filesystems to /etc/fstab, leaving the home filesystem out
- Run the script /root/tune2fs.reset to enable count and interval checking
- Reboot - any external filesystems that need checking will be done
- The machine can be upgraded to SL4 at this time
The contents of /root and /etc, and information about the filesystems, have been saved in /exportstage/theory-herwig/csfnfs58 on /dev/sdb1
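The mke2fs, mkdir and chmod steps for csfnfs58 use a shorthand partition list (/dev/sdb3,sde2,e3,...) that cannot be typed literally. A minimal sketch of how it might be expanded into one command per partition follows. The names `PARTS`, `run`, `make_filesystems` and the `DRY_RUN` variable are assumptions made for illustration, and the mountpoint prefix `datafs-` is taken from the mkdir step (the chmod step writes `data-`, so check which is intended before use). By default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: PARTS, run, make_filesystems and DRY_RUN are hypothetical names.
# The partition list is the shorthand from the procedure written out in full.
PARTS="sdb3 sde2 sde3 sde5 sde6 sdj1 sdj2 sdj3 sdj5 sdj6 sdk1 sdk2 sdk3 sdl1"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create an ext3 filesystem and a mountpoint for every partition in PARTS.
# Assumption: mountpoints are named /exportstage/datafs-<partition>.
make_filesystems() {
    for p in $PARTS; do
        run mke2fs -j -T largefile4 -N 5000000 -m 2 "/dev/$p"
        run mkdir "/exportstage/datafs-$p"
        run chmod 775 "/exportstage/datafs-$p"
    done
}
```

Calling `make_filesystems` prints the 42 commands for review; only `DRY_RUN=0 make_filesystems` would actually run them.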
Areca machines kernel update
- Manually unmount the external filesystems and comment out these entries and the home filesystem in the /etc/fstab
- There are 2 files in /root on all the Areca machines
- kernel-2.6.9-55.0.12.aic1arc1.ELsmp.i686.rpm
- kernelinst # Run the kernelinst script, which installs the RPM and configures grub correctly
- Reboot the machines into single user mode and comment in the external filesystems but not the home filesystem in /etc/fstab
- Reboot to bring all external filesystems back on-line with fscks as required
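Commenting entries out of /etc/fstab by hand on every Areca machine is error-prone, so a filter along the following lines might help. It is a sketch under assumptions: the function name `comment_out_mounts` is invented here, and it assumes the external filesystems share a common mountpoint prefix (such as /exportstage, as on csfnfs58), which may not hold on every machine. It reads fstab-format text on stdin and writes the result to stdout, so the real file is never modified directly.

```shell
#!/bin/sh
# Sketch only: comment_out_mounts is a hypothetical helper name.
# Reads fstab-format lines on stdin and prints them with every entry whose
# mount point (second field) starts with the prefix $1 commented out.
# Lines that are already comments, and blank lines, pass through unchanged.
comment_out_mounts() {
    awk -v prefix="$1" '
        /^[[:space:]]*#/       { print; next }        # already a comment
        index($2, prefix) == 1 { print "#" $0; next } # matching mount point
                               { print }              # everything else
    '
}
```

For example, `comment_out_mounts /exportstage < /etc/fstab > /root/fstab.new` (the output path is illustrative) produces a candidate file that can be compared with the original using diff before it is copied into place. The home filesystem entry would be handled with a second pass using its own mount point.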
OS and release update on Clustervision 2003 except for gdss43
OS and release update on Compusys 2004 except for gdss51
Kernel update on gdss43 and gdss51
- Manually unmount the external filesystems and comment out these entries and the home filesystem in the /etc/fstab
- There are 2 files in /root on both these machines
- kernel-2.6.9-55.0.12.aic1arc1.ELsmp.i686.rpm
- kernelinst # Run the kernelinst script, which installs the RPM and configures grub correctly
- Reboot the machines into single user mode and comment in the external filesystems but not the home filesystem in /etc/fstab
- Reboot to bring all external filesystems back on-line with fscks as required
Kernel update to FTS, BDII and DB00
xrootd update on Clustervision 2003 and Compusys 2004 - dependent on items 7 and 8 - can be deferred
All outstanding kernel and glibc updates
All host name changes
Upgrade Manchester BaBar servers
NB - BaBar Manchester disk servers - Checked with Manny and Fergus; these systems will need upgrading.
Other opportunities
Network changes - Upgrade of Firmware in 6 Nortel switch stacks.
Oracle update - Confirmed with GB, any kernel updates can be done
LSF - None known at this particular time
Castor - None known at this particular time
Checked with both Bonny and Chris that they are not aware that they need LSF or Castor changes at this time. However, CMS want to do some testing at the start of December so we need to get our stuff in and done early before they start testing.