RAL Tier1 weekly operations castor 20/02/2012
From GridPP Wiki
- Nameserver upgraded to 2.1.11-8
- CMS upgraded to 2.1.11-8
- SRM problems following the nameserver upgrade were linked to a failure to update an alias that still pointed to the old nameserver (castorvmgr.ads.rl.ac.uk).
- The upgraded VMGR caused heavy load. We were running it on both nameservers, as before the upgrade. Once one instance was turned off, the problem ceased.
- Ongoing crashing of SRMs, especially ATLAS. A better restarter has been put in place. Possible causes are:
  - SL4 RPMs (the OS is SL5). We are configuring and testing the preprod SRM setup with upgraded RPMs.
  - grid-mapfile distribution. A workaround is already in place.
  - Some other memory problems.
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
|Description||Start||End||Type||Affected VO(s)||Lead by|
|CASTOR 2.1.11-8 ATLAS Stager upgrade, inc. move to new hardware+SL5+Quattor||22/02/2012 08:00||22/02/2012 16:00||Downtime||ATLAS||Matthew|
|CASTOR 2.1.11-8 LHCb Stager upgrade, inc. move to new hardware+SL5+Quattor||27/02/2012 08:00||27/02/2012 16:00||Downtime||LHCb||Matthew|
|CASTOR 2.1.11-8 Gen Stager upgrade, inc. move to new hardware+SL5+Quattor||29/02/2012 08:00||29/02/2012 16:00||Downtime||Gen||Matthew|
- Move Tier1 instances to the new database infrastructure, with a Dataguard backup instance in R26.
- Switch from LSF to Transfer Manager after 2.1.11 upgrade. Will need to better stress-test TM on preprod
- Start using Tape Gateway once CERN have been using it in production for approx. 2 months.
- Castor on Call person: Shaun
- Staff absence/out of the office:
  - Shaun (Tues)