RAL Tier1 weekly operations castor 14/05/2012
From GridPP Wiki
Revision as of 13:45, 14 May 2012 by Matt viljoen
Operations News
- Successful switch from rtcpclientd to tapegatewayd for Gen
Operations Problems
- (Fri) Expired CRLs on the SRMs (including some disk servers) affected ATLAS, LHCb and CMS from 18:00 Friday until 01:00 Saturday. FP#246
Blocking Issues
- Need to relocate the Repack stager to a database running 11g before it can be upgraded to 2.1.12.
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) | Led by |
---|---|---|---|---|---|
Switch from rtcpclientd to tapegatewayd | 15/05/12 10:00 | 15/05/12 11:00 | Downtime | LHCb | Matthew |
Switch from rtcpclientd to tapegatewayd | 16/05/12 10:00 | 16/05/12 11:00 | Downtime | ATLAS+CMS | Matthew |
Switch from LSF to Transfer Manager | 28/05/12 10:00 | 28/05/12 11:00 | Downtime | CMS | Matthew |
Switch from LSF to Transfer Manager | 29/05/12 10:00 | 29/05/12 11:00 | Downtime | Gen | Matthew |
Switch from LSF to Transfer Manager | 30/05/12 10:00 | 30/05/12 11:00 | Downtime | LHCb | Matthew |
Switch from LSF to Transfer Manager | 07/06/12 10:00 | 07/06/12 11:00 | Downtime | ATLAS | Matthew |
CIP 2.2.0 upgrade (STC) | TBD | TBD | At-risk | All | Matthew |
Advanced Planning
Tasks
- Test and re-apply CIP upgrade (Jens, Matthew)
- Test and certify 2.1.12-4 and 2.1.11-9 (Matthew, Chris)
- Stress testing of Transfer Manager (TM) (Shaun, All) DONE
- Ganglia monitoring for TM (Rob, Chris) DONE
- Re-instantiate certification on HyperV VMs using Quattor+Puppet (Rob)
- Stress testing of CV11 generation disk servers on preprod (Rob, Matthew) DONE
- Selection of disk-only prototype solution (Shaun, Rob, Brian, James)
- Switch to Tape Gateway on repack and test (Tim, Matthew) DONE
Interventions
- Upgrade repack to 2.1.12-4 (May)
- Switch from LSF to TM after 2.1.11-8 upgrade. Will need to better stress-test TM on preprod with more disk servers. (Apr)
- Switch to Tape Gateway (TG) once it has been tested on repack (May)
- Upgrade Castor Facilities and Tier1 instances to 2.1.11-9 (Jun)
- Upgrade Oracle to 11g (Jun)
- Upgrade to 2.1.12 on Tier1 instances once we are happy with TM and TG performance (Jul)
Staffing
- Castor on Call person: Matthew
- Staff absence/out of the office:
  - Bank Holiday Monday