RAL Tier1 weekly operations castor 07/05/2012
From GridPP Wiki
Matt Viljoen
Latest revision as of 09:07, 11 May 2012
Operations News
- (Wed) Increased the xrootd job slots limit after a large number of failed jobs on LHCb
- Preprod upgraded to 2.1.11-9. Another smooth test upgrade.
Operations Problems
- The upgrade of repack to 2.1.12-4 has been delayed: this version requires Oracle 11g, which is not possible yet because repack uses the production RAC. We have been forced to roll repack back to 2.1.11-9.
Blocking Issues
- none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) | Lead by |
---|---|---|---|---|---|
Switch from rtcpclientd to tapegatewayd | 10/05/12 09:00 | 10/05/12 10:00 | Downtime | Gen | Matthew |
CIP 2.2.0 upgrade (STC) | TBD | TBD | At-risk | All | Matthew |
Advanced Planning
Tasks
- Test and re-apply CIP upgrade (Jens, Matthew)
- Test and certify 2.1.12-4 and 2.1.11-9 (Matthew, Chris)
- Stress testing of Transfer Manager (TM) (Shaun, All) DONE
- Ganglia monitoring for TM (Rob, Chris) IN PROGRESS
- Re-instantiate certification on HyperV VMs using Quattor+Puppet (Rob)
- Stress testing of CV11 generation disk servers on preprod (Rob, Matthew) DONE
- Selection of disk-only prototype solution (Shaun, Rob, Brian, James)
- Switch to Tape Gateway on repack and test (Tim, Matthew) DONE
Interventions
- Upgrade repack to 2.1.12-4 (Apr)
- Switch from LSF to TM after 2.1.11-8 upgrade. Will need to better stress-test TM on preprod with more disk servers. (Apr)
- Switch to Tape Gateway (TG) once it has been tested on repack (May)
- Upgrade Castor Facilities and Tier1 instances to 2.1.11-9 (Jun)
- Upgrade Oracle to 11g (Jun)
- Upgrade to 2.1.12 on Tier1 instances once we are happy with the performance of TM and TG (Jul)
Staffing
- Castor on Call person: Matthew
- Staff absence/out of the office:
- Bank Holiday Monday