RAL Tier1 weekly operations castor 04/06/2012
From GridPP Wiki
Operations News
- Switched from LSF to Transfer Manager (TM) for Gen and CMS
- Repack stager database moved to a host running 11g, so we can now upgrade it to 2.1.12
- DLF database was re-initialized this week to try to improve its performance
Operations Problems
- (Wed) Inaccessible files reported by ALICE were traced to timeouts within the XRD manager. The timeout threshold was raised from 30 s to 60 s, which improved things.
- (Thu) CMS migrations stopped due to interference between mighunters. The problem was fixed on Friday and the queue then cleared successfully.
- (Fri) LHCb SRMs became unresponsive overnight, possibly due to a memory leak in the logprocessors, which repeatedly tried to contact the database while it was unavailable during its upgrade.
Blocking Issues
None
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) | Lead by |
---|---|---|---|---|---|
CIP 2.2.0 upgrade (STC) | 06/06/12 09:00 | 06/06/12 10:00 | (Internal) | All | Matthew |
Switch from LSF to Transfer Manager | 07/06/12 09:00 | 07/06/12 11:00 | Downtime | ATLAS | Matthew |
2.1.11-9 upgrade (STC) | 13/06/12 09:00 | 13/06/12 14:00 | Downtime | All | Matthew |
Oracle 11g upgrade (STC) | 27/06/12 09:00 | 27/06/12 17:00 | Downtime | All | Rich |
Advanced Planning
Tasks
- Test and re-apply CIP upgrade (Jens, Matthew)
- Test and certify 2.1.12-4 and 2.1.11-9 (Matthew, Chris)
- Stress testing of Transfer Manager (TM) (Shaun, All) DONE
- Ganglia monitoring for TM (Rob, Chris) DONE
- Re-instantiate certification on HyperV VMs using Quattor+Puppet (Rob)
- Stress testing of CV11 generation disk servers on preprod (Rob, Matthew) DONE
- Selection of disk-only prototype solution (Shaun, Rob, Brian, James)
Interventions
- Upgrade repack to 2.1.12-4 (Jun)
- Upgrade Castor Facilities and Tier1 instances to 2.1.11-9 (Jun)
- Upgrade Tier1 instances to 2.1.12 once we are happy with the performance of TM and TG (Jul)
Staffing
- Castor on Call person: Shaun
- Staff absence/out of the office:
  - (Mon/Tue) Public holiday