RAL Tier1 weekly operations castor 26/12/2011
From GridPP Wiki
Operations News
- All 2.1.11 headnode components (including the Transfer Manager) are now set up and are being tested.
- All new SRM machines have been installed and are awaiting testing.
Operations Problems
- On the night/morning of 22 Dec, problems with the SAR caused all Ops SAM tests to fail from 01:30 to 09:00.
- During the early morning of 23 Dec, performance of the LHCb SRM DB degraded. This was picked up, and the DB on-call person regenerated the statistics, which improved matters.
- On 24 Dec, the var partition on atlasStager was close to its limit.
- On 25 Dec, lhcbDst showed a 25% failure rate; investigation traced this to repeated access attempts on 6 hot files.
Blocking Issues
- none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) | Led by |
---|---|---|---|---|---|
Stage 1 of move to new CASTOR DB hardware | 05/01/2012 08:30 | 05/01/2012 16:00 | Downtime | All | Rich |
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor (STC) | 16/01/2012 08:00 | 18/01/2012 16:00 | Downtime | All | Shaun |
CIP 2.2.0 upgrade (STC) | 26/01/2012 10:00 | 26/01/2012 12:00 | At-risk | All | Matthew |
Stage 2 of CASTOR DB move (STC) | 07/02/2012 08:00 | 07/02/2012 16:00 | Downtime | All | Rich |
CASTOR 2.11-8 upgrade, inc. move to new hardware+SL5+Quattor (STC) | 13/02/2012 08:00 | 24/02/2012 16:00 | Downtime | All | Matthew |
Advanced Planning
- Move Tier1 instances to new database infrastructure with a Dataguard backup instance in R26.
Staffing
- Castor on Call person: Chris
- Staff absence/out of the office:
- All (Xmas)