RAL Tier1 weekly operations castor 02/01/2012
From GridPP Wiki
Operations News
- none
Operations Problems
- atlasStager var partition was close to its limit on 26th Dec
- disk server gdss307 (cmsWanIn) was read-only on 27th Dec
- large number of errors in aliceDisk on 28th Dec; investigation showed that all disk servers were busy writing files and some of them were timing out
- efficiency was around 40% for aliceDisk on 29th Dec, due to a similar problem to the one on 28th Dec. The number of xrootd job slots was increased from 50 to 100 to accommodate the requests that were timing out
- on 31st Dec the atlasStager var partition was again close to its limit, and puppetmaster02 was unresponsive and was restarted
Blocking Issues
- none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) | Lead by |
---|---|---|---|---|---|
Stage 1 of move to new CASTOR DB hardware | 05/01/2012 08:30 | 05/01/2012 16:00 | Downtime | All | Rich |
SRM 2.11 upgrade, inc. move to new hardware+SL5+Quattor (STC) | 16/01/2012 08:00 | 18/01/2012 16:00 | Downtime | All | Shaun |
CIP 2.2.0 upgrade (STC) | 26/01/2012 10:00 | 26/01/2012 12:00 | At-risk | All | Matthew |
Stage 2 of CASTOR DB move (STC) | 07/02/2012 08:00 | 07/02/2012 16:00 | Downtime | All | Rich |
CASTOR 2.11-8 upgrade, inc. move to new hardware+SL5+Quattor (STC) | 13/02/2012 08:00 | 24/02/2012 16:00 | Downtime | All | Matthew |
Advanced Planning
- Move Tier1 instances to new database infrastructure, with a Dataguard backup instance in R26
Staffing
- Castor on Call person: Shaun
- Staff absence/out of the office:
  - All (Mon)
  - Matthew A/L (all week)