RAL Tier1 weekly operations castor 11/10/2010
From GridPP Wiki
Work previous week
- Matthew:
- LHCb monitoring & investigating performance problems
- Planning Castor Facilities work
- CoD work
- Shaun:
- Chris:
- Castor Facilities work
- Richard:
- ..
- Brian:
- ..
- Jens:
- ..
Operations Issues
- A number of LHCb CASTOR jobs are failing, apparently because of an internal network timeout between the disk servers and the stager. LHCb SAM tests seem particularly affected.
Blocking issues
None
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) |
---|---|---|---|---|
Update Gen to 2.1.9 (STC) | 25/10/2010 08:00 | 27/10/2010 18:00 | Downtime | Gen |
Update CMS to 2.1.9 (STC) | 08/11/2010 08:00 | 10/11/2010 18:00 | Downtime | CMS |
Update ATLAS to 2.1.9 (STC) | 22/11/2010 08:00 | 24/11/2010 18:00 | Downtime | ATLAS |
Advanced Planning
- Upgrade disk servers to a 64-bit OS
- Upgrade to 2.1.9-8 after all instances are upgraded to 2.1.9-6
- CASTOR for Facilities instance in production by end of 2010
Staffing
- CASTOR on-call person: Chris
- Staff absences:
- ..