RAL Tier1 weekly operations castor 05/10/2009
Summary of Previous Week
- Working on a kernel clash with the FC card that prevents us from upgrading the tape servers to the latest kernel (Chris)
- Working with the vendor to isolate the cause of the database RAID controller problem (Cheney)
- Kernel patching of robot controllers (Cheney)
- Setting up CIP hosting machine (Cheney)
- Setting up repack server (Richard, Fabric team, DB Team, Chris)
- SRM 2.8-1 upgrade on ATLAS (Shaun, DB Team)
- Debugging ATLAS transfer problems (Shaun, Matt)
- Fixed LHCb bottleneck on lhcbMdst by increasing job slots (Shaun)
- Chasing up strategic objectives (Matt)
- Establishing CASTOR change control policy (Matt)
Developments for this week
- Continue working on the kernel problem for tape servers (Chris)
- Set up 2.1.8 on the repack server with Puppet (Chris)
- Working on the Puppet manifest for polymorphic central servers (Chris)
- SRM 2.8-1 deployment on Gen, LHCb, CMS (Shaun)
- Preparing for CASTOR F2F meeting (Matt, All)
- Add extra RAID controller to LHCb D1T0 disk servers (Matt, Fabric team, Production team)
Ongoing
- CastorMon monitoring graphs for Gen instance (Brian)
- Black and White list tests (Chris)
- Disaster recovery document (Matt)
Operations Issues
- ATLAS SRM get failures affecting some jobs - being investigated on certification
- LHCb ran out of slots on lhcbMdst. Increased job slots.
Blocking issues
- Problems with the Ganglia check on the Gen instance are delaying work on monitoring (in hand)
Planned, Scheduled and Cancelled Down Times
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) |
---|---|---|---|---|
SRM 2.8-1 upgrade | 5/10/09 1000 | 5/10/09 1030 | At risk | Gen, LHCb, CMS |
Replace faulty Oracle voting disk | 6/10/09 1000 | 6/10/09 1200 | Downtime | ATLAS, LHCb |
Changes to Production Milestones
Advanced Planning
- Add extra RAID controller to LHCb D1T0 servers
- Black and White lists? (delayed until it is required on a 'per-instance' basis)
- Improve resiliency to central services (This year)
Staffing
- Brian A/L
- Richard away
- CASTOR on-call person: Shaun