Production Team Report 2010-11-29
From GridPP Wiki
RAL Tier1 Production Team Report for 29th November 2010.
AoD This Week
Mon-Wed: Tiju; Thu: Gareth; Fri: Tiju
Last week
- Gareth: HepSYSMAN (Mon), Linux admin course (Tue-Fri)
- John: AoD (4 days); prepared Nagios test for read-only file systems.
- Tiju: Summarised ATLAS operational disk interventions, Nagios stuff, Security Task.
Changes to Operating procedures
- Call-outs being added for FSPROBE errors and read-only file systems.
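As an illustration of the kind of test behind the read-only file system call-out, here is a minimal sketch in Python. It is not the Tier1's actual Nagios test: the function names, the list of file system types and the message wording are all assumptions made for this example. It relies only on the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN) and on the Linux `/proc/mounts` layout (device, mount point, type, options).

```python
"""Hypothetical Nagios-style check for read-only file systems (illustrative only)."""

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

# Assumption: only these file system types are expected to be writable.
CHECKED_FS_TYPES = {"ext3", "ext4", "xfs"}


def find_read_only(mounts_text, fs_types=CHECKED_FS_TYPES):
    """Return mount points of the checked types whose options include 'ro'."""
    read_only = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in fs_types and "ro" in options.split(","):
            read_only.append(mount_point)
    return read_only


def check(mounts_text):
    """Return (exit_code, message) in the usual plugin style."""
    bad = find_read_only(mounts_text)
    if bad:
        return CRITICAL, "CRITICAL: read-only file systems: " + ", ".join(bad)
    return OK, "OK: no read-only file systems found"


def main(path="/proc/mounts"):
    """Read the mount table, print one status line, return the exit code."""
    try:
        with open(path) as handle:
            code, message = check(handle.read())
    except OSError as error:
        code, message = UNKNOWN, "UNKNOWN: cannot read %s: %s" % (path, error)
    print(message)
    return code
```

A plugin wrapper would typically finish with `sys.exit(main())` so that Nagios sees the exit code; a CRITICAL result is what would then be wired to a call-out.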
Declared Outages in GOC DB
- Mon 6th - Wed 8th Dec: upgrade of ATLAS Castor instance.
Advanced Warning
- Overnight Mon-Tue (23:00-05:00): scheduled maintenance on backup OPN link to CERN.
- From 23:00 on Thursday 2nd to 05:00 on Sunday 5th: scheduled maintenance on main OPN link to CERN.
- Wed 1st Dec: microcode update on half of the tape drives (the other half a fortnight later).
- Weekend 11/12 Dec: Power outage in Atlas building.
- Monday 13th December - UPS test.
Other Changes
- Fabric:
  - Double the network link to the tape robot stack (stack 12), postponed from the last TS. (Requires Castor stop).
  - Swap out the older of the pair of SAN switches in the Tier1 Oracle databases for its new replacement. (Requires FTS, LFC, 3D stop).
- Database:
  - Re-visit non-Castor database multipathing
  - Increase shared memory for OGMA, LUGH & SOMNUS
- Grid Services:
  - Changes to increase resilience of the BDII service
- Castor:
  - Change ATLAS Castor permissions to prevent users deleting data
- Networks:
  - None