Production Team Report 2010-11-15
RAL Tier1 Production Team Report for 15th November 2010.
AoD This Week
Mon - Wed: Tiju; Thu: Gareth; Fri: Tiju
Last week
- Gareth: Start post-mortem for the GDSS298 failure.
- John: AoD (4 days).
- Tiju: Continue looking at Nagios possibilities, Nagger cleanups.
Changes to Operating procedures
- None
Declared Outages in GOC DB
- Tue - Thu 16-18 Nov. CMS Castor 2.1.9 upgrade.
- Tue 23rd Nov. Tape System Unavailable. Work on tape robot to resolve problem with power supply cooling.
Advanced Warning
- TODAY: Add new SRMs for ATLAS.
- Mon-Thu 22-25 Nov. Upgrade CE08 to a CREAM CE.
- Remaining Castor upgrades: current scheduling (T.B.C.) on the following dates:
  - Upgrade ATLAS: Monday - Wednesday 6 - 8 December.
- Weekend 11/12 Dec: Power outage in Atlas building.
- Monday 13th December - UPS test.
Other Changes
- Fabric:
  - Tape Drive Microcode Update.
  - Double the network link to the tape robot stack (stack 12), postponed from the last TS. (Requires Castor stop).
  - Swap out the older of the pair of SAN switches in the Tier1 Oracle databases for its new replacement. (Requires FTS, LFC, 3D stop).
- Database:
  - Re-visit non-Castor database multipathing
  - Increase shared memory for OGMA, LUGH & SOMNUS
- Grid Services:
  - Changes to increase resilience of the BDII service
  - Quattorised gLite 3.2 LB nodes being put into production
  - Quattorisation of FTS Web Service hosts
- Castor:
  - Change ATLAS Castor permissions to prevent users deleting data
- Networks:
  - None