Production Team Report 2010-10-25
RAL Tier1 Production Team Report for 25th October 2010.
AoD This Week
Mon - Tue: John; Wed: Gareth; Thu - Fri: John
Last week
- Gareth: AoD (1 day); some follow-up on Post Mortem & DM issues.
- John: Preparing for Castor GEN upgrade; re-writing and deploying the castorVoverwatch script.
- Tiju: AoD (4 days).
Changes to Operating procedures
- New GOC DB interface. ("At Risk" state replaced by "Warning".)
- Detailed changes when disabling batch worker nodes. (Ensure an extended Nagios downtime is put in place.)
Declared Outages in GOC DB
- Mon - Wed 25-27 Oct: Castor GEN upgrade.
- Tues 2nd Nov: "Warning" on MyProxy for migration to Quattorized service.
Advanced Warning
- Remaining Castor upgrades almost certainly on the following dates:
- Upgrade CMS - during the week beginning 8 November
- Upgrade ATLAS - during the week beginning 22 November
- Monday 13th December - UPS test.
Other Changes
- Fabric:
- Upgrade of Disk servers to 64-bit OS (to resolve checksum problem).
- Tape Drive Microcode Update.
- Double the network link to the tape robot stack (stack 12), postponed from the last TS. (Requires Castor stop).
- Swap out the older of the pair of SAN switches in the Tier1 Oracle databases for its new replacement. (Requires FTS, LFC, 3D stop).
- Update firmware in RAID controller cards for a batch of disk servers.
- Database:
- Re-visit non-Castor database multipathing
- Grid Services:
- Replacing the MON box with glite-APEL
- Upgrade site-level BDIIs to glite 3.2 and SL5
- Castor:
- Possible SRM update
- Networks:
- None
- ATLAS:
- Enable user jobs.