Production Team Report 2010-11-01
From GridPP Wiki
RAL Tier1 Production Team Report for 1st November 2010.
AoD This Week
Mon-Wed: Tiju; Thu: Gareth; Fri: Tiju
Last week
- Gareth: AoD (1 day), A/L (half day).
- John: AoD (4 days).
- Tiju: Investigated Icinga, added apollo, other Nagios work.
Changes to Operating procedures
- Primary on-call rota is now on Google.
- ATLAS is running user jobs (making use of CVMFS).
Declared Outages in GOC DB
- Tues 2nd Nov: "Warning" on all SRM endpoints. Tape system unavailable during work on the tape robot to resolve a problem with power supply cooling.
- Tues 2nd Nov: "Warning" on MyProxy for migration to a Quattorized service.
- Wed 3rd Nov: "Warning" on Site-BDII for a rolling update to gLite 3.2 and SL5.
Advanced Warning
- Remaining Castor upgrades, currently scheduled (TBC) for the following dates:
  - CMS upgrade: Tuesday-Thursday 16-18 November.
  - ATLAS upgrade: Monday-Wednesday 6-8 December.
- Monday 13th December: UPS test.
Other Changes
- Fabric:
  - Upgrade disk servers to a 64-bit OS (to resolve the checksum problem).
  - Tape drive microcode update.
  - Double the network link to the tape robot stack (stack 12), postponed from the last TS (requires a Castor stop).
  - Swap out the older of the pair of SAN switches serving the Tier1 Oracle databases for its new replacement (requires an FTS, LFC and 3D stop).
  - Update firmware in the RAID controller cards for a batch of disk servers.
- Database:
  - Revisit multipathing for the non-Castor databases.
- Grid Services:
  - Replace the MON box with glite-APEL.
- Castor:
  - Possible SRM update.
- Networks:
  - None.
- ATLAS:
  - Enable user jobs.