Production Team Report 2010-08-16
RAL Tier1 Production Team Report for 16th August 2010.
AoD This Week
Mon & Tues: John; Wed: Tiju; Thu: Gareth; Fri: Tiju
Last week
- Gareth: AoD (1 day), EMCvUPS issues, Job Plan into SSC, Looking at batch scheduling.
- John: A/L (3 days), Updating Nagios tests for Castor 2.1.9.
- Tiju: AoD (1 day), Modem script tests, understanding iSCSI/Hyper-V/Windows 2008
Changes to Operating procedures
- None (but note temporary arrangements for Networks callout).
Declared Outages in GOC DB
- 12 - 19 August: WMS03 update to version 3.1.29-0. From the start of the outage until Tuesday 17th August (10:00 UTC), WMS03 will be in draining mode, during which existing jobs will be allowed to finish and their output retrieved.
- 17 August: At Risk on Top-BDII for gLite update.
Advance Warning
- Day not specified: Test of seal under 1st floor kitchen.
- Wed/Thu 1/2 September: Possible transformer checks in R89 (during LHC technical stop).
- Weekend 2/3 October: Power outage in Atlas building.
- Update WNs (gLite update)
- Replace RAL site-level BDII servers
- Update RAID controller firmware on all Streamline 2008 disk servers
- Replace SURE with new callout script.
- Migrate Nagios checks for batch workers to new slave server
- Update imapd X509 certificate on PAT
Other Changes
- Fabric:
  - Double the network link to the tape robot stack (stack 12), postponed from the last technical stop (TS). (Requires Castor stop).
  - Swap out the older of the pair of SAN switches in the Tier1 Oracle databases for its new replacement. (Requires FTS, LFC, 3D stop).
  - New kernels and glibc updates on non-Castor Oracle RAC nodes. (Done for LUGH).
  - Updates to amanda backup - unblocking possible other updates.
- Database:
  - Re-visit non-Castor database multipathing
- Grid Services:
  - None apart from those listed above.
- Castor:
  - Possible SRM update
  - Castor 2.1.9 upgrade
- Networks:
  - Commissioning OPN link