Production Team Report 2010-09-06
RAL Tier1 Production Team Report for 6th September 2010.
AoD This Week
Mon: Tiju; Tue: James T; Wed: Gareth; Thu & Fri: John
Last week
- Gareth: AoD (1 day), A/L (Fri)
- John: CERN School
- Tiju: AoD (2 days), Virtualisation
Leave
- Gareth: Monday
Changes to Operating procedures
- New callout system in production.
Declared Outages in GOC DB
- Sept 7 (08:30-17:00): At Risk while switching over to a quattorised pair of site-level BDIIs.
- Sept 1-9: lcgwms01 - Maintenance and update (glite-WMS 3.1.29). Includes time for drain ahead of intervention.
- Sept 9-16: lcgwms02 - Maintenance and update (glite-WMS 3.1.29). Includes time for drain ahead of intervention.
Advanced Warning
- September 7: Test of seal under 1st floor kitchen. (No access to kitchen.)
- Wed 8 Sept to Wed 15 Sept: Migrate Nagios checks for batch workers to new slave server.
- Weekend 2/3 October: Power outage in Atlas building.
- Update WNs (glite update)
- Replace RAL site-level BDII servers
- Update RAID controller firmware on all Streamline 2008 disk servers
Other Changes
- Fabric:
- Double the network link to the tape robot stack (stack 12), postponed from the last TS. (Requires Castor stop).
- Swap out the older of the pair of SAN switches in the Tier1 Oracle databases for its new replacement. (Requires FTS, LFC, 3D stop).
- New kernels and glibc updates on non-castor Oracle RAC nodes. (Done for LUGH).
- Updates to Amanda backup, possibly unblocking other updates.
- Database:
- Revisit non-Castor database multipathing
- Grid Services:
- None apart from those listed above.
- Castor:
- Possible SRM update
- Castor 2.1.9 upgrade
- Networks: