Production Team Report 2010-09-13

RAL Tier1 Production Team Report for 13th September 2010.

AoD This Week

Mon & Tues: Tiju; Wed: Gareth; Thu & Fri: Tiju

Last week

  • Gareth: AoD(1+ Day), A/L (Mon), Common Technologies, chasing some operational issues (e.g. NDGF link).
  • John: AoD(1+ Day), Work on Castor Nagios tests, Wiki copy.
  • Tiju: AoD(1 Day), A/L Half-day, Nagios reorganisation, Security Task, Preparing sv-08-17

Changes to Operating procedures

  • None.

Declared Outages in GOC DB

  • Sept 9-16: lcgwms02 - Maintenance and update (glite-WMS 3.1.29). Includes time for drain ahead of intervention.
  • Tuesday 14 September - Site At Risk for Firewall Reboot, and outage on FTS for drain of transfers.
  • Monday 13 September 10:00-12:00 At Risk for switch to local Nameserver on LHCb instance.
  • Tuesday 14 September 10:00-12:00 At Risk for switch to local Nameserver on GEN instance.
  • Wednesday 15 September 10:00-12:00 At Risk for switch to local Nameserver on CMS instance.
  • Thursday 16 September 10:00-12:00 At Risk for switch to local Nameserver on Atlas instance.

Advanced Warning

  • Wednesday 8 Sept to Wed 15th Sept: Migrate Nagios checks for batch workers to new slave server
  • Tuesday 14th Sept. Bring the now Quattorized front ends for LFC Atlas into use.
  • Tuesday 21st September. Firmware update on site firewall.
  • Weekend 2/3 October: Power outage in Atlas building.
  • Update WNs (glite update) - ongoing.
  • Wednesday 20th October - UPS maintenance.
  • Monday 13th December - UPS test.

Other Changes

  • Fabric:
    • Double the network link to the tape robot stack (stack 12), postponed from the last TS. (Requires Castor stop).
    • Swap out the older of the pair of SAN switches in the Tier1 Oracle databases for its new replacement. (Requires FTS, LFC, 3D stop).
    • New kernels and glibc updates on non-castor Oracle RAC nodes. (Done for LUGH).
    • Update firmware in RAID controller cards for a batch of disk servers.
  • Database:
    • Re-visit non-Castor database multipathing
  • Grid Services:
    • New Quattorized front ends for LFC.
    • New Quattorized front ends for FTS.
  • Castor:
    • Possible SRM update
    • Castor 2.1.9 upgrade
  • Networks: