Production Team Report 2010-10-04

From GridPP Wiki

RAL Tier1 Production Team Report for 4th October 2010.

AoD This Week

  • Mon-Tue: John
  • Wed: Gareth
  • Thu-Fri: John

Last week

  • Gareth: AoD (1 day). Some planning for the ATLAS (non-)outage.
  • John: Updating Nagios tests during the Castor upgrade. Dump of the Wiki.
  • Tiju: AoD (4 days).

Changes to Operating procedures

  • Note the new number for contacting the Network Team out of hours:
    https://wiki.e-science.cclrc.ac.uk/web1/bin/view/EScienceInternal/CalloutNetworking

Declared Outages in GOC DB

  • CMS VO boxes (lcgvo0428 & lcgvo0599) down for decommissioning.

Advanced Warning

  • Monday 18th October - R89 Transformer Checks.
  • Wednesday 20th October - UPS maintenance.
  • Monday 13th December - UPS test.
  • Some kernel updates will be required.
  • The remaining Castor upgrades will almost certainly take place on the following dates:
    • Upgrade Gen (including ALICE) - during the week beginning 25 October
    • Upgrade CMS - during the week beginning 8 November
    • Upgrade ATLAS - during the week beginning 22 November

Other Changes

  • Fabric:
    • Double the network link to the tape robot stack (stack 12); postponed from the last TS. (Requires a Castor stop.)
    • Swap out the older of the pair of SAN switches in the Tier1 Oracle databases for its new replacement. (Requires FTS, LFC and 3D to be stopped.)
    • New kernels and glibc updates on non-Castor Oracle RAC nodes. (Done for LUGH.)
    • Update firmware in RAID controller cards for a batch of disk servers.
  • Database:
    • Re-visit non-Castor database multipathing
  • Grid Services:
    • New Quattorized front ends for FTS.
    • Rolling update to Top-BDII nodes to fix disk partitioning layout.
  • Castor:
    • Possible SRM update
    • Castor 2.1.9 upgrades
  • Networks:
    • None