Production Team Report 2010-11-08

RAL Tier1 Production Team Report for 8th November 2010.

AoD This Week

Mon - Tue: John; Wed: Catalin; Thu - Fri: John

Last week

  • Gareth: AoD (1 day). Discussions on wrapping up EMC UPS power issues and monitoring tape issues. Drafted Post Mortem on LHCb issues. Web pages for HEP SYSMAN meeting.
  • John: Looking at Nagios test for read-only file systems.
  • Tiju: AoD (4 days). Investigating Icinga and other Nagios possibilities.

Changes to Operating procedures

  • None

Declared Outages in GOC DB

  • Wednesday 10th Nov. 08:00-16:00 Outage on srm-lhcb. Upgrading disk servers to 64-bit OS.
  • Tue - Thu 16-18 Nov. CMS Castor 2.1.9 upgrade.

Advanced Warning

  • Tuesday 9th Nov. Mon -> APEL switch over.
  • Remaining Castor upgrades: current schedule (T.B.C.) is as follows:
    • Upgrade ATLAS - Monday - Wednesday 6 - 8 December.
  • Weekend 11/12 Dec: Power outage in Atlas building.
  • Monday 13th December - UPS test.
  • Either Tuesday 23 or 30 Nov: Intervention on tape robot to fix power for cooling problem.

Other Changes

  • Fabric:
    • Tape Drive Microcode Update.
    • Double the network link to the tape robot stack (stack 12), postponed from the last TS. (Requires Castor stop).
    • Swap out the older of the pair of SAN switches in the Tier1 Oracle databases for its new replacement. (Requires FTS, LFC, 3D stop).
  • Database:
    • Re-visit non-Castor database multipathing
  • Grid Services:
    • None
  • Castor:
    • SRM updates for LHCb
  • Networks:
    • None