Production Team Report 2010-11-15

RAL Tier1 Production Team Report for 15th November 2010.

AoD This Week

Mon - Wed: Tiju; Thu: Gareth; Fri: Tiju

Last week

  • Gareth: Start post-mortem for the GDSS298 failure.
  • John: AoD (4 days).
  • Tiju: Continue looking at Nagios possibilities, Nagger cleanups.

Changes to Operating procedures

  • None

Declared Outages in GOC DB

  • Tue - Thu 16-18 Nov. CMS Castor 2.1.9 upgrade.
  • Tue 23rd Nov. Tape System Unavailable. Work on tape robot to resolve problem with power supply cooling.

Advanced Warning

  • TODAY: Add new SRMs for ATLAS.
  • Mon-Thu 22-25 Nov. Upgrade CE08 to a CREAM CE.
  • Remaining Castor upgrades, currently scheduled (T.B.C.) for the following dates:
    • Upgrade ATLAS: Monday - Wednesday, 6 - 8 December.
  • Weekend 11/12 Dec: Power outage in Atlas building.
  • Monday 13th December - UPS test.

Other Changes

  • Fabric:
    • Tape Drive Microcode Update.
    • Double the network link to the tape robot stack (stack 12), postponed from the last TS. (Requires Castor stop).
    • Replace the older of the pair of SAN switches used by the Tier1 Oracle databases with its new replacement. (Requires FTS, LFC, 3D stop).
  • Database:
    • Re-visit non-Castor database multipathing
    • Increase shared memory for OGMA, LUGH & SOMNUS
  • Grid Services:
    • Changes to increase resilience of the BDII service
    • Quattorised gLite 3.2 LB nodes being put into production
    • Quattorisation of FTS Web Service hosts
  • Castor:
    • Change ATLAS Castor permissions to prevent users deleting data
  • Networks:
    • None