Production Team Report 2010-11-29

From GridPP Wiki

RAL Tier1 Production Team Report for 29th November 2010.

AoD This Week

Mon - Wed: Tiju; Thu: Gareth; Fri: Tiju

Last week

  • Gareth: HepSYSMAN (Mon), Linux admin course (Tue-Fri)
  • John: AoD (4 days); Prepared Nagios test for read-only file systems.
  • Tiju: Summarise Atlas operational disk interventions, Nagios stuff, Security Task.

Changes to Operating procedures

  • Call-outs being added for FSPROBE errors and read-only file systems.

Declared Outages in GOC DB

  • Mon 6th - Wed 8th Dec: upgrade of Atlas Castor instance.

Advanced Warning

  • Overnight Mon-Tue (23:00-05:00) scheduled maintenance on backup OPN link to CERN.
  • From 23:00 on Thursday 2nd to 05:00 on Sunday 5th: scheduled maintenance on main OPN link to CERN.
  • Wed 1st Dec: microcode update on half of the tape drives (the other half a fortnight later).
  • Weekend 11/12 Dec: Power outage in Atlas building.
  • Monday 13th December - UPS test.

Other Changes

  • Fabric:
    • Double the network link to the tape robot stack (stack 12), postponed from the last TS. (Requires Castor stop).
    • Replace the older of the pair of SAN switches serving the Tier1 Oracle databases with its new replacement. (Requires FTS, LFC, 3D stop).
  • Database:
    • Re-visit non-Castor database multipathing
    • Increase shared memory for OGMA, LUGH & SOMNUS
  • Grid Services:
    • Changes to increase resilience of the BDII service
  • Castor:
    • Change ATLAS Castor permissions to prevent users deleting data
  • Networks:
    • None