Production Team Report 2009-09-21

From GridPP Wiki

RAL Tier1 Production Team Report for 21st September 2009.

AoD This Week

Mon/Tue: John; Wed: Jonathan; Thu/Fri: John

Last Week

  • Gareth: AoD (2 days). Followed up on the Footprints incident-tracking database.
  • John: Investigated Nagios and LSF, amongst others.
  • Tiju: AoD (2 days). Documented elogger.

This Week

  • Gareth: More on incident database, disk server intervention meeting.
  • John: AoD (4 days), resolving farm root mail overload.
  • Tiju: More work on dashboard and e-mail sending system. Test disk server deployment.

Changes to Operating procedures

  • None

Declared Outages in GOC DB

  • 21-Sep: Atlas instance SRM upgrade (2.8-0)
  • 21-Sep: lb02 rebuild to enable hot swapping of disks.
  • 21-Sep: Patching of Oracle database for 'resilience bug'.
  • 22-Sep: Outage declared during R89 UPS tests (Castor quiet).
  • 22-Sep: WMS02 rebuild (including draining), until the 30th.